Debian/Ubuntu package: GIZA++ 1.0.1

There are already Ubuntu packages of mkcls and GIZA++. However, I wanted to use giza-pp (including mkcls) 1.0.1, so I packaged it myself. The results are as follows:


Change notes: The command-line interface has changed as follows:

GIZA++ changed to giza++.
snt2plain.out changed to snt2plain.
plain2snt.out changed to plain2snt.
snt2cooc.out changed to snt2cooc.
trainGIZA++ changed to train-giza++.

Lintian reported many warnings, but I still don’t know how to fix them :-P.

Update: To pass Lintian tests, man pages are needed.

Usage example

Suppose there are two parallel plain-text files, eng.txt in English and tha.txt in Thai.


a dog eat a chicken
a chichken eat a fish


หมา กิน ไก่
ไก่ กิน ปลา

To align these texts, we run the following commands:

$ plain2snt eng.txt tha.txt
w1:eng w2:tha
eng -> eng
tha -> tha

$ train-giza++ eng.vcb tha.vcb eng_tha.snt
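
The two commands above can also be generated for any pair of files. This is only a minimal sketch: the helper name alignment_commands is hypothetical, and the output file names (eng.vcb, tha.vcb, eng_tha.snt) are inferred from the example above, not from any documentation.

```python
from pathlib import Path


def alignment_commands(source_file, target_file):
    """Build the two command lines used above for a pair of plain-text files.

    Assumption: plain2snt names its output after the input file stems,
    as in the eng.txt / tha.txt example (eng.vcb, tha.vcb, eng_tha.snt).
    """
    src = Path(source_file).stem
    tgt = Path(target_file).stem
    return [
        ["plain2snt", source_file, target_file],
        ["train-giza++", f"{src}.vcb", f"{tgt}.vcb", f"{src}_{tgt}.snt"],
    ]


# Only print the commands here; to actually run them, each list could be
# passed to subprocess.run(argv, check=True).
for argv in alignment_commands("eng.txt", "tha.txt"):
    print("$", " ".join(argv))
```

For eng.txt and tha.txt this prints the same two command lines as typed by hand above.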

The result will then be in:

$ cat
# Sentence pair (1) source length 5 target length 3 alignment score : 0.0373314
หมา กิน ไก่
NULL ({ }) a ({ }) dog ({ }) eat ({ }) a ({ 2 }) chicken ({ 1 3 })
# Sentence pair (2) source length 5 target length 3 alignment score : 0.0373315
ไก่ กิน ปลา
NULL ({ }) a ({ }) chichken ({ }) eat ({ }) a ({ 2 }) fish ({ 1 3 })
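
To read this output, note that each number inside ({ }) is a 1-based position in the Thai line printed just above it. Here is a small sketch that turns one sentence pair into word pairs. The function name parse_alignment is my own, and the format is assumed to be exactly as shown in the output above.

```python
import re

# One token followed by its "({ ... })" list of 1-based positions.
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_alignment(target_line, aligned_line):
    """Return (word, [aligned target words]) for every token in aligned_line."""
    target_words = target_line.split()
    pairs = []
    for word, positions in TOKEN_RE.findall(aligned_line):
        indices = [int(p) for p in positions.split()]
        # Positions are 1-based, so shift by one for Python list indexing.
        pairs.append((word, [target_words[i - 1] for i in indices]))
    return pairs


target = "หมา กิน ไก่"
aligned = "NULL ({ }) a ({ }) dog ({ }) eat ({ }) a ({ 2 }) chicken ({ 1 3 })"
for word, words in parse_alignment(target, aligned):
    print(word, "->", " ".join(words) if words else "(nothing)")
```

With only two sentence pairs the alignment is of course poor: "chicken" ends up aligned to both หมา and ไก่, and "dog" to nothing.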

P.S. I built these packages on Ubuntu 7.10



