Debian/Ubuntu package: GIZA++ 1.0.1

There are already Ubuntu packages of mkcls and GIZA++ at http://cl.aist-nara.ac.jp/~eric-n/ubuntu-nlp/dists/dapper/nlp/. However, I wanted to use giza-pp 1.0.1 (including mkcls) from http://code.google.com/p/giza-pp/, so I packaged it myself. The results are as follows:

gizapp_1.0.1-1ubuntu5.diff.gz
gizapp_1.0.1-1ubuntu5.dsc
gizapp_1.0.1-1ubuntu5_i386.build
gizapp_1.0.1-1ubuntu5_i386.changes
gizapp_1.0.1.orig.tar.gz
giza++-static_1.0.1-1ubuntu5_i386.deb
mkcls_1.0.1-1ubuntu5_i386.deb

Note on changes: some command names differ from upstream, as follows:

GIZA++ changed to giza++.
snt2plain.out changed to snt2plain.
plain2snt.out changed to plain2snt.
snt2cooc.out changed to snt2cooc.
trainGIZA++ changed to train-giza++.

Lintian reported many warnings, but I still don't know how to fix them 😛.

Update: to pass the Lintian checks, the packages need man pages.

Usage example

Suppose there are two parallel plain-text files, one in English and one in Thai.

eng.txt:

a dog eat a chicken
a chichken eat a fish

tha.txt:

หมา กิน ไก่
ไก่ กิน ปลา

To align these texts, run the following commands:

$ plain2snt eng.txt tha.txt
w1:eng w2:tha
eng -> eng
tha -> tha
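plain2snt turns the two plain-text files into vocabulary files (eng.vcb, tha.vcb) and a sentence file (eng_tha.snt), which train-giza++ then takes as input. The Python sketch below illustrates the idea under assumptions I have not checked against the giza-pp source: each .vcb line is "id word count", ids start at 2 and follow the order of first appearance, and each sentence pair in the .snt file takes three lines (a count of 1, the source word ids, the target word ids). The function names are hypothetical; this is not plain2snt itself.

```python
from collections import Counter


def build_vocab(sentences):
    """Assign word ids and count occurrences.

    ASSUMPTION: ids start at 2 (0 and 1 treated as reserved) and follow the
    order of first appearance; the real plain2snt may order them differently.
    """
    ids = {}
    counts = Counter()
    for sentence in sentences:
        for word in sentence:
            if word not in ids:
                ids[word] = len(ids) + 2
            counts[word] += 1
    return ids, counts


def vcb_lines(ids, counts):
    """One 'id word count' line per vocabulary entry, ordered by id."""
    return [f"{i} {w} {counts[w]}" for w, i in sorted(ids.items(), key=lambda kv: kv[1])]


def snt_lines(src_sents, tgt_sents, src_ids, tgt_ids):
    """Three lines per sentence pair: count, source ids, target ids (assumed layout)."""
    lines = []
    for src, tgt in zip(src_sents, tgt_sents):
        lines.append("1")
        lines.append(" ".join(str(src_ids[w]) for w in src))
        lines.append(" ".join(str(tgt_ids[w]) for w in tgt))
    return lines


if __name__ == "__main__":
    eng = [line.split() for line in ["a dog eat a chicken", "a chichken eat a fish"]]
    tha = [line.split() for line in ["หมา กิน ไก่", "ไก่ กิน ปลา"]]
    eng_ids, eng_counts = build_vocab(eng)
    tha_ids, tha_counts = build_vocab(tha)
    print("\n".join(vcb_lines(eng_ids, eng_counts)))
    print("\n".join(snt_lines(eng, tha, eng_ids, tha_ids)))
```

Running it on the two example sentence pairs prints vocabulary and sentence lines in the assumed layout; compare them with the real eng.vcb and eng_tha.snt before relying on the details.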

$ train-giza++ eng.vcb tha.vcb eng_tha.snt
END.

The alignment result is then written to GIZA++.A3.final:

$ cat GIZA++.A3.final
# Sentence pair (1) source length 5 target length 3 alignment score : 0.0373314
หมา กิน ไก่
NULL ({ }) a ({ }) dog ({ }) eat ({ }) a ({ 2 }) chicken ({ 1 3 })
# Sentence pair (2) source length 5 target length 3 alignment score : 0.0373315
ไก่ กิน ปลา
NULL ({ }) a ({ }) chichken ({ }) eat ({ }) a ({ 2 }) fish ({ 1 3 })
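If you want to use these alignments from a program, the format above is easy to parse: each sentence pair takes three lines, namely a "# Sentence pair" header, the target sentence, and the source sentence in which every word (starting with NULL) is followed by "({ ... })" listing the 1-based target positions aligned to it. The Python sketch below assumes exactly that layout, as seen in the output above; parse_a3, links and SentencePair are my own hypothetical names, not part of giza-pp.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# One source token followed by "({ ... })": the 1-based target positions aligned to it.
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


@dataclass
class SentencePair:
    header: str                          # the "# Sentence pair (...)" line
    target: list[str]                    # target-side words
    source: list[tuple[str, list[int]]]  # (source word, target positions); entry 0 is NULL


def parse_a3(text: str) -> list[SentencePair]:
    """Parse A3 output: header line, target sentence, annotated source sentence."""
    lines = text.splitlines()
    pairs = []
    for i, line in enumerate(lines):
        if line.startswith("# Sentence pair") and i + 2 < len(lines):
            source = [
                (word, [int(p) for p in positions.split()])
                for word, positions in TOKEN_RE.findall(lines[i + 2])
            ]
            pairs.append(SentencePair(line, lines[i + 1].split(), source))
    return pairs


def links(pair: SentencePair) -> list[tuple[int, int]]:
    """(source index, target position) links; source index 0 is the NULL word."""
    return [(s, t) for s, (_, targets) in enumerate(pair.source) for t in targets]
```

For example, parse_a3(open("GIZA++.A3.final", encoding="utf-8").read()) on the output above gives [(4, 2), (5, 1), (5, 3)] as the links of the first pair: the second "a" is linked to กิน, and "chicken" to both หมา and ไก่.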

P.S. I built these packages on Ubuntu 7.10.
