Prerequisites
- Java 6
- Subversion (for repository access)
- Apache Ant (for automated build scripts)
- GIZA++ (for initial system build)
- SRILM (for initial system build)
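Once GIZA++ and SRILM are installed, set the following environment variables so that Cunei can locate them:
export GIZA_HOME=/path/to/GIZA++-v2/
export SRILM_HOME=/path/to/srilm/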
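Before building, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of Cunei: check_homes and check_tools are hypothetical helper names. The script only checks that GIZA_HOME and SRILM_HOME are set to existing directories and that the named commands are on the PATH. It does not verify versions or that the installations actually work.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cunei): confirm that GIZA_HOME and
# SRILM_HOME are set and point to existing directories.
check_homes() {
  rc=0
  for var in GIZA_HOME SRILM_HOME; do
    # Indirect lookup of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      rc=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir is not a directory" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Hypothetical helper: confirm that every command named as an argument
# can be found on the PATH (does not check versions).
check_tools() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd not found on PATH" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example usage:
#   check_homes && check_tools java svn ant && echo "Prerequisites look OK"
```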
Install Cunei
The source code for Cunei is hosted in a Subversion repository.
Run the following command to check out revision 801 of Cunei into
the directory /path/to/cunei.
svn co -r801 https://svn.cunei.org/svnroot/cunei /path/to/cunei
Throughout this tutorial we will run commands and reference files
relative to Cunei's base directory. Please change your working
directory now.
cd /path/to/cunei/
Build a System
The build process is fully automated. Run the command below to
generate an example French-English system.
ant -Dcunei.mem=4g -f data/systems/fr-en.default/build.xml system
This will take a long time to complete, but the process should end with a
BUILD SUCCESSFUL message. If you see an error instead, please email a
description of the problem to Translate
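Because the build runs for a long time, it is convenient to keep its output in a file that can be checked afterwards and attached to a problem report. The sketch below assumes only that a successful build prints the BUILD SUCCESSFUL message mentioned above; build_succeeded and build.log are hypothetical names, not part of Cunei.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cunei): succeed only if the given Ant
# log file contains the "BUILD SUCCESSFUL" message.
build_succeeded() {
  grep -q "BUILD SUCCESSFUL" "$1"
}

# Example usage: save a copy of the build output, then check it.
#   ant -Dcunei.mem=4g -f data/systems/fr-en.default/build.xml system 2>&1 | tee build.log
#   build_succeeded build.log || echo "Build failed; see build.log" >&2
```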
You should now have a working system that can produce
translations. Let's give it a whirl!
To verify that everything is working, generate the translation
lattice with the command below. This will output the many possible
translations found for each word or phrase in the test sentence.
echo "Je voudrais une bouteille d'eau" | \
bin/cunei.sh Translate data/systems/fr-en.default/config.base
Now try generating full-sentence translations with the command
below. This will output the top four translations of the sentence.
echo "Je voudrais une bouteille d'eau" | \
bin/cunei.sh Decode data/systems/fr-en.default/config.base \
-XDecoder.N-Best 4
Optimize
The default parameters are not tuned for any particular language
pair and are therefore quite poor. Now it's time to remedy that.
The following command optimizes Cunei's settings and will
significantly improve the quality of translations. A test
document is translated over and over, each time adjusting
Cunei's parameters so that the translations match a
human-translated document as closely as possible. This process
uses a lot of memory and may take a few days to complete.
ant -Dcunei.mem=4g -f data/systems/fr-en.default/build.xml opt
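Since optimization may run for days, you may prefer to start it detached from your terminal so that it survives a logout. This is a minimal sketch using the standard nohup utility; run_detached and opt.log are hypothetical names, not part of Cunei, and the helper simply runs whatever command it is given with all output redirected to a log file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cunei): start a long-running command
# under nohup in the background, sending stdout and stderr to a log file.
run_detached() {
  log=$1
  shift
  nohup "$@" >"$log" 2>&1 &
  # Print the process ID so the job can be checked on (or killed) later.
  echo "$!"
}

# Example usage with the optimization command:
#   run_detached opt.log ant -Dcunei.mem=4g -f data/systems/fr-en.default/build.xml opt
#   tail -f opt.log
```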
When optimization is complete, a new configuration file containing
the optimized parameters will be created for you. From now on, use
this new configuration file, data/systems/fr-en.default/config.base.r801.0.opt,
for translation.
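If you script your translation runs, a small helper can select the optimized configuration once it exists. This is a sketch under stated assumptions: pick_config is a hypothetical name, not part of Cunei, and the file name config.base.r801.0.opt is taken from the path above, which assumes you checked out revision 801. When the optimized file is absent, the helper falls back to config.base.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cunei): print the optimized
# configuration path if it exists, otherwise the base configuration.
pick_config() {
  sysdir=$1
  # File name as produced for a revision 801 checkout (assumption).
  opt="$sysdir/config.base.r801.0.opt"
  if [ -f "$opt" ]; then
    echo "$opt"
  else
    echo "$sysdir/config.base"
  fi
}

# Example usage: decode with whichever configuration is available.
#   echo "Je voudrais une bouteille d'eau" | \
#     bin/cunei.sh Decode "$(pick_config data/systems/fr-en.default)" -XDecoder.N-Best 4
```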
Workbench
More information coming soon...
Congratulations
Congratulations on completing the Cunei tutorial. Feedback is
welcome. If something didn't quite work right or you have
suggestions for improvement, please send mail to