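1. Installation environment:
Ubuntu 16.04
2. Minimum software requirements:
Moses
GIZA++, for word-aligning the parallel corpus
IRSTLM, SRILM, or KenLM, for language model estimation
3. Install dependencies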
sudo apt-get install build-essential git-core pkg-config automake libtool wget zlib1g-dev python-dev libbz2-dev
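Before moving on, it can help to confirm that the basic build tools are actually on the PATH. The following is a minimal sketch, not part of the original notes; the function name missing_commands and the list of commands in the example call are illustrative assumptions.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Returns 0 when all commands are found, 1 if at least one is missing.
# (missing_commands is a hypothetical helper, not part of Moses.)
missing_commands() {
    mc_status=0
    for mc_cmd in "$@"; do
        if ! command -v "$mc_cmd" >/dev/null 2>&1; then
            echo "$mc_cmd"
            mc_status=1
        fi
    done
    return "$mc_status"
}

# Example call (commented out so that sourcing this snippet has no side effects):
# missing_commands gcc g++ make git pkg-config automake wget
```

If the call prints nothing and returns 0, every listed tool was found.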
4. Download IRSTLM from https://sourceforge.net/projects/irstlm and install it
unzip irstlm-5.80.08.zip
cd irstlm-5.80.08
./regenerate-makefiles.sh
./configure --prefix=/path/where/to/install/irstlm
make
make install
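The IRSTLM scripts are normally run with an IRSTLM environment variable pointing at the install prefix. The sketch below sets that variable only if the prefix looks like a finished install. It assumes that build-lm.sh ends up under bin/ in the prefix; the function name use_irstlm is hypothetical.

```shell
# Export IRSTLM for the given install prefix if it looks like a valid install.
# Assumption: "make install" placed build-lm.sh in <prefix>/bin.
# (use_irstlm is a hypothetical helper, not part of IRSTLM.)
use_irstlm() {
    ui_prefix=$1
    if [ -x "$ui_prefix/bin/build-lm.sh" ]; then
        export IRSTLM="$ui_prefix"
        echo "IRSTLM=$IRSTLM"
    else
        echo "build-lm.sh not found under $ui_prefix/bin" >&2
        return 1
    fi
}

# Example call (use the same prefix that was passed to ./configure):
# use_irstlm /path/where/to/install/irstlm
```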
5. Install GIZA++ (clone it from your home directory, since the paths below assume ~/giza-pp)
git clone https://github.com/moses-smt/giza-pp.git
cd giza-pp
make
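The commands above produce three binaries: ~/giza-pp/GIZA++-v2/GIZA++, ~/giza-pp/GIZA++-v2/snt2cooc.out and ~/giza-pp/mkcls-v2/mkcls. Copy them to a location where Moses can find them, as follows (this assumes Moses has already been cloned to ~/mosesdecoder, see step 6):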
cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out ~/giza-pp/mkcls-v2/mkcls tools
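A quick way to confirm that the copy worked is to check that all three binaries are present and executable in the tools directory. This is a minimal sketch; the function name check_giza_tools is hypothetical.

```shell
# Report any of the three GIZA++ binaries that are missing or not executable
# in the given directory. Returns 0 if all three are present, 1 otherwise.
# (check_giza_tools is a hypothetical helper, not part of Moses or GIZA++.)
check_giza_tools() {
    cg_dir=$1
    cg_status=0
    for cg_bin in GIZA++ snt2cooc.out mkcls; do
        if [ ! -x "$cg_dir/$cg_bin" ]; then
            echo "missing: $cg_dir/$cg_bin"
            cg_status=1
        fi
    done
    return "$cg_status"
}

# Example call:
# check_giza_tools "$HOME/mosesdecoder/tools"
```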
When training a model, use the -external-bin-dir option to tell the training script where the GIZA++ binaries are installed:
train-model.perl -external-bin-dir $HOME/mosesdecoder/tools
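The line above shows only the option relevant to GIZA++; a real training run needs further arguments. The sketch below prints a fuller command line for review without running it. The corpus stem, language codes and language model path in the example call are hypothetical placeholders, and the trailing ":1" in the -lm value is assumed to select IRSTLM, which should be checked against the Moses documentation. The function name print_train_command is also hypothetical.

```shell
# Print (do not run) a train-model.perl command line.
# Arguments: Moses directory, corpus stem, source language, target language,
# language model file. All example values are hypothetical placeholders.
# Assumption: the -lm value has the form factor:order:file:type, with type 1
# meaning IRSTLM.
print_train_command() {
    printf '%s\n' "$1/scripts/training/train-model.perl -root-dir train -corpus $2 -f $3 -e $4 -lm 0:3:$5:1 -external-bin-dir $1/tools"
}

# Example call:
# print_train_command "$HOME/mosesdecoder" "$HOME/corpus/train.clean" fr en "$HOME/lm/train.blm.en"
```

Printing the command first makes it easy to inspect the paths before starting a training run, which can take a long time.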
6. Clone and compile Moses
Clone (from your home directory, so that the checkout ends up at ~/mosesdecoder)
git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder
Compile
./bjam --with-irstlm=/path/where/to/install/irstlm -j4
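Once the build finishes, the decoder can be smoke-tested on a small model. The sketch below assumes that the build produced bin/moses under the Moses directory and that the sample-models archive from the Moses website has been unpacked, so that phrase-model/moses.ini and phrase-model/in exist. The function name run_sample_model is hypothetical.

```shell
# Run the decoder on the sample phrase model and write translations to stdout.
# Arguments: absolute path to the Moses directory, path to the unpacked
# sample-models directory (both locations are assumptions of this sketch).
# (run_sample_model is a hypothetical helper, not part of Moses.)
run_sample_model() {
    if [ ! -x "$1/bin/moses" ]; then
        echo "decoder not found: $1/bin/moses" >&2
        return 1
    fi
    # Run inside a subshell so the caller's working directory is unchanged;
    # the sample moses.ini is assumed to use paths relative to sample-models.
    ( cd "$2" && "$1/bin/moses" -f phrase-model/moses.ini < phrase-model/in )
}

# Example call:
# run_sample_model "$HOME/mosesdecoder" "$HOME/sample-models" > out
```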