Installing and Configuring Coreseek on Ubuntu

Published: 2015-03-02 15:06:46  Source: Linux website  Author: yerunping

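I followed the official documentation for the whole installation, but at the final configuration step I kept running into errors. In the end I had to search around again, and only the configuration below got things working. I have only just started with coreseek and am not yet familiar with some of its parameters; I wanted to get it configured and try it out, and did not read the official documentation carefully.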

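To avoid compilation errors, install the following packages first (this command uses yum, as on CentOS; on Ubuntu, install the equivalent packages with apt-get):
yum -y install mysql mysql-devel php-mysql qt4-mysql python python-dev gcc-c++ gtk+ libtool automake autoconf glibc-common expat-devel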


1. Installation
wget http://www.coreseek.cn/uploads/csft/3.1/Source/csft-3.1.tar.gz #### coreseek source
wget http://www.coreseek.cn/uploads/csft/3.1/Source/mmseg-3.1.tar.gz ##### the dictionary (mmseg) used by coreseek
tar zxvf csft-3.1.tar.gz
tar zxvf mmseg-3.1.tar.gz

##### mmseg must be installed before coreseek
cd mmseg-3.1
./configure --prefix=/usr/local/mmseg
make
make install

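######## Install coreseek ########
## The python data source is not used here; if you need it, add --with-python. The mmseg paths must match the mmseg install location.
cd ../csft-3.1
./configure --prefix=/usr/local/coreseek --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib --without-iconv

Specifying the --enable-id64 option turns on support for 64-bit document IDs and word IDs.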

make
make install

If there are no problems, the installation creates the coreseek directory and its files under /usr/local/.
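
If make cannot find mmseg, one quick thing to check is whether the directories passed to ./configure actually exist. The following is a minimal sketch; check_dirs is a hypothetical helper written for this article and is not part of coreseek or mmseg.

```shell
#!/bin/sh
# check_dirs: hypothetical helper (not part of coreseek or mmseg).
# Prints every argument that is not an existing directory and
# returns non-zero if any of them were missing.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the paths passed to ./configure in this article:
# check_dirs /usr/local/mmseg/include/mmseg /usr/local/mmseg/lib && echo "mmseg paths found"
```

This only confirms that the directories exist; it does not verify that the headers and libraries inside them are usable.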

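Next, generate the mmseg dictionary and its configuration file:
cd /usr/local/mmseg
/usr/local/mmseg/bin/mmseg -u /usr/local/src/mmseg-3.1/data/unigram.txt   ### unigram.txt is the dictionary source file; this generates unigram.txt.uni
cd ../coreseek
mkdir dict ### create the dictionary directory
cp /usr/local/src/mmseg-3.1/data/unigram.txt.uni dict/uni.lib    ### copy the generated dictionary into dict
vim dict/mmseg.ini #### create the mmseg configuration file; the Windows build of coreseek already ships with this file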

mmseg.ini:
[mmseg]
merge_number_and_ascii=1;
number_and_ascii_joint=-;
compress_space=0;
seperate_number_ascii=1;
That completes the mmseg configuration. The next step is to write csft.conf, the coreseek configuration file:

source article
{
type                                    = mysql
sql_host                                = localhost
sql_user                                = root
sql_pass                                = jiaxian
sql_db                                  = test
sql_port                                = 3306 # optional, default is 3306

sql_query_pre                           = SET NAMES utf8
#sql_query_pre                           = SET SESSION query_cache_type=OFF ## this turns off the SQL query cache
#sql_query = SELECT id, classid, checked, title, newstime, newstext FROM article
sql_query_range = SELECT MIN(id),MAX(id) FROM article
sql_range_step = 1000
sql_query = SELECT id, classid, checked, title, newstime, newstext FROM article WHERE id>=$start AND id<=$end

sql_attr_uint = classid
sql_attr_uint = checked
sql_attr_uint = newstime
sql_query_info = select * from article where id=$id

}
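
The sql_query_range and sql_range_step settings above make the indexer fetch rows in batches: it first reads MIN(id) and MAX(id), then runs sql_query once per batch with $start and $end filled in. Purely as an illustration, the loop below prints the WHERE conditions that would result; print_ranges is a hypothetical helper, and the id bounds 1 and 2345 are made-up example values.

```shell
#!/bin/sh
# print_ranges MIN MAX STEP
# Hypothetical helper: prints one "id>=START AND id<=END" condition per
# batch, mimicking how $start and $end advance in a ranged query.
print_ranges() {
    min=$1
    max=$2
    step=$3
    start=$min
    while [ "$start" -le "$max" ]; do
        end=$((start + step - 1))
        # The last batch is clipped to the maximum id.
        if [ "$end" -gt "$max" ]; then
            end=$max
        fi
        echo "id>=$start AND id<=$end"
        start=$((end + 1))
    done
}

# Example with assumed bounds: ids 1..2345, step 1000 as in the config.
print_ranges 1 2345 1000
```

With these example values the query would run three times, covering ids 1 to 1000, 1001 to 2000, and 2001 to 2345.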

index article
{
source                                  = article
path                                    = /usr/local/coreseek/var/data/article
docinfo                                 = extern
charset_type                         = zh_cn.utf-8 ### the character encoding coreseek uses
charset_dictpath                    = /usr/local/coreseek/dict ##### directory holding the coreseek dictionary files

min_prefix_len                        = 0
min_infix_len                          = 0
min_word_len                         = 2
ngram_len               = 1
ngram_chars = U+4E00..U+9FBF, U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF,\
U+2F800..U+2FA1F, U+2E80..U+2EFF, U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF,\
U+3040..U+309F, U+30A0..U+30FF, U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF,\
U+3130..U+318F, U+A000..U+A48F, U+A490..U+A4CF
html_strip              = 0
}

indexer
{
mem_limit   = 256M
}
searchd
{
# address    = 0.0.0.0
log     = /usr/local/coreseek/var/log/searchd.log
query_log   = /usr/local/coreseek/var/log/query.log
read_timeout = 5
max_children = 30
pid_file   = /usr/local/coreseek/var/log/searchd.pid
max_matches   = 1000
seamless_rotate = 1
}

Structure of table `article`

DROP TABLE IF EXISTS `article`;
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`classid` smallint(6) NOT NULL DEFAULT '0',
`checked` tinyint(1) NOT NULL DEFAULT '0',
`title` varchar(200) NOT NULL DEFAULT '',
`newstime` int(10) NOT NULL DEFAULT '0',
`newstext` mediumtext NOT NULL,
PRIMARY KEY (`id`),
KEY `checked` (`checked`),
KEY `newstime` (`newstime`),
KEY `classid` (`classid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `article` (`id`, `classid`, `checked`, `title`, `newstime`, `newstext`) SELECT `id`, `classid`, `checked`, `title`, `newstime`, `newstext` FROM `test` where id < 1000

Build the index:
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/dict/csft.conf --all --rotate

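Test it with the command-line client (substitute your own configuration file and index name; for the configuration above that would be csft.conf and -i article):
/usr/local/coreseek/bin/search -c /usr/local/coreseek/dict/cnal.conf -i url_quick 铝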

Start and stop the Sphinx daemon (searchd):
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/dict/csft.conf   # start
/usr/local/coreseek/bin/searchd --stop -c /usr/local/coreseek/dict/csft.conf   # stop
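
To confirm that the daemon actually came up, one option is to look at the pid_file configured in the searchd section. This is a rough sketch only; searchd_running is a hypothetical helper and does not ship with coreseek.

```shell
#!/bin/sh
# searchd_running PIDFILE
# Hypothetical helper: succeeds if the pid file exists and names a live
# process. kill -0 sends no signal; it only tests whether the process can
# be signalled, so it may also fail for a process owned by another user.
searchd_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Example call, using the pid_file path from the configuration in this article:
# searchd_running /usr/local/coreseek/var/log/searchd.pid && echo "searchd is running"
```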


2. Errors
When compiling sphinx on CentOS, the xmlUnknownEncoding error kept appearing:
libsphinx.a(sphinx.o): In function `xmlUnknownEncoding':
/var/nfs_root/csft-3.1/src/sphinx.cpp:19072: undefined reference to `libiconv_open'
/var/nfs_root/csft-3.1/src/sphinx.cpp:19090: undefined reference to `libiconv'
/var/nfs_root/csft-3.1/src/sphinx.cpp:19096: undefined reference to `libiconv_close'
libsphinx.a(tokenizer_zhcn.o): In function `CSphTokenizer_zh_CN_GBK::GetLocalBuffer(unsigned char*, int, unsigned char*)':
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:327: undefined reference to `libiconv'
libsphinx.a(tokenizer_zhcn.o): In function `CSphTokenizer_zh_CN_UTF8_Private::GetConverterOutput(char const*, char const*)':
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:79: undefined reference to `libiconv_open'
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:82: undefined reference to `libiconv'
libsphinx.a(tokenizer_zhcn.o): In function `CSphTokenizer_zh_CN_GBK::SetBuffer(unsigned char*, int)':
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:355: undefined reference to `libiconv'
libsphinx.a(tokenizer_zhcn.o): In function `CSphTokenizer_zh_CN_UTF8_Private::GetConverter(char const*, char const*)':
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:63: undefined reference to `libiconv_open'
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:66: undefined reference to `libiconv'
libsphinx.a(tokenizer_zhcn.o): In function `~CSphTokenizer_zh_CN_UTF8_Private':
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:36: undefined reference to `libiconv_close'
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:38: undefined reference to `libiconv_close'
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:36: undefined reference to `libiconv_close'
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:38: undefined reference to `libiconv_close'
/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:36: undefined reference to `libiconv_close'
libsphinx.a(tokenizer_zhcn.o):/var/nfs_root/csft-3.1/src/tokenizer_zhcn.cpp:38: more undefined references to `libiconv_close' follow
collect2: ld returned 1 exit status
make[2]: *** [indexer] Error 1
make[2]: Leaving directory `/var/nfs_root/csft-3.1/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/var/nfs_root/csft-3.1/src'
make: *** [all-recursive] Error 1
Fix:
Add '-liconv' to LIBS in src/Makefile
from
LIBS = -lm -lexpat -L/usr/local/lib
to
LIBS = -lm -lexpat -liconv -L/usr/local/lib
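
The same Makefile edit can be applied from the shell instead of by hand. This is a minimal sketch that assumes the LIBS line begins exactly as shown above; add_liconv is a hypothetical helper name, not something provided by coreseek.

```shell
#!/bin/sh
# add_liconv MAKEFILE
# Hypothetical helper: inserts -liconv after -lexpat on the LIBS line.
# Assumes the line starts with "LIBS = -lm -lexpat", as in this article.
add_liconv() {
    makefile=$1
    # Do nothing if -liconv is already present on the LIBS line.
    if grep -q '^LIBS = .*-liconv' "$makefile"; then
        return 0
    fi
    # -i.bak edits in place and keeps a backup copy as MAKEFILE.bak.
    sed -i.bak 's/^LIBS = -lm -lexpat/& -liconv/' "$makefile"
}

# Example call, run from the csft-3.1 source directory:
# add_liconv src/Makefile && make
```

Re-running ./configure regenerates src/Makefile, so the edit would have to be applied again afterwards.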