A multi-domain Chinese word segmentation toolkit.
The pkuseg-python toolkit has the following features:
Multi-domain Chinese word segmentation. Pkuseg-python provides pre-trained models for several domains, including news, web, medicine, and tourism. Users can choose the pre-trained model that matches the domain of the text to be segmented. If the domain of the text is unknown, we recommend the default model, which is trained on mixed-domain data.
Higher segmentation accuracy. Compared with existing word segmentation toolkits, pkuseg-python can achieve higher F1 scores when evaluated on the same dataset.
Model training. Pkuseg-python also lets users train a new segmentation model on their own data.
POS tagging. We also provide POS tagging interfaces for further lexical analysis.
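The features above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive recipe: it assumes pkuseg is installed (`pip install pkuseg`) and that the domain models can be downloaded on first use. The helper names `resolve_model_name`, `segment`, and `train_custom_model` are hypothetical wrappers introduced here for illustration; only `pkuseg.pkuseg(model_name=..., postag=...)`, its `cut` method, and `pkuseg.train` belong to the toolkit.

```python
# Domain names taken from the feature list above. Any other value falls back
# to "default", the model trained on mixed-domain data.
DOMAIN_MODELS = {"news", "web", "medicine", "tourism"}


def resolve_model_name(domain=None):
    """Hypothetical helper: map a domain label to a pkuseg model name."""
    if domain in DOMAIN_MODELS:
        return domain
    return "default"


def segment(text, domain=None, postag=False):
    """Hypothetical wrapper: segment `text` with the model for `domain`.

    With postag=True the toolkit also returns a POS tag for each word.
    """
    import pkuseg  # imported lazily so the helpers above work without pkuseg

    seg = pkuseg.pkuseg(model_name=resolve_model_name(domain), postag=postag)
    return seg.cut(text)


def train_custom_model(train_file, test_file, save_dir):
    """Hypothetical wrapper: train a new model on the user's own data.

    The file paths are placeholders supplied by the caller; the trained
    model is written to `save_dir`.
    """
    import pkuseg

    pkuseg.train(train_file, test_file, save_dir)
```

For example, `segment(text, domain="medicine")` would use the medicine model, while `segment(text)` with no domain uses the mixed-domain default, as recommended above when the domain is unknown.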
Ruixuan Luo, Jingjing Xu, Xuancheng Ren, Yi Zhang, Bingzhen Wei, Xu Sun