On the other hand, spaCy and slovnet can run reasonably fast even without a GPU.
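As a quick illustration of CPU-only sentence splitting in spaCy, here is a minimal sketch using a blank multi-language pipeline with the rule-based `sentencizer` component (no trained model download needed; the sample sentence is made up for the example):

```python
import spacy

# Blank multi-language pipeline: no statistical model, runs on CPU only
nlp = spacy.blank("xx")

# Rule-based sentence boundary detection (splits on ., !, ? by default)
nlp.add_pipe("sentencizer")

doc = nlp("Prvi stavek. Drugi stavek.")
print([sent.text for sent in doc.sents])
```

For higher-quality boundaries you would load a trained pipeline instead, but the sentencizer alone is often fast enough for bulk preprocessing.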
# Train a custom Punkt sentence tokenizer (one example: Slovene)
import pickle
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Make a new tokenizer
tokenizer = PunktSentenceTokenizer()

# Read in the training corpus (plain text, ISO-8859-2 encoded)
with open("slovene.plain", encoding="iso-8859-2") as f:
    text = f.read()

# Train the tokenizer on the corpus
tokenizer.train(text)

# Dump the pickled tokenizer for later reuse
with open("slovene.pickle", "wb") as out:
    pickle.dump(tokenizer, out)
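The pickled tokenizer can later be loaded back and used directly. A minimal sketch, using an in-memory pickle round-trip in place of the `slovene.pickle` file and an untrained tokenizer (which falls back on Punkt's default parameters), with a made-up Slovene sample:

```python
import pickle
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Stand-in for pickling to / unpickling from slovene.pickle on disk
blob = pickle.dumps(PunktSentenceTokenizer())
tokenizer = pickle.loads(blob)

# Split text into sentences with the restored tokenizer
print(tokenizer.tokenize("Dober dan. Kako ste?"))
```

In real use you would `pickle.load(open("slovene.pickle", "rb"))` so the abbreviation and collocation statistics learned from the corpus carry over.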