The naive way:
sents = [s for s in text.split('.') if s.strip()]  # drop empty pieces, e.g. after the final '.'
avg_len = sum(len(s.split()) for s in sents) / len(sents)  # mean words per sentence
The serious way: use nltk to tokenize the text according to the target language's rules, instead of treating every '.' (as in "Dr." or "3.5") as the end of a sentence.
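A rough sketch of what that could look like, assuming nltk is installed and its Punkt models have been downloaded. The function names `avg_sentence_length` and `nltk_avg_sentence_length` are made up for illustration; `sent_tokenize` and `word_tokenize` are the nltk calls, and both accept a `language` argument. Note that `word_tokenize` counts punctuation marks as tokens, so the numbers will come out a bit higher than with a plain whitespace split.

```python
from functools import partial


def avg_sentence_length(text, split_sentences, split_words):
    """Mean number of tokens per sentence; the two tokenizers are passed in."""
    # Ignore empty or whitespace-only pieces so they do not skew the average.
    sentences = [s for s in split_sentences(text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(split_words(s)) for s in sentences) / len(sentences)


def nltk_avg_sentence_length(text, language="english"):
    """Hypothetical wrapper that plugs nltk's tokenizers into the helper above."""
    # Imported here so the generic helper still works without nltk installed.
    # Needs the Punkt data: nltk.download("punkt"), or "punkt_tab" on recent releases.
    from nltk.tokenize import sent_tokenize, word_tokenize

    return avg_sentence_length(
        text,
        partial(sent_tokenize, language=language),
        partial(word_tokenize, language=language),
    )
```

Usage would be something like `nltk_avg_sentence_length(text, language="german")`. Keeping the tokenizers as parameters also makes it easy to swap in another library if nltk does not cover your language.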