There are Unicode normalization problems with Malayalam -- as best I understand
it, Unicode never defined an equivalence between the different encodings of
Malayalam "chillu" characters, even though it changed how they're represented in
Unicode 5.1 and again in Unicode 9, so none of the normalization forms map one
encoding onto another.
The result is that words that print the same end up with multiple entries, with
different codepoint sequences that don't normalize to each other.
I certainly don't know how to resolve this, and it would need to be resolved to
have something that we could reasonably call Malayalam word frequencies.
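
A minimal sketch of the problem, assuming Python's standard `unicodedata` module
and taking chillu-n as the example. It compares the atomic character (U+0D7B)
with the older sequence of NA + virama + zero-width joiner under each
normalization form. The constant names and the helper `same_after_normalizing`
are made up for illustration; they are not part of any library.

```python
import unicodedata

# Two encodings of chillu-n that are meant to render as the same glyph.
# (Names here are illustrative only.)
ATOMIC_CHILLU_N = "\u0d7b"                # MALAYALAM LETTER CHILLU N
SEQUENCE_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def same_after_normalizing(a, b):
    """Report, per normalization form, whether a and b become identical."""
    return {
        form: unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    }


results = same_after_normalizing(ATOMIC_CHILLU_N, SEQUENCE_CHILLU_N)
for form, equal in results.items():
    print(form, equal)
```

As far as I can tell, every form reports `False` here, which is why a word list
keyed on normalized strings still ends up with separate entries for the two
spellings.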