correct the simple_tokenize docstring

Rob Speer 2015-08-26 13:54:50 -04:00
parent 01b6403ef4
commit f7babea352


@@ -55,9 +55,8 @@ def simple_tokenize(text):
     ideograms and hiragana) relatively untokenized, instead of splitting each
     character into its own token.
 
-    - It excludes punctuation, many classes of symbols, and "extenders" with
-      nothing to extend, from being tokens, but it allows miscellaneous symbols
-      such as emoji.
+    - It outputs only the tokens that start with a word-like character, or
+      miscellaneous symbols such as emoji.
 
     - It breaks on all spaces, even the "non-breaking" ones.
     """