more README fixes

Author: Joshua Chin
Date:   2015-07-17 14:40:33 -04:00
parent 0a085132f4
commit 772c0cddd1


@@ -47,8 +47,7 @@ Start the build, and find something else to do for a few hours:

     ninja -v

-You can copy the results into wordfreq with this command (supposing that
-$WORDFREQ points to your wordfreq repo):
+You can copy the results into wordfreq with this command:

     cp data/dist/*.msgpack.gz ../wordfreq/data/
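The copy step in this hunk can be sketched as a short runnable sequence. The `../wordfreq` sibling-checkout location comes from the `cp` command itself; the `mkdir -p` guard is an added assumption, not part of the commit:

```shell
# Sketch: copy the built wordlists into a wordfreq checkout.
# Assumes this runs from the builder repo root, with wordfreq
# checked out as a sibling directory (as the cp path implies).
mkdir -p ../wordfreq/data
cp data/dist/*.msgpack.gz ../wordfreq/data/
```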
@@ -90,9 +89,11 @@ Wikipedia is a "free-access, free-content Internet encyclopedia".

 These files can be downloaded from [wikimedia dump][wikipedia]

 The original files are in `data/raw-input/wikipedia`, and they're processed
-by the `wiki2text` rule in `rules.ninja`.
+by the `wiki2text` rule in `rules.ninja`. Parsing wikipedia requires the
+[wiki2text][] package.

 [wikipedia]: https://dumps.wikimedia.org/backup-index.html
+[wiki2text]: https://github.com/rspeer/wiki2text

 ### Leeds Internet Corpus
@@ -113,7 +114,7 @@ by the `convert_leeds` rule in `rules.ninja`.

 The file `data/raw-input/twitter/all-2014.txt` contains about 72 million tweets
 collected by the `ftfy.streamtester` package in 2014.

-It's not possible to distribute the text of tweets. However, this process could
+We are not allowed to distribute the text of tweets. However, this process could
 be reproduced by running `ftfy.streamtester`, part of the [ftfy][] package, for
 a couple of weeks.