more README fixes

parent 0a085132f4
commit 772c0cddd1
@@ -47,8 +47,7 @@ Start the build, and find something else to do for a few hours:
 
     ninja -v
 
-You can copy the results into wordfreq with this command (supposing that
-$WORDFREQ points to your wordfreq repo):
+You can copy the results into wordfreq with this command:
 
     cp data/dist/*.msgpack.gz ../wordfreq/data/
 
@@ -90,9 +89,11 @@ Wikipedia is a "free-access, free-content Internet encyclopedia".
 These files can be downloaded from [wikimedia dump][wikipedia]
 
 The original files are in `data/raw-input/wikipedia`, and they're processed
-by the `wiki2text` rule in `rules.ninja`.
+by the `wiki2text` rule in `rules.ninja`. Parsing wikipedia requires the
+[wiki2text][] package.
 
 [wikipedia]: https://dumps.wikimedia.org/backup-index.html
+[wiki2text]: https://github.com/rspeer/wiki2text
 
 ### Leeds Internet Corpus
 
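The `wiki2text` rule itself isn't shown in this diff. As a minimal sketch only, assuming the dump is bzip2-compressed XML and that `wiki2text` reads it on standard input, the entry in `rules.ninja` could look roughly like this (the command and paths are illustrative, not taken from the repo):

    # Hypothetical sketch of a rules.ninja entry; the real rule may differ.
    rule wiki2text
      command = bunzip2 -c $in | wiki2text > $out

    # Illustrative build edge using the rule (paths are made up):
    build data/wikipedia/wikipedia_en.txt: wiki2text data/raw-input/wikipedia/enwiki.xml.bz2
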
@@ -113,7 +114,7 @@ by the `convert_leeds` rule in `rules.ninja`.
 The file `data/raw-input/twitter/all-2014.txt` contains about 72 million tweets
 collected by the `ftfy.streamtester` package in 2014.
 
-It's not possible to distribute the text of tweets. However, this process could
+We are not allowed to distribute the text of tweets. However, this process could
 be reproduced by running `ftfy.streamtester`, part of the [ftfy][] package, for
 a couple of weeks.
 
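For reference, reproducing the Twitter sample amounts to leaving the stream tester running for a couple of weeks and keeping what it collects. The exact entry point, output handling, and credential setup depend on the ftfy version, so the command below is an assumption rather than something this README specifies:

    # Hypothetical invocation; check your ftfy version for the actual entry
    # point and Twitter API credential setup. Leave it running for weeks.
    python -m ftfy.streamtester.twitter
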