From f39308625389fa1bcb7c8f3a96e7d8e455c6fbd7 Mon Sep 17 00:00:00 2001
From: Andrew Lin
Date: Fri, 31 Jul 2015 19:12:59 -0400
Subject: [PATCH] Remove redundant reference to wikipedia in builder README.

Former-commit-id: 53621c34dfc27a77bf28ecdd88c450585268a3fa
---
 wordfreq_builder/README.md | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/wordfreq_builder/README.md b/wordfreq_builder/README.md
index a17c504..2aedf27 100644
--- a/wordfreq_builder/README.md
+++ b/wordfreq_builder/README.md
@@ -82,19 +82,6 @@ The specific rules are described by the comments in `rules.ninja`.
 
 ## Data sources
 
-### Wikipedia
-
-Wikipedia is a "free-access, free-content Internet encyclopedia".
-
-These files can be downloaded from [wikimedia dump][wikipedia]
-
-The original files are in `data/raw-input/wikipedia`, and they're processed
-by the `wiki2text` rule in `rules.ninja`. Parsing wikipedia requires the
-[wiki2text][] package.
-
-[wikipedia]: https://dumps.wikimedia.org/backup-index.html
-[wiki2text]: https://github.com/rspeer/wiki2text
-
 ### Leeds Internet Corpus
 
 Also known as the "Web as Corpus" project, this is a University of Leeds