Mirror Wikipedia on your own computer

Work in progress - Last modification: Nov 28, 2019

With the end of the world coming up, it'll be handy to have a local mirror of Wikipedia. The whole database is a bit big to manage, but keeping only the current version of each page makes it manageable. For example, in October 2019, the English Wikipedia pages (text only, no media) are dumped as 70GB of XML, for about 6 million articles. The pages-articles dump, which contains all articles with no history or talk pages, in fact holds 19 million pages to import (including templates, redirects, media descriptions...).
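To see where the gap between 6 million articles and 19 million pages comes from, the dump can be streamed and its pages tallied by namespace. The following is only a minimal sketch: it assumes the usual MediaWiki export layout, where each `<page>` has an `<ns>` child and redirects carry a `<redirect>` child, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_pages_by_namespace(stream):
    """Count <page> elements, keyed by (namespace number, is_redirect).

    The stream is parsed incrementally, so the whole dump is never
    held in memory at once.
    """
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            ns = None
            is_redirect = False
            for child in elem:
                name = local_name(child.tag)
                if name == "ns":
                    ns = int(child.text)
                elif name == "redirect":
                    is_redirect = True
            counts[(ns, is_redirect)] += 1
            root.clear()  # drop pages already counted to keep memory flat
    return counts


# Possible usage on a compressed dump (the file name is an assumption):
#   import bz2
#   with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as dump:
#       for key, n in sorted(count_pages_by_namespace(dump).items()):
#           print(key, n)
```

Namespace 0 is the main article namespace, but it also holds redirects, which is why the sketch counts them separately.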

Some software can read these XML dumps and present them in a tailored browser; see the offline Wikipedia readers section. These readers are certainly easier to install, and some of them also come with the page media, but that's not as fun as having a real editable wiki. It's also not easy to find software that runs on ARM processors, which matters because it would be nice to have this running on the low-power Raspberry Pi 4 computer. It seems Kiwix can set up a Wi-Fi hotspot that serves a static version of Wikipedia: see the doc.

Mirroring Wikipedia is not really easy: there's not much recent documentation on this, and getting a website that looks like Wikipedia requires running the same version of MediaWiki and all of its extensions (more than 100). The size of the data makes the import hard to complete, and getting the media (images and videos in pages) is also complicated. Here's a recent update on what works and what doesn't.

  1. Download the XML dumps here: https://dumps.wikimedia.org/backup-index.html.
  2. Install MediaWiki from git: https://www.mediawiki.org/wiki/Download_from_Git#Fetch_external_libraries.
  3. Import the XML dumps into your database. The documentation on this (https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) is quite old, and the only method that seems to work in 2019 is the maintenance/importDump.php script, even though it is the one not recommended for importing this much data.
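
The import in step 3 could be driven from a small script along these lines. This is a rough sketch, not a tested procedure: it assumes `php` is on the PATH, that MediaWiki is already installed and configured, and that importDump.php reads the dump from standard input when no file is given. The function names and paths are placeholders invented for the example.

```python
import bz2
import shutil
import subprocess
from pathlib import Path


def import_command(mediawiki_dir):
    """Build the command line that runs MediaWiki's importDump.php script."""
    script = Path(mediawiki_dir) / "maintenance" / "importDump.php"
    return ["php", str(script)]


def import_dump(dump_path, mediawiki_dir):
    """Decompress a .bz2 dump on the fly and pipe it into importDump.php.

    Returns the exit code of the PHP process.
    """
    proc = subprocess.Popen(import_command(mediawiki_dir), stdin=subprocess.PIPE)
    try:
        with bz2.open(dump_path, "rb") as dump:
            # Copy in chunks so the 70GB of XML never sits in memory.
            shutil.copyfileobj(dump, proc.stdin)
    finally:
        proc.stdin.close()
    return proc.wait()


# Possible usage (both paths are assumptions):
#   import_dump("enwiki-latest-pages-articles.xml.bz2", "/var/www/mediawiki")
```

Decompressing on the fly avoids storing the uncompressed XML on disk, which helps on a small machine like the Raspberry Pi.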
