No future - Last modification: Nov 30, 2020
With the end of the world coming up, it'll be handy to have a local mirror of Wikipedia. The whole database is a bit big to manage, but keeping only the current version of the pages from a snapshot makes it manageable. For example, in October 2019, the English Wikipedia pages (text only, no media) fit in about 70GB of XML for roughly 6 million articles. Using the pages-articles dump, which contains all articles without history or talk pages, there are in fact more than 19.6 million pages to import (including templates, redirects, media descriptions...). After 7 months of importing on a Raspberry Pi 4, the database weighs 290GB on disk, without caching, and the pages are at least 7 months old with no way to update them automatically.
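To get a feel for what is in the dump before committing to a months-long import, the XML can be stream-parsed without loading the 70GB file into memory. Here is a minimal sketch that counts pages and redirects; the export namespace URI is an assumption (it changes with the dump's schema version), and the file path in the usage comment is hypothetical.

```python
# Sketch: stream-parse a MediaWiki pages-articles dump and count pages.
# Assumption: export schema 0.10 -- check the <mediawiki> root element of
# your dump and adjust the namespace URI accordingly.
import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def count_pages(xml_stream):
    """Count total pages and redirects in a dump stream."""
    pages = redirects = 0
    for _event, elem in ET.iterparse(xml_stream):
        if elem.tag == NS + "page":
            pages += 1
            if elem.find(NS + "redirect") is not None:
                redirects += 1
            elem.clear()  # free the subtree as we go, keeping memory flat
    return pages, redirects

# Usage (hypothetical filename):
# with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:
#     print(count_pages(f))
```

Running this over the full dump takes hours, not months, which is one way to see that the bottleneck of a real import is the database writes, not the XML parsing.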
Some software can use these XML dumps and present them in a tailored browser; see the offline Wikipedia readers section. They are certainly easier to install, and some of them even include the page media, but it's not as fun as having a real editable wiki. It's also not easy to find software that works on ARM processors, which matters because it would be nice to have this running on the low-power Raspberry Pi 4. It seems Kiwix can create a wifi hotspot that serves a static version of Wikipedia: see the doc.
It's not really easy to mirror Wikipedia: there's not much recent documentation on the subject, and making a site that looks like Wikipedia requires running the same version of MediaWiki and all its extensions (more than 100). The sheer size of the data makes the import hard to complete, and getting the media (images and films in pages) is complicated as well. Here's a recent update on what works and what doesn't.
I should add that my cheap SSD died a few months after the import completed, so I never managed to put the mirror online and lost the 7 months of importing, since I couldn't copy the 290GB anywhere else. Also, a snapshot of Wikipedia that is 7 months old and aging is not as fun as the idea sounded at the beginning, and there is no incremental update system; even if there were, it would likely be slower than Wikipedia's rate of change. If the much faster import methods of the past were still available, a bimonthly import could be done, but not with this slow XML importer.
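The back-of-envelope arithmetic makes the problem concrete: 19.6 million pages imported over roughly 7 months works out to about one page per second, while the English Wikipedia receives edits faster than that at peak. The figures below are the ones from this article; the month length is an approximation.

```python
# Rough import rate from the numbers above (assuming ~30-day months).
pages = 19_600_000
seconds = 7 * 30 * 24 * 3600  # ~7 months of wall-clock time
rate = pages / seconds
print(f"{rate:.2f} pages/s")  # on the order of 1 page per second
```

At that rate, an importer would need to be an order of magnitude faster just to make a bimonthly refresh cycle practical.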