- title: Host your own wikipedia backup
- url: https://dataswamp.org/~solene/2019-11-13-wikimedia-dump.html
- hash_url: 626841dd876f103a10391b1b841ba5ae
-
- <h2 id="wikipediaandopenzim">Wikipedia and openzim</h2>
-
- <p>If you have ever wanted to host your own Wikipedia replica, here is the
- simplest way.</p>
-
- <p>Wikipedia is REALLY huge, so you don’t really want to run the PHP MediaWiki
- software and load the whole database. Instead, the <em>openzim</em> format was
- created to compress the huge database that Wikipedia became while still
- allowing fast searches.</p>
-
- <p>Sadly, OpenBSD has no software for reading zim files, and most such software
- requires the openzim library, which would need extra work to package for
- OpenBSD.</p>
-
- <p>Fortunately, there is a Python package, written in pure Python, that provides
- everything you need to serve zim files over HTTP, and it’s easy to install.</p>
-
- <p>This tutorial should work on all other Unix-like systems, but package or
- binary names may differ.</p>
-
- <h2 id="downloadingwikipedia">Downloading wikipedia</h2>
-
- <p>The Kiwix project is responsible for the Wikipedia files. It regularly
- creates files from various projects (including stackexchange, gutenberg,
- wikibooks etc…), but for this tutorial we want Wikipedia:
- <a href="https://wiki.kiwix.org/wiki/Content_in_all_languages">https://wiki.kiwix.org/wiki/Content_in_all_languages</a></p>
-
- <p>You will find a lot of files. The language is part of the filename. Some
- filenames also indicate whether they contain everything or only some
- categories, and whether they include pictures.</p>
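-
- <p>To make the naming scheme concrete, here is a minimal sketch that splits a
- filename into its parts. The field layout (project, language, selection,
- optional flavour, date) is only inferred from the single filename used later in
- this tutorial, and the names <code>ZimName</code> and <code>parse_zim_name</code>
- are made up for illustration, so other files may not follow it.</p>

```python
import os.path
from typing import NamedTuple, Optional


class ZimName(NamedTuple):
    """Parts of a zim filename (layout assumed, not an official spec)."""
    project: str
    language: str
    selection: str
    flavour: Optional[str]
    date: str


def parse_zim_name(path: str) -> ZimName:
    """Split e.g. wikipedia_fr_all_maxi_2019-08.zim into its fields."""
    stem = os.path.basename(path)
    if stem.endswith(".zim"):
        stem = stem[: -len(".zim")]
    parts = stem.split("_")
    if len(parts) < 4:
        raise ValueError("unexpected zim filename: " + path)
    # First two fields and the last one are fixed; the rest sits in between.
    middle = parts[2:-1]
    flavour = "_".join(middle[1:]) or None
    return ZimName(parts[0], parts[1], middle[0], flavour, parts[-1])


if __name__ == "__main__":
    print(parse_zim_name("wikipedia_fr_all_maxi_2019-08.zim"))
```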
-
- <p>The full French file is 31.4 GB.</p>
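-
- <p>With a file that large, it may be worth checking the free space before
- starting the download. This is a rough sketch using only the standard library.
- The helper name <code>has_room_for</code>, the 10% safety margin and the reading
- of “GB” as decimal gigabytes are my assumptions.</p>

```python
import shutil

# 31.4 GB, assuming decimal gigabytes (10**9 bytes)
ZIM_SIZE_BYTES = 31_400_000_000


def has_room_for(directory: str, needed_bytes: int, margin: float = 0.10) -> bool:
    """Return True if `directory` has room for needed_bytes plus a margin."""
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * (1 + margin)


if __name__ == "__main__":
    if has_room_for(".", ZIM_SIZE_BYTES):
        print("enough free space for the download")
    else:
        print("not enough free space in the current directory")
```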
-
- <h2 id="runningtheserver">Running the server</h2>
-
- <p>For the next steps, I recommend setting up a new user dedicated to this.</p>
-
- <p>On OpenBSD, we will require python3 and pip:</p>
-
- <pre><code>$ doas pkg_add py3-pip--
- </code></pre>
-
- <p>Then we can use pip to fetch and install the zimply software and its
- dependencies. The <code>--user</code> flag is rather important, as it lets any
- user download and install Python libraries into their home folder instead of
- polluting the whole system as root.</p>
-
- <pre><code>$ pip3.7 install --user --upgrade zimply
- </code></pre>
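-
- <p>If you are curious where a <code>--user</code> install ends up, the standard
- <code>site</code> module can tell you. This is just a small sketch; the helper
- name <code>user_install_locations</code> is made up, and the exact paths depend
- on your system and Python version.</p>

```python
import site


def user_install_locations() -> dict:
    """Return the per-user base directory and its site-packages directory."""
    return {
        "base": site.getuserbase(),
        "site_packages": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for name, path in user_install_locations().items():
        print(name + ": " + path)
```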
-
- <p>I wrote a small script that starts the server, taking the zim file as a
- parameter. I rarely write Python, so the script may not be up to a high
- standard.</p>
-
- <p>File <strong>server.py</strong>:</p>
-
- <pre><code>from zimply import ZIMServer
- import sys
- import os.path
-
- # The path to the zim file is required as the first argument
- if len(sys.argv) == 1:
-     print("usage: " + sys.argv[0] + " file")
-     sys.exit(1)
-
- if os.path.exists(sys.argv[1]):
-     # Start serving the zim file over HTTP
-     ZIMServer(sys.argv[1])
- else:
-     print("Can't find file " + sys.argv[1])
-     sys.exit(1)
- </code></pre>
-
- <p>And then you can start the server using the command:</p>
-
- <pre><code>$ python3.7 server.py /path/to/wikipedia_fr_all_maxi_2019-08.zim
- </code></pre>
-
- <p>You will be able to access Wikipedia at the URL http://localhost:9454/</p>
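-
- <p>To check from a script that the server answers, something like the following
- sketch should do. It only uses the standard library; the function name
- <code>server_is_up</code> and the 3 second timeout are my own choices, and the
- default URL is the one given above.</p>

```python
import http.client
import urllib.request


def server_is_up(url: str = "http://localhost:9454/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on `url` succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply...
        return False


if __name__ == "__main__":
    print("server is up" if server_is_up() else "server is not reachable")
```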
-
- <p>Note that this is not a “wiki”, as you can’t see page history or edit/create pages.</p>
-
- <p>This kind of backup is used in places such as Cuba or parts of Africa where
- people don’t have unlimited internet access. The project led by Kiwix allows
- more people to access knowledge.</p>