Archive for the 'General' Category

How to extract a list of pages containing a string from a MediaWiki XML dump

Here comes one of those “I’ve got to write that down somewhere, and maybe it will be useful for someone else, too” posts:

I needed a list of the titles of all pages in a MediaWiki XML dump whose text contains a certain string (“needle”). This is how I got it, using XMLStarlet:

# For every page whose text contains 'needle', print its title (-v) followed by a newline (-n)
xml sel -N mw=http://www.mediawiki.org/xml/export-0.3/ \
 -t \
  -m "/mw:mediawiki/mw:page/mw:revision/mw:text[contains(string(.), 'needle')]" \
  -v "../../mw:title" \
  -n wikiexport.xml \
| xml unesc
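
One thing that may bite you: the namespace URI passed to -N has to match the one declared on the dump's root element, and other MediaWiki versions use other export schema versions than 0.3, in which case the query quietly returns nothing. Here is a rough sketch of how the namespace could be read from the file instead of being hard-coded. The function names dump_namespace and pages_containing and the XML variable are my own invention for this example, and grep's -o and -m options are GNU/BSD extensions rather than POSIX. It also assumes the needle contains no single quote, since it is pasted straight into the XPath expression.

```shell
# Print the default namespace URI declared on the dump's root element,
# e.g. http://www.mediawiki.org/xml/export-0.3/
dump_namespace() {
  # -m 1 stops at the first line containing xmlns="...", -o prints only the match
  grep -o -m 1 'xmlns="[^"]*"' "$1" | head -n 1 | cut -d '"' -f 2
}

# List the titles of all pages whose text contains the needle.
# Set XML=xmlstarlet if that is what the binary is called on your system.
pages_containing() {
  needle=$1
  dump=$2
  "${XML:-xml}" sel -N "mw=$(dump_namespace "$dump")" \
    -t \
    -m "/mw:mediawiki/mw:page/mw:revision/mw:text[contains(string(.), '$needle')]" \
    -v "../../mw:title" \
    -n "$dump" \
  | "${XML:-xml}" unesc
}
```

With these two functions defined, `pages_containing needle wikiexport.xml` should give the same list as the command above, whatever export schema version the dump declares.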

Internet Manifesto

“How journalism works today. 17 declarations.”

Well written and to the point. This will certainly not be the last time I link to it.

Total Solar Eclipse on 2009-07-22, Shanghai area

There will be a total solar eclipse on 2009-07-22, and a good place to observe it is my current place of residence, Shanghai.

Here is a map from NASA’s website, to which I added the local times for Shanghai residents:

[Image: Total_Solar_Eclipse_July_22_Shanghai]

Hopefully it won’t be cloudy! :)