Here comes one of those “I’ve got to write that down somewhere, and maybe it will be useful for someone else, too” posts:
I needed a list of the titles of all pages in a MediaWiki XML dump whose text contained a certain string (“needle”). This is how I got it, using XMLStarlet:
xml sel -N mw=http://www.mediawiki.org/xml/export-0.3/ \
    -t \
    -m "/mw:mediawiki/mw:page/mw:revision/mw:text[contains(string(.), 'needle')]" \
    -n \
    -v "../../mw:title" wikiexport.xml \
    | xml unesc
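In case XMLStarlet is not at hand, here is a rough sketch of the same idea using only Python's standard library. It is not what I actually ran. The function name titles_containing and the constant MW_NS are made up for this example, and the namespace URI is simply copied from the command above, so a dump made with a different export version would need a different value. One difference: the XMLStarlet command prints a title once per matching revision, while this sketch reports each page at most once.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the XMLStarlet command; assumed to match the dump.
MW_NS = "http://www.mediawiki.org/xml/export-0.3/"


def titles_containing(source, needle, ns=MW_NS):
    """Yield the title of every page with a revision text containing needle.

    source may be a file name or a binary file object.
    """
    page_tag = f"{{{ns}}}page"
    text_path = f"{{{ns}}}revision/{{{ns}}}text"
    title_path = f"{{{ns}}}title"
    # iterparse streams the file, so large dumps need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != page_tag:
            continue
        texts = elem.iterfind(text_path)
        if any(needle in "".join(t.itertext()) for t in texts):
            yield elem.findtext(title_path)
        elem.clear()  # drop the finished page's children to save memory


if __name__ == "__main__":
    # Tiny hypothetical dump, only to show the call.
    sample = f"""<mediawiki xmlns="{MW_NS}">
  <page><title>Haystack A</title><revision><text>a needle here</text></revision></page>
  <page><title>Haystack B</title><revision><text>nothing</text></revision></page>
</mediawiki>"""
    for title in titles_containing(io.BytesIO(sample.encode("utf-8")), "needle"):
        print(title)
```

For a real dump, passing the file name should work the same way, e.g. titles_containing("wikiexport.xml", "needle"). ElementTree already decodes XML entities, so there is no separate unescape step.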