Program or be Programmed

After reading Jürgen’s blog post “The computer as an appliance” on Friday, I followed his link to Douglas Rushkoff’s latest book, “Program or be Programmed”. The great title alone sparked my interest, and the description on the publisher’s page got me to order an e-book copy immediately. A few seconds later I started reading it. (It comes in multiple formats; I chose the EPUB version and read it with Aldiko on my Android phone, which is a great way to read a book!)

So, I just finished it, and I really liked it. It got me thinking a lot and examining my own life in the Digital World. It also gave me a new perspective on anonymity on the net.

Douglas, thanks for your great work! 🙂

How to extract a list of pages containing a string from a MediaWiki XML dump

Here comes one of those “I’ve got to write that down somewhere, and maybe it will be useful for someone else, too” posts:

I needed a list of the names of all pages in a MediaWiki XML dump that contain a certain string (“needle”). This is how I got it, using XMLStarlet:

xml sel -N mw=http://www.mediawiki.org/xml/export-0.3/ \
 -t \
  -m "/mw:mediawiki/mw:page/mw:revision/mw:text[contains(string(.), 'needle')]" \
  -n \
  -v "../../mw:title" wikiexport.xml \
| xml unesc
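
If XMLStarlet is not at hand, roughly the same result can be had with Python’s standard library. The following is only a minimal sketch, not a drop-in replacement: the function name `pages_containing` is made up for this example, and it assumes the dump uses the same export-0.3 namespace as the command above (other dumps may declare a different version, in which case `MW_NS` needs adjusting). Unlike the XPath expression, it reports each page title once, even if several revisions contain the string.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the XMLStarlet command above; this is an assumption
# about the dump and must match the xmlns declared in its root element.
MW_NS = "http://www.mediawiki.org/xml/export-0.3/"


def pages_containing(source, needle, ns=MW_NS):
    """Yield the title of every page with a revision text containing needle.

    source may be a file name or a binary file object.
    (Hypothetical helper, named for this example only.)
    """
    page_tag = "{%s}page" % ns
    title_path = "{%s}title" % ns
    text_path = "{%s}revision/{%s}text" % (ns, ns)

    # iterparse streams the file; an "end" event fires once an element
    # and all of its children have been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != page_tag:
            continue
        texts = elem.findall(text_path)
        # Empty <text/> elements have text None, hence the "or" fallback.
        if any(needle in (t.text or "") for t in texts):
            yield elem.findtext(title_path)
        # Drop the page's children so large dumps do not pile up in memory.
        elem.clear()
```

Calling `list(pages_containing("wikiexport.xml", "needle"))` would then give the matching titles as ordinary strings, with XML entities already decoded by the parser, so no separate unescaping step is needed.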

Creating multi-page PDF files with GIMP and `convert`

Occasionally I have to sign a document (old style, with a pen) and send it back electronically. Sometimes those are multi-page documents. Since sending back a scan as multiple image files is unusual, and multi-page image formats are uncommon as well, I prefer to send them as a single PDF file. Before I discovered this method, I used to insert the scanned images into OpenOffice Writer and create the PDF from there. This works, but it is a bit cumbersome to tell OpenOffice Writer to maximise the images (eliminating page borders, etc.), especially when there are a lot of pages. It just doesn’t feel like a real solution.

So, here we go:

Prerequisites:

  • GIMP (I’m currently at version 2.6.8, but this will probably work with older versions as well)
  • GraphicsMagick (tested with 1.3.8) or ImageMagick (tested with 6.5.8.8)

Procedure:

  1. Open the scanned pages as layers of a single image in GIMP. If they are already available as files, you can use File / Open as Layers….
  2. Make sure that the layers are ordered as follows: page 1 must be the bottom layer, and the last page must be the top layer. You can reorder them via the “Layers” dialogue (activate it via the Windows / Dockable Dialogues menu if you don’t see it).
  3. Save As… and choose “MNG animation” or just add “.mng” to the filename. (In case you are wondering, MNG is the animated counterpart to PNG).
    A dialogue window saying “MNG plug-in can only handle layers as animation frames” will come up – choose “Save as Animation” here and press the Export button. In the next dialogue you don’t need to make any changes to the defaults, just press the Save button.
  4. Now, open a console window and simply enter
    convert document.mng document.pdf

That’s it – you now have your PDF file ready for sending!

Update (2010-02-08):
As chithanh pointed out in comment 1, there is another convenient way to accomplish the same thing. It does not involve GIMP, but instead requires pdftk to concatenate PDF files. Please see comment 2 for details.

Update (2010-03-01):
And yet another way (definitely the most straightforward one if you already have the pages as single image files) was pointed out by goffrie in comment 5.
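
For pages that already exist as individual image files, the conversion can also be scripted. The sketch below relies on the fact that ImageMagick’s `convert` accepts several input images followed by one output file and writes them as consecutive PDF pages. The function names and file names are hypothetical, chosen for this example, and this is not necessarily the exact method described in the comments.

```python
import subprocess


def build_convert_command(pages, output_pdf, command=("convert",)):
    """Return the argument list for turning image files into one PDF.

    pages must be given in reading order (page 1 first).
    For GraphicsMagick, pass command=("gm", "convert") instead.
    (Hypothetical helper, named for this example only.)
    """
    pages = list(pages)
    if not pages:
        raise ValueError("at least one page image is required")
    return [*command, *pages, output_pdf]


def images_to_pdf(pages, output_pdf, command=("convert",)):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    subprocess.run(build_convert_command(pages, output_pdf, command), check=True)
```

With scans named `page1.png` and `page2.png`, `images_to_pdf(["page1.png", "page2.png"], "document.pdf")` would run the equivalent of `convert page1.png page2.png document.pdf`. Building the argument list separately keeps file names with spaces intact, since no shell is involved.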