
Tags: pc+bible

Grabbing Bible text from the net

by Christoph

It's amazing how many good Christian resources you can find on the web these days. One site I've been looking into for quite a while is bibleserver.com, a joint project of several Bible societies that lets you browse a number of modern translations online. Now, I already have most of these translations in BibleWorks, where they are a lot more comfortable to use, and the most important ones I even have on my Palm, which I always carry in my pocket. But some I don't have, and I didn't even know they existed in digital form anywhere. Short of considerable fiddling with tools like cURL, it is practically impossible to download any of those texts from bibleserver.com, since the site uses cookies and sessions and redirects the user to a result page after each request. So with bibleserver.com, you're stuck with reading the texts online or typing them into your Palm by hand. Unless, that is, you watch closely and find a shortcut around all the redirects and cookies. Here is the procedure bibleserver.com uses to retrieve a particular text:
  1. The browser sends a request to http://www.bibleserver.com/act.php with the following parameters:
    • textref=bcccvvv, where bcccvvv is a number made up of three parts: b, a one- or two-digit number identifying the book, with Genesis being 1 and Revelation being 66; ccc, a three-digit number (padded with zeros) identifying the chapter; and vvv, a three-digit number (padded with zeros) identifying the verse. So 1001001 is Genesis 1:1, 40028020 is Matthew 28:20, and so on. Note that vvv can also be 000, which takes you to the beginning of the chapter.
    • translation, an unsigned integer identifying the translation to use. To find out which number corresponds to which translation, just look at the page source on bibleserver.com.
  2. bibleserver.com then redirects the user to index.php with a session ID attached. Note that this result page does not take any query parameters directly, so you can't construct a usable link to it and grab the contents.
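The reference encoding above is easy to reproduce. Here is a minimal Python sketch; the function names are hypothetical, and the range checks simply follow the description (66 books, three-digit chapter and verse fields). It does not cover the translation parameter, since those numbers have to be looked up on the site itself.

```python
def make_textref(book: int, chapter: int, verse: int = 0) -> str:
    """Build a bcccvvv reference: book unpadded, chapter and verse padded to 3 digits.

    verse=0 stands for "beginning of the chapter", as described above.
    """
    if not 1 <= book <= 66:
        raise ValueError(f"book must be 1..66, got {book}")
    if not 1 <= chapter <= 999 or not 0 <= verse <= 999:
        raise ValueError("chapter and verse must fit in three digits")
    return f"{book}{chapter:03d}{verse:03d}"


def parse_textref(ref: str) -> tuple[int, int, int]:
    """Split a bcccvvv reference back into (book, chapter, verse)."""
    # The last six digits are always chapter + verse; whatever precedes
    # them (one or two digits) is the book number.
    book, chapter, verse = ref[:-6], ref[-6:-3], ref[-3:]
    return int(book), int(chapter), int(verse)


print(make_textref(1, 1, 1))         # Genesis 1:1   → 1001001
print(make_textref(40, 28, 20))      # Matthew 28:20 → 40028020
print(parse_textref("40028020"))     # → (40, 28, 20)
```

Keeping the reference as a string rather than an integer avoids any surprises with the variable-width book prefix when the value is later dropped into a URL.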
Now, knowing all of this doesn't help you a whole lot until you look at some of the member Bible societies' sites. You'll quickly find that any Bible text displayed on them is pulled directly from the bibleserver.com database. Looking at the International Bible Society's site, for instance, I found that there's a nifty XML API for bibleserver.com at http://www.bibleserver.com/xml.php, but unfortunately it is restricted to a list of known IP addresses and therefore not available to the general public. Next, I looked at the German Bible Society's site, which offers the 1984 Luther translation as well as the German "Gute Nachricht" translation online. What was really interesting, however, was the format of the URLs on these pages. For example, the link to the Luther translation of Matthew 28:20 is http://www.dbg.de/channel.php?channel=35&SELECT=lut&INPUTREF=40028020. Take a look at the parameters:
  • channel=35 appears on nearly every page of the site and is therefore not particularly interesting.
  • INPUTREF=40028020 looks familiar, doesn't it? It is exactly the same reference number that bibleserver.com uses.
  • SELECT=lut seems logical too, since "lut" is quite clearly an abbreviation for the Luther translation.
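Putting these observations together, a URL for any translation and reference can be built mechanically. The following is a minimal Python sketch, assuming that channel=35 works for every reference and that the abbreviation you pass in (lut, niv, ...) is one the site recognises; the helper name dbg_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URL copied from the example link above.
DBG_BASE = "http://www.dbg.de/channel.php"


def dbg_url(translation: str, textref: str, channel: int = 35) -> str:
    """Build a link in the format observed on the German Bible Society site.

    ASSUMPTION: channel=35 is accepted for every reference, and `translation`
    is a short abbreviation such as "lut" that the site understands.
    """
    # urlencode keeps the insertion order of the dict, so the parameters
    # come out in the same order as in the original link.
    query = urlencode({"channel": channel, "SELECT": translation, "INPUTREF": textref})
    return f"{DBG_BASE}?{query}"


print(dbg_url("lut", "40028020"))
# → http://www.dbg.de/channel.php?channel=35&SELECT=lut&INPUTREF=40028020
```

Using urlencode rather than string concatenation means any unusual characters in the translation abbreviation are escaped properly.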
Now I got really curious. A quick look at bibleserver.com's translation overview showed me that, among others, they have the NIV online. So I replaced SELECT=lut with SELECT=niv and, voilà, the NIV text came up. Nice and clean: no redirects, no cookies, nothing. The rest is up to you. I wrote a quick and dirty PHP script to do my pulling, but I think you see how it works. You know how to construct the URLs, and once you find out that each individual verse is neatly wrapped in a separate element (there are no other tags on the page), there's not a whole lot left to do, except read the Bible you just pulled.
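For the extraction step, a sketch using Python's standard html.parser might look like the following. Since the exact tag that wraps each verse isn't stated here, the tag name is a parameter you have to fill in yourself (the span in the example is purely illustrative). The Latin-1 fallback in fetch_page is also an assumption, and both helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class VerseCollector(HTMLParser):
    """Collect the text inside every element with the given tag name.

    ASSUMPTION: verse elements are not nested inside one another.
    """

    def __init__(self, verse_tag: str):
        super().__init__()
        self.verse_tag = verse_tag.lower()  # HTMLParser reports tags in lowercase
        self.verses: list[str] = []
        self._inside = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.verse_tag:
            self._inside = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self.verse_tag and self._inside:
            self._inside = False
            # Collapse runs of whitespace and line breaks into single spaces.
            text = " ".join("".join(self._buf).split())
            if text:
                self.verses.append(text)

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)


def extract_verses(html: str, verse_tag: str) -> list[str]:
    """Return the text of each verse element, in document order."""
    parser = VerseCollector(verse_tag)
    parser.feed(html)
    parser.close()
    return parser.verses


def fetch_page(url: str, fallback_encoding: str = "iso-8859-1") -> str:
    """Download a page, decoding it with the charset the server declares."""
    with urlopen(url, timeout=30) as resp:
        # ASSUMPTION: fall back to Latin-1 if the server declares no charset.
        charset = resp.headers.get_content_charset() or fallback_encoding
        return resp.read().decode(charset, errors="replace")


sample = "<html><body><span>Verse one.</span>\n<span>Verse  two.</span></body></html>"
print(extract_verses(sample, "span"))  # → ['Verse one.', 'Verse two.']
```

To pull real text you would call fetch_page on a URL built as described above and pass the result to extract_verses with the correct tag name. If you loop over many references, it would be polite to pause between requests.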