For a while now I’ve been wanting to consolidate all of my blog entries into a blog system that runs in my own web space. That way, I can customize the blog as much as I want, I can be confident that I’ll never have to pay extra for it (beyond what I already pay for web hosting), and I can give it a nice URL like robinstewart.com/blog/.
The only problem was that the process of transferring my old blog entries into the new blog system turned out to be waaaay more time-consuming than I had hoped. That is the moral of this story. But if you’re interested in all the nerdy details, read on.
I started writing occasional blog entries back in 2006. I originally used a blog provided through Williams College. Then at some point I switched to a blog hosted by Blogger (but I left the old entries in the old blog). Now that it has become easy to install WordPress on my own website, I did so. But to really make the transition, I had to move all of the blog entries from the two old systems into the new system. How hard could that be?
Well, the Williams blog was hosted in an extremely old, “multi-user” WordPress installation. I made an attempt to upgrade that system to the latest version so that it would gain the “export” feature. But after reading a lot of documentation, I decided that the process would be long, tedious, and fraught with peril (both because it is a “multi-user” version and because it would have to be upgraded through a series of new releases, one by one). Instead, I ended up writing a PHP script that pulled my blog post information (title, text, date, author, etc.) from the underlying MySQL database and exported it in the XML format that newer versions of WordPress can import. After a few iterations of this script, I was able to successfully import the resulting file into my new WordPress system. (Thankfully, the date/time format has not changed since the old WordPress version.)
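For the curious, here is a minimal sketch of what that kind of export can look like. It is written in Python rather than PHP and is not the original script. It assumes the rows have already been fetched from the database as dictionaries keyed by the usual WordPress column names (post_title, post_content, post_date), and it assumes the namespace URIs of the WXR 1.2 export format. The function name build_wxr and the default author "admin" are made up for illustration, and a real export file carries more fields than this.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the WXR 1.2 export format; older or newer
# WordPress versions may expect different ones.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], tag)


def build_wxr(rows, author="admin"):
    """Build a bare-bones WXR-style document from post rows.

    rows is a list of dicts with post_title, post_content and post_date
    keys (hypothetical input; the database query itself is not shown).
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, q("wp", "wxr_version")).text = "1.2"
    for row in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["post_title"]
        ET.SubElement(item, q("dc", "creator")).text = author
        # ElementTree escapes the HTML instead of wrapping it in CDATA;
        # the two forms are equivalent to an XML parser.
        ET.SubElement(item, q("content", "encoded")).text = row["post_content"]
        # Dates pass through unchanged, in "YYYY-MM-DD HH:MM:SS" form.
        ET.SubElement(item, q("wp", "post_date")).text = row["post_date"]
        ET.SubElement(item, q("wp", "post_type")).text = "post"
        ET.SubElement(item, q("wp", "status")).text = "publish"
    return ET.tostring(rss, encoding="unicode")
```

Calling build_wxr with a list of rows returns the XML as a string, which can then be written to a file and handed to the WordPress importer. Whether the importer accepts such a stripped-down file depends on the WordPress version, so expect a few iterations here too.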
The next step was to import my newer blog entries from Blogger. The new WordPress has an option to do this import directly. You provide your login information and it goes and automatically fetches all of the blog entries. That import went smoothly, but the HTML underlying the blog entries I had created in Blogger was full of extra <div>s so that the entries didn’t render properly alongside normal, clean WordPress-generated blog entries. I considered doing some CSS hacking to make the Blogger entries look ok, but after some experimenting to no avail I decided it would be a lot better to have clean HTML anyway.
To achieve that, I ended up exporting from my new WordPress system all of the blog entries that I had by now imported. I opened the resulting XML export file in a text editor, and performed some judicious find-and-replace-all operations to get rid of those extraneous <div> tags (while keeping the important ones). Then, I deleted all of the entries from WordPress and re-imported my edited XML file. Unfortunately, this re-import didn’t quite work (the importer web page just hung indefinitely). But I was eventually able to work around the problem by splitting up the XML file into about five different files, and importing them separately.
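Both of those steps (dropping the extra <div> tags and splitting the export into smaller files) can also be scripted instead of done by hand in a text editor. The following is only a rough Python sketch of the idea, not what was actually done. The names DivStripper, strip_divs and split_wxr are invented for illustration, the rule "keep a <div> only if it has one of the listed classes" is an assumed stand-in for deciding which tags are important, and the namespace URIs are assumed from the WXR 1.2 format.

```python
import copy
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Register the usual WXR prefixes so they survive re-serialization
# (URIs assumed from WXR 1.2; adjust to match the actual export file).
for _prefix, _uri in {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "excerpt": "http://wordpress.org/export/1.2/excerpt/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
    "wp": "http://wordpress.org/export/1.2/",
}.items():
    ET.register_namespace(_prefix, _uri)


class DivStripper(HTMLParser):
    """Re-emit HTML, dropping <div> tags unless they carry a kept class."""

    def __init__(self, keep_classes=()):
        super().__init__(convert_charrefs=False)
        self.keep_classes = set(keep_classes)
        self.out = []
        self.stack = []  # one bool per open <div>: True if it was kept

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = set((dict(attrs).get("class") or "").split())
            keep = bool(classes & self.keep_classes)
            self.stack.append(keep)
            if not keep:
                return  # drop the opening tag, keep its contents
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br />

    def handle_endtag(self, tag):
        if tag == "div" and self.stack and not self.stack.pop():
            return  # closing tag of a dropped <div>
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def strip_divs(html, keep_classes=()):
    """Return html with unwanted <div> wrappers removed."""
    parser = DivStripper(keep_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def split_wxr(xml_text, parts=5):
    """Split one export into at most `parts` documents.

    Every piece keeps the channel-level metadata and gets its own
    share of the <item> elements.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = channel.findall("item")
    for item in items:
        channel.remove(item)
    size = max(1, -(-len(items) // parts))  # ceiling division
    chunks = []
    for start in range(0, len(items), size):
        piece = copy.deepcopy(root)
        piece.find("channel").extend(items[start:start + size])
        chunks.append(ET.tostring(piece, encoding="unicode"))
    return chunks
```

In use, strip_divs would be applied to the content of each post, and each string returned by split_wxr would be saved as its own file and imported separately. Splitting by <item> rather than by line count keeps every file well-formed, which is the main thing to watch for when cutting up an XML file.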
Finally, all of my old blog posts were in my new system, and in a way that looked fine without CSS hacks. Whew! It may have been faster to retype them all by hand.