PhotoDude.com

The Daily Whim

All The News That Fits My Whim

Sat. Sep 13, 2003

Changes Under The Hood

This weekend, I am in the process of either [1] substantially restructuring this 7-year-old site in order to move it to a new web host, OR [2] completely hosing years’ worth of work by misusing powerful software.

So if you are experiencing any strange redirects to nowhere, missing pages, or other oddities at this site, it’s OK. I’m doing it on purpose. Or rather, it is an unavoidable consequence of the structural contortions I must perform simply to prepare this site to be “moved.” Perversely, if I am successful, you will hardly be able to notice anything has changed … visibly.

If you’re not a web geek, that’s all you need to know; things may look strange, but it’s not your computer, and all will soon return to normal here without you having to do a thing. If you are a web geek, of course, you might be curious why I’m structurally screwing with a site just before I move it.

I blame it on Movable Type.

As always, it’s the tool that gets the blame, not the operator. But I’m also not the first person to face this dilemma. In fact, you’ll find a clear tutorial here on exactly how to fix this problem should you encounter it, but here’s the gist of it.

MT gives entries sequential ID numbers, which many of us started using within our archive structure. For example, your second entry in your weblog would end up with a filename of 000002.html. MT will happily pump out new entries in sequential order.

The problem comes when you add a second weblog served by the same installation of MT. Suppose you’ve made 52 entries in your first weblog, and then you add a photoblog. Your first entry in your photoblog won’t be named 000001.html, it will be 000053.html. The new weblog becomes a part of the already ongoing number sequence. If you end up with 198 entries in your first weblog, and 42 in your second weblog, the numeric ID’s from 000053 through 000240 will be a jumbled mix of the two weblogs.

So what’s the big deal? Well, when you move to a new host and set up a new install of MT, you’ll have one large exported file for each of your two weblogs, the backups output by your install of MT at your old web host. Those files must be imported one weblog at a time, and therefore the numeric ID’s will be entirely different than they were on the other server, as they won’t be “created” in the original sequence.
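For the web geeks, here is a minimal Python sketch of why the numbers shift. It is not MT's actual code, just a simulation that assumes one counter shared by every weblog on the install; the names `assign_ids` and `filename` and the tiny sample posting order are made up for illustration.

```python
from itertools import count


def assign_ids(posting_order, start=1):
    """Give each entry the next number from one counter shared by all weblogs."""
    counter = count(start)
    return {entry: next(counter) for entry in posting_order}


def filename(entry_id):
    """Zero-padded archive filename in the style described above."""
    return f"{entry_id:06d}.html"


# Hypothetical history: 54 weblog entries and 2 photoblog entries in total.
weblog = [("weblog", n) for n in range(1, 55)]
photoblog = [("photoblog", n) for n in range(1, 3)]

# Old host: 52 weblog entries first, then posts from both weblogs interleave.
old_order = weblog[:52] + [photoblog[0], weblog[52], photoblog[1], weblog[53]]
old_ids = assign_ids(old_order)

# New host: each weblog is imported whole, one after the other.
new_ids = assign_ids(weblog + photoblog)

first_photo = ("photoblog", 1)
print(filename(old_ids[first_photo]))  # 000053.html
print(filename(new_ids[first_photo]))  # 000055.html
```

Same entry, different filename, and every link that pointed at the old one now lands somewhere else.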

This will cause every link anyone has ever given your work to break, and for those of us who “subreference” one entry we’ve written within another new one, those links will break as well. What a %$@?! mess.

I know there’s a lot of site owners out there who simply don’t care if they create such link rot, in fact, almost brag about it; “All the archives are gone, deleted, as I’m doing something new now.” To me, frankly, that’s rude! You know how bloggers are. We covet links from others. We hope our witty insights inspire others to point to us. And then when people do go to the trouble to link your words via the specific “permalink” you’ve provided, you brusquely convert them to “deadlinks.” It’s not only rude, it degrades the entire concept of “linking” on which this wonder is all built, and is counterproductive to your own web wellness.

Rant over. Suffice it to say, I hate link rot, yet was faced with a situation rife with opportunity to generate it. Finding that others have faced the same thing, and with a wealth of somewhat competing advice on just how I should best structure my archives and their URL’s (“Really, enough, guys, I get the point”), as usual, I came up with my own unique way of doing things.

The problem at many MT sites is the “default” archive structure can give you URL’s like this:
../archives/003137.shtml

As you might expect, when I set up this archive structure in February of 2002, I started out with my own hybrid that is now biting me in the ass:
../dumped/2003/September/003137.shtml
I managed to include the date, but with the month capitalized, and the poison pill of the entry’s numeric ID.

Now, a real purist would say I need to establish a structure that would generate this kind of URL:
../2003/september/13/changes_under_the_hood.shtml
...or even a “slugged” URL stripped of file type:
../2003/september/13/changes
...while others suggest the only URL scheme that will never change is one solely based on the date and time of the entry (the argument being if you ever change the entry title, it will break the URL).

I’m not a purist (well, I’m a selective purist). I don’t change entry titles. And I don’t want to generate 365 directories for every year this weblog has been around (there would be just shy of a thousand for the past 2.5 years). So I decided on this:
../2003/september/13_changes_under_the_hood.shtml
Only 12 directories generated per year, a human readable URL, and if the web world someday switches from *.shtml to *.futurefile, I’ll rely on the fact some smart soul will offer a server side solution. But he/she is likely in 6th grade right now, so I’m not going to worry about it too much today.
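If you want to see the scheme spelled out, here is a rough Python sketch of how such a path could be assembled from an entry's date and title. This is not how MT builds it (MT uses its own archive template tags); the helper names `slugify` and `archive_path` are my own, the title cleanup is only an approximation, and zero-padding single-digit days is an assumption.

```python
import re
from datetime import date

# Spelled out rather than taken from the system locale, so the output is stable.
MONTHS = (
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
)


def slugify(title):
    """Lowercase the title, drop punctuation, join the words with underscores."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "_".join(words)


def archive_path(entry_date, title, extension="shtml"):
    """Build year/month-name/day_title.ext, giving one directory per month."""
    month = MONTHS[entry_date.month - 1]
    day = f"{entry_date.day:02d}"  # assumption: pad single-digit days
    return f"{entry_date.year}/{month}/{day}_{slugify(title)}.{extension}"


print(archive_path(date(2003, 9, 13), "Changes Under The Hood"))
# 2003/september/13_changes_under_the_hood.shtml
```

Nothing in the path depends on the entry's numeric ID, which is the whole point.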

Now that we’ve solved that, how do we get from
../dumped/2003/September/003137.shtml
...to…
../2003/september/13_changes_under_the_hood.shtml
...and do it from one server to another, without busting every link we’ve ever gotten? First, we note one of the reasons we’re switching web hosts … a lack of web space. Reading through the MT forums has revealed that MT really really doesn’t like it when you run out of web space. It gets angrily constipated and/or starts spewing out crap you never requested. My plan involved duplicating nearly 2000 HTML documents in a new directory structure, and I was using nearly 290 MB of my allotted 300 MB (I’ve had a site for seven years, get off my back).

So step one was a crash diet, in which I temporarily stripped everything I could in the “low-traffic, large-file-size” category. Despite being relatively ruthless (as much as I could bring myself to be), I carved a mere 20 MB from the site. My rough estimates (1900 pages at 20-25 KB each equals roughly 38 to 48 MB) indicated the free 30 MB might not be enough.
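The back-of-the-envelope math, as a few lines of Python. The figures are the ones quoted above; the only assumption is treating 1 MB as 1000 KB, which is close enough for a rough estimate and lands in the same ballpark.

```python
KB_PER_MB = 1000  # assumption: decimal units, fine for a rough estimate

quota_mb = 300     # allotted web space
used_mb = 290      # in use before the crash diet
trimmed_mb = 20    # carved off by the crash diet
pages = 1900       # pages to duplicate in the new structure
page_kb_low, page_kb_high = 20, 25

free_mb = quota_mb - used_mb + trimmed_mb
need_low_mb = pages * page_kb_low / KB_PER_MB
need_high_mb = pages * page_kb_high / KB_PER_MB

print(f"free: {free_mb} MB")
print(f"needed: {need_low_mb:.1f} to {need_high_mb:.1f} MB")
print("fits" if need_high_mb <= free_mb else "does not fit")
```

Even the low end of the range is bigger than the free space, so a full duplicate copy of the archives was never going to fit.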

So I decided I’d have to institute site wide link rot … for ten to fifteen minutes. In a manner similar to the one shown in the tutorial, I created a template for the individual archives that contained nothing but a redirect built using the archive tags to point to the new location, and also a visible text link to the new location [example redirect for this page]. This template was 2 KB, so 2000 redirects would only add about 4 MB to the site. Unfortunately, I had to build these redirect pages first, so they’d be in the proper place in the old directory structure. Therefore, there would be a short period of time when the redirects would point to pages that had not yet been built in their new location.
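For illustration only, here is a Python sketch of what one of those redirect pages boils down to: a meta refresh tag plus a visible link. My real version is an MT template that fills in the new location with archive tags; the function name `redirect_page`, the page wording, and the root-relative URL in the example are stand-ins I made up for this sketch.

```python
from html import escape


def redirect_page(new_url, delay_seconds=0):
    """Return a tiny HTML page that forwards the browser to new_url."""
    safe_url = escape(new_url, quote=True)
    return (
        "<html>\n<head>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={safe_url}">\n'
        "<title>This page has moved</title>\n"
        "</head>\n<body>\n"
        f'<p>This page has moved to <a href="{safe_url}">{safe_url}</a>.</p>\n'
        "</body>\n</html>\n"
    )


# Hypothetical new location for this entry.
page = redirect_page("/2003/september/13_changes_under_the_hood.shtml")
print(page)
print(len(page.encode("utf-8")), "bytes")
```

The visible link is there for any browser that ignores the refresh. A page like this comes in well under the 2 KB of my actual template, so 2000 of them stay within the 4 MB budget.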

And since I was going to rebuild the entire weblog in new directories, I decided to rework all the templates before that rebuild. I had a couple of things I wanted to accomplish. When I first converted to using CSS for all presentation and structure, I was so focused on getting the layout right that I was a bit sloppy about properly separating the content and the presentation. I wanted to strip out some of the loopy and/or redundant CSS I’d created to “make things work” the first time around. And I also wanted to improve the semantic structure and utilize CSS Image Replacement (turn off style sheets, and instead of images for PhotoDude.com and PhotoDude’s Weblog, you’ll see those words as headings). There’s no longer any coddling for non-compliant browsers (yeah, I’m talkin’ to you, Netscape 4.x), but my aren’t those semantics pretty without that hard-to-understand CSS! [cough]

With fresh new templates to rebuild, I decided to also test my template code for the meta refresh tag by hiding it within a comment in the head of the newly redesigned documents. Now that I had nice clean templates and tested redirect code, I was ready to replace each of the templates (individual, weekly, and category archives) with their corresponding redirect template. After, of course, a major league backup by exporting the weblog via MT, and downloading all the current working archives via FTP (so, worst case, I could put them back in place while I picked up the pieces).

[Now we reach the point where Reid talks in the past tense about things he hasn’t yet done … call it “previsualizing success”]

After sacrificing a goat (easier to find than a virgin) at the Altar of the Server Gods, I opened up four browser windows: an edit window of one weblog entry, and the individual, weekly, and category archive pages for that entry on my site. I replaced the templates with the redirect templates, and saved them, but did not rebuild the entire site. Instead, I made a simple change in that entry I had open (I usually just change the punctuation), and saved it to rebuild just that entry. Then I refreshed those three archive pages I had open to see if the redirect worked and pointed to the proper URL (which, at that time, still returned a “404 – Page Not Found” because the new page had not been built yet). Doing it this way allows you to catch any template error before you propagate it by rebuilding all 2000 pages.

Satisfied that my redirect templates were functioning properly, I rebuilt the individual, weekly, and category archives. Once I’d done that, I then altered the archive directory structure to the new scheme, and put the original templates back in place. Rebuilding the site again then put the final destination pages in the place that the redirect pages were pointing to.
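I checked my redirects by hand in a browser, but the same sanity check could be scripted once the final rebuild is done. Here is a hedged Python sketch: it pulls the target out of each redirect page's meta refresh tag and reports any old page whose target was never built. The names `refresh_target` and `broken_redirects`, the second sample entry, and the root-relative URLs are all hypothetical.

```python
import re

# Matches the target URL inside a meta refresh tag.
REFRESH_TARGET = re.compile(
    r'<meta\s+http-equiv="refresh"\s+content="\d+;\s*url=([^"]+)"',
    re.IGNORECASE,
)


def refresh_target(html_text):
    """Return the URL a meta refresh tag points to, or None if there is none."""
    match = REFRESH_TARGET.search(html_text)
    return match.group(1) if match else None


def broken_redirects(redirect_pages, built_paths):
    """List old paths whose redirect is missing or points at a page not built."""
    return sorted(
        old_path
        for old_path, html_text in redirect_pages.items()
        if refresh_target(html_text) not in built_paths
    )


# Hypothetical sample: one good redirect, one pointing at a page never built.
redirect_pages = {
    "dumped/2003/September/003137.shtml":
        '<meta http-equiv="refresh" content="0; '
        'url=/2003/september/13_changes_under_the_hood.shtml">',
    "dumped/2003/September/003138.shtml":
        '<meta http-equiv="refresh" content="0; '
        'url=/2003/september/14_not_built_yet.shtml">',
}
built_paths = {"/2003/september/13_changes_under_the_hood.shtml"}

print(broken_redirects(redirect_pages, built_paths))
# ['dumped/2003/September/003138.shtml']
```

An empty list after the second rebuild would mean every old URL has somewhere to land.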

At that point, I just FTP’ed the entire old directory structure containing the redirects to my hard drive, ready to upload at the new web host. When I import my weblog(s) into the new MT install there, I’ll use the same date/title archive scheme to build the new directory structure like it now exists here. Any link anyone has given a page in my weblog will redirect to the proper entry.

Yes, you might say it’s a lot of effort to accomplish that. But I do it for you, so you’ll always know, “When you link PhotoDude, it sticks.”


Peanut Gallery

1  PhotoDude wrote:

[Now we reach the point where Reid talks in the past tense about things he hasn't yet done ... call it 'previsualizing success'] Well, never let anyone tell you that obsessive compulsive paranoid preparation isn't worth the effort. The heavy lifting described after the above sentence took less than an hour (not including the goat sacrifice), and was pleasantly trouble free. As best I can tell, it was a complete success, and I'm now able to move to a new host without trashing every link ever made to this site. Now, all I have to do is move thousands of HTML documents and images accumulated over 7.5 years, totaling over 290 megabytes, and reconstruct four different weblogs (and their archives) on the new MySQL/PHP-capable server at the new host.

2  Richard wrote:

Alas, it is not a *complete* success: going through old links to your weblog entries and changing them to the new ones manually (without really needing to, but it's Saturday night and I don't have a date), it turns out that the links to individual comments--that is, anchors within the pages with the old URLs--are no longer valid. But you can still claim this as a success, and I congratulate you on it. Oh, and the magic of mod_rewrite will future-proof the extensions of your URLs. Mark Pilgrim--a college graduate and not a 6th-grader, it turns out--has written up a tutorial on how to do so.

3  PhotoDude wrote:

Well, Richard, at least you allowed me a full 60 minutes to revel in a false sense of total success. Yes, links to the anchors within those old URL's won't redirect. And I've done a bit of comment linking within entries myself. Let's see, maybe I could create a redirect page for each of the over 3,000 comments, to add to the 2,000 redirect pages I built tonight ... Naahhhh. OK, OK, so I get a B instead of an A. Strip me of my purist stripes. As for the powers of mod_rewrite, I have heard of them, but they are not available to me on my current server. Nor is MySQL, and they only upgraded from PHP 3 to 4 about 4 months ago. Soon, I will have all that and more. And this is the small price I pay. I knew there would be some small level of link rot involved in this server move, no matter what I do. But this is long overdue.

4  Michele wrote:

I have the same problem but you lost me after the first paragraph. You have infinitely more patience than I. I will now be known as The Girl With Link Rot.

5  rturner wrote:

Dang that's complicated. With the caveat that in my case it wouldn't have made any difference, I might have just zipped it up into one giant tarball, hauled it up to the new server and waited for the dns changes to propagate, figuring the paths usually don't change. It seems to me, though, that you still may end up losing a post or ten as various routers point people either to the new or the old IP address. Will you have to re-import everything over several days? No, that wouldn't work either. Yikes! If your old host was smart they would've just kept enlarging your disk space as hard drives get cheaper and cheaper. They'd have customers for life at virtually no extra cost.

Comment by rturner · 09/28/03 06:17 AM

6  Beth wrote:

I did not manage to do this myself when I moved servers, but I'm pretty sure there's an easier way--moving the entire database rather than exporting and reimporting the entries. From the Movable Type forums, I found this:

"Having migrated two different 15+ blog systems, I've found that phpmyadmin (web browser) or mysqldump (ssh command line) make it very easy to pick up a whole system of blogs and move them."

(I was unaware of these issues when I moved my blog, and of course all my links were broken.) Good luck!

Comment by Beth · 09/28/03 06:18 AM

7  PhotoDude wrote:

Michele: If you need a primer, let me know, and I'll go ... real ... slow. It's not quite as complex as it sounds. Heck, I didn't botch it, so it can't be that hard.

Richard: Tarball and haul would get the files in place, but would not keep them within the MT structure ... they would not generate 404's, but I wouldn't be able to access or rebuild them through the MT interface, either.

"If your old host was smart they would've just kept enlarging your disk space as hard drives get cheaper and cheaper. They'd have customers for life at virtually no extra cost"

Please, stop hurting me. And I think you're doing it intentionally. You know my web host, Earthlink, has not upgraded one iota, one pixel, nor one electron of their web hosting packages in 2.5 years, which on the Internet qualifies as a decade. I have the exact same amount of space I did in February, 2000, and the option of adding more at a rate of $4.95 per month for 5 megabytes. Yes, though I'm now paying $29.95 a month for 300 MB, if I want to add a mere 50 MB to that, it will cost me an additional $49.50 per month. Now that's a well thought out price structure.

Meanwhile, I can get more than twice the web space (and other offerings) for $24.95 a month elsewhere. Double the space, plus MySQL, for 20% less cost at a web host with a nearly six year long track record and 9,000 customers (PixelPile.org has been hosted there for two years). Duh! I am so so gone. I even e-mailed Earthlink Web Hosting two months ago, a last gasp effort to see if they had plans to upgrade in the near future. They literally did not want to talk about their future offerings. After 7.5 years with them.

Beth, moving the database is indeed a viable option with MySQL ... which I don't currently have. I've read in the MT forums about people who tried moving their FileDB database (which is all I have), and variants between the setup of the two servers can screw the whole pooch. And I wanted a solution I could implement on the current server, before moving into a relatively unknown environment with minimal expertise in MySQL-PHP.

Once again, I must blame all of this extra effort on my web host (if only I could invoice them for it). If they'd simply done an even mediocre job of keeping up with the times, I wouldn't be in this spot. Would've gotten more than six hours of sleep last night, too.

8  rturner wrote:

After what you've been through, rubbing it in didn't really occur to me, but I'm sorry. It was more marveling at Earthlink's complete and utter stupidity. Web hosting used to be the gem in Mindspring's bag-o-profits. "Meanwhile, I can get more than twice the web space (and other offerings) for $24.95 a month elsewhere. Double the space, plus MySQL, for 20% less cost at a web host with a six year track record and 9,000 customers. Duh!" I keep looking at them and Pair keeps upping my disk space and bandwidth. Still, I'm paying $49/mo for your new host's $35/mo plan. Fear of Moving is one factor for my status quo. Fear of Moving several clients who are sitting in subdirectories of my main account and paying me "rent" is another. Just in case MT isn't quite enough to whet your appetite for the Dark Side, I notice that your new host will allow you to SSH in to your heart's content using PuTTY. No pressure, though.

Comment by rturner · 09/28/03 06:19 AM