Old Site Exhumed, Mostly Gone

by wsampson

Lulu glares out from a sea of compression artifacts.

During my last two years of high school (c. 1998 to mid-2000), a friend and I began an “art website.” Our intention was to have a place to post our writings and visual work, and to solicit similar submissions from others around the Internet. Our mascot was the enraged chimp pictured above, Lulu. The site received some modest interest from various users around the Web, and from a few of our friends at school. All in all, not an unsuccessful project.

However, as high school came to a close, we grew tired of maintaining the site. We tossed around the idea of maintaining it while we went to our respective colleges, but eventually decided to shut it down. We would be too busy with better endeavors, and no one wanted to log into our hosting service to keep the old high school art site afloat. We posted the EOL announcement on the site and applied a bullet wound to old Lulu with MS Paint. It was most certainly over.

Lost

Our site was hosted with the free service provided by Juno in conjunction with Homestead. This was like a much smaller version of the web hosting provided by Geocities. Readers may recall Geocities unexpectedly cropping up in the news when Yahoo decided to shut down the service. Thanks in huge part to Jason Scott and the Archive Team’s tireless work and campaigning, an enormous amount of Geocities has been preserved in the Internet Archive as well as within numerous other archives.

In retrospect, that announcement of termination was a sort of blessing: there wasn’t time to erect a perfect archiving project, but preservation did happen, and how.

I’m not aware of any similar announcement for Homestead or Juno, but if anyone knows of such an announcement, or better yet, any archiving effort for those services, I would love to know. Homestead is still kicking, but the sites it now hosts appear to be commercial. Juno is around, though hosting doesn’t appear to be part of the package anymore.

(Partially) Found

So despite my friend and I doing nothing to save our old high school site, and despite no notice that it would be removed, it has miraculously weathered about a dozen years of benign neglect. Scratch that — not-so-benign neglect, given that Juno/Homestead actively pulled the plug on it upon the closure of my account with them – and the site was never resident on either of our computers. How could this have happened?

Well, the Internet Archive is how that happened.

Now, the Internet Archive does not yet provide full text searching of its archive; you need to know the URL. Through some Google sleuthing I was finally able to recover ours through the profile page of an old submitter (on a different, unrelated website), who still had our site (about eleven years in the grave at this point) listed as her homepage. This person had submitted a lot of work to our site, so it makes sense that she might point to it as the best place to find her on the web of 1999.

I typed that address into the Internet Archive and lo, the last year or so of the site had been collected several times by the Archive’s Heritrix crawler. At these points of capture we had already announced the site’s moratorium on submissions and updates, so none of the snapshots show any change in the site. All the same, I nearly laughed out loud just from the delight and surprise of seeing our old site again.

Remnants

Despite those wonderful, wonderful snapshots, browsing the site now is like navigating a deeply decayed house that’s bordering on final collapse. The majority of the internal links are broken – for whatever reason, the crawler never got those destinations, or was unable to map those resources to the site. Most of the external links pointing out to the Web of 1999 have rotted, though that is not terribly surprising.

No submitted artwork remains — it’s all broken JPEGs as far as you can see. A few thumbnails from the art index page are what remains of the submitted visual content. Again, perhaps the crawler had access troubles or could not map those resources to the site. Maybe they’re resident in a WARC somewhere. Other site artifacts that relied on external resources, such as the numerous polls we conducted, are empty. Our Message Board remains intact but the Guestbook goes nowhere.

(Yes, sites used to have “guestbooks.” You would sign in and say what you think of the site, etc., like a guestbook in real life. So quaint, I know.)

For the time being I’m downloading the pages one by one. When Warrick has adjusted to the Internet Archive’s recent changes, I’ll try and avail myself of their services.
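For anyone attempting the same page-by-page recovery, here is a minimal Python sketch of how it might be scripted against the Wayback Machine's availability lookup. It is not the process I actually followed. The function names are my own invention, `example.com/art` is a placeholder rather than our old address, and the response layout is assumed to match what the service returned when I looked.

```python
import json
import urllib.parse
import urllib.request

# Public lookup endpoint for the Wayback Machine.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL for one page (hypothetical helper)."""
    params = {"url": page_url}
    if timestamp:
        # YYYYMMDD or longer; the service answers with the nearest capture.
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(api_response):
    """Return (timestamp, snapshot_url) from a parsed response, or None."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def fetch_snapshot(page_url, timestamp=None):
    """Look up the nearest capture and return its bytes, or None if absent."""
    with urllib.request.urlopen(availability_query(page_url, timestamp)) as resp:
        found = closest_snapshot(json.load(resp))
    if found is None:
        return None
    with urllib.request.urlopen(found[1]) as resp:
        return resp.read()
```

Calling `fetch_snapshot("example.com/art", "20000601")` would ask for the capture nearest June 2000. A page the crawler never collected simply comes back as `None`, which is the scripted version of a broken link.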

All said, and as impossibly painful as it is to read my (or anyone’s) high school poetry and prose, I’m happy to have what remains of this site back. This was my first web site after all. So, thank you Internet Archive.

And yet my takeaway here is that the Archive will not save you. As wonderful as it is to see this site again, the fact is that most of it is gone, and that’s a shame. The experience of browsing it is severely diminished. Getting what I have now was laborious, and to get any more, I have to rely on a very slim chance afforded by a great tool, Warrick, which is nevertheless completely out of my control.

Compare this to my old student site, which I have preserved easily, just by moving the site directory from UT’s servers to my Dropbox account (and of course, onto my hard drive as well, as a backup). This site has been retained almost perfectly within a few minutes, while my high school site is mostly destroyed. Preserving what’s left takes hours, with some considerable scrubbing to re-realize the original HTML.
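The easy case really is just a directory copy, though it costs little to confirm that nothing was damaged on the way. A minimal sketch, assuming the site lives in an ordinary local folder; the function names are hypothetical, and this is an illustration rather than what I ran at the time:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash one file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def copy_and_verify(src, dst):
    """Copy src to dst (dst must not exist yet); return files that differ."""
    shutil.copytree(src, dst)
    before, after = manifest(src), manifest(dst)
    return sorted(name for name in before if after.get(name) != before[name])
```

An empty list from `copy_and_verify` means every file arrived intact. Saving the manifest next to the copy also gives you something to check against years later.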

Lesson the One: Preserving digital content earlier is much, much easier — and so much cheaper — than trying to come in after years of neglect and perform heroic acts of digital archeology.

Lesson the Two: Be careful of slowly fading services and resources. There wasn’t an announcement or notice that our site would go away. A lot of your digital content on hosted services is going to end with a whimper. I think this will get better, especially with major services like Facebook and Twitter, where the legacy of a deceased person’s data has come up notably several times, and for which policies and download services are in place. Still, I maintain that a great deal of digital data hosted by services is likely to quietly slink away from your reach.

Lesson the Three: Preserving web sites and the Web at large is really, really difficult. Arbitrary and dynamic content, multiple and unpredictable resource locations, many parts, many dependencies, and behavioral sophistication. It’s a tough list to say the least, so it can’t hurt to be proactive about maintaining your content.

[Edit 07/22/2011] Lesson the Four: Digital objects are in danger of partial or damaged states just as physical objects are. This is something that has struck me as somewhat unintuitive for those unfamiliar with digital objects. A common experience is having a hard drive crash or accidentally deleting a file, and this can lead to the belief that digital objects are either preserved entirely or they’re gone entirely. This corresponds with the notion of digital machines, which work in discrete and discontinuous values.

But the reality is that preservation of anything but the simplest data can be partial and incomplete, just as with my old web site here, or a video game, or a work of digital art, or a spreadsheet that lost its underlying formula, etc. Certain properties persist (with varying degrees of fidelity to the original) while others become lost. I think this is a particularly difficult component of digital preservation to explain to those outside the practice, as not all of the logical or behavioral properties of an object are apparent at first glance.

In any case, something to consider the next time you compose your elevator speech.
