Walker Sampson

walker [dot] sampson [at] icloud [dot] com

Category: digital media

“Aggregating Temporal Forensic Data Across Archival Digital Media”

Last year I attended the Digital Heritage 2015 conference and presented a paper on digital forensics in the archive. The paper centers on collecting file timestamps across floppy disks into a single timeline to increase intellectual control over the material and to explore the utility of such a timeline for a researcher using the collection.

As I state in the paper, temporal forensic data likely constitutes the majority of forensic information acquired in archival settings, and in most cases this information is gathered inherently through the generation of a disk image. While we may expect further use of this data as disk images make their way to researchers as archival objects (and the community’s software, institutional policies and user expectations grow to support it), it is not too soon to explore how temporal forensic data can be used to support discovery and description, particularly for collections with a significant number of digital media.
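A minimal sketch of the aggregation idea, assuming each floppy's contents have already been extracted or mounted read-only to a local directory. The `build_timeline` function, the `TimelineEntry` type and the disk labels are hypothetical, not the paper's tooling, and the sketch reads only last-modified times. Forensic suites such as The Sleuth Kit read the full set of timestamps directly from the disk image, which avoids the risk that copying files alters them.

```python
import os
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple


class TimelineEntry(NamedTuple):
    """One file's last-modified time, tagged with the disk it came from."""

    modified: datetime
    disk: str
    path: str


def build_timeline(disk_dirs: Dict[str, str]) -> List[TimelineEntry]:
    """Merge last-modified times from several extracted disks into one timeline.

    disk_dirs maps a hypothetical disk label (e.g. an accession ID) to the
    directory holding that disk's extracted or read-only mounted files.
    """
    entries: List[TimelineEntry] = []
    for label, root in disk_dirs.items():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # st_mtime is seconds since the epoch. FAT floppies store local
                # time with no zone, so the UTC label here is only nominal.
                mtime = os.stat(full_path).st_mtime
                entries.append(
                    TimelineEntry(
                        modified=datetime.fromtimestamp(mtime, tz=timezone.utc),
                        disk=label,
                        path=os.path.relpath(full_path, root),
                    )
                )
    # Tuples sort field by field: by time first, then disk label, then path.
    entries.sort()
    return entries
```

Called as `build_timeline({"DISK001": "/mnt/disk001", "DISK002": "/mnt/disk002"})`, it returns a single list that interleaves files from both disks by date, the kind of cross-media view the paper explores for description and discovery.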

Many thanks to the organizers of Digital Heritage 2015 for the support and feedback; it was a wonderful and very wide-reaching conference.

Aggregating Temporal Forensic Data Across Archival Digital Media (IEEEXplore) (CU Scholar)


Repercussions of Amassed Data

I had the pleasure of meeting Mél Hogan while she was doing her postdoctoral work at CU Boulder. I think her research area is vital, though it’s difficult to summarize. But that won’t stop me, so here goes: investigating how one can “account for the ways in which the perceived immateriality and weightlessness of our data is in fact with immense humanistic, environmental, political, and ethical repercussions” (The Archive as Dumpster).

Data flows and water woes: The Utah Data Center is a good entry point for this line of inquiry. The article explores the concerns quoted above (humanistic, environmental, political, and ethical) at the NSA’s Utah Data Center, near Bluffdale. The center has suffered outages and other operational setbacks since its construction. These initial failures are themselves illuminating, but even assuming such disruptions are minimized in the future, the following excerpt clarifies a few of the material constraints of the effort:

Once restored, the expected yearly maintenance bill, including water, is to be $20 million (Berkes, 2013). According to The Salt Lake Tribune, Bluffdale struck a deal with the NSA, which remains in effect until 2021; the city sold water at rates below the state average in exchange for the promise of economic growth that the new waterlines paid for by the NSA would purportedly bring to the area (Carlisle, 2014; McMillan, 2014). The volume of water required to propel the surveillance machine also invariably points to the center’s infrastructural precarity. Not only is this kind of water consumption unsustainable, but the NSA’s dependence on it renders its facilities vulnerable at a juncture at which the digital, ephemeral, and cloud-like qualities are literally brought back down to earth. Because the Utah Data Center plans to draw on water provided by the Jordan Valley River Conservancy District, activists hope that a state law can be passed banning this partnership (Wolverton, 2014), thus disabling the center’s activities.

As hinted at in a previous post on Lanier, I often encounter a sort of breathlessness in discussions of cloud-based reserves of data and computational prowess. Reflecting on the material conditions of these operations, as well as their inevitable failures and inefficiencies (e.g. the apparently beleaguered Twitter archive at the Library of Congress, though I would be more interested in learning about the constraints and stratagems of private operations), is a wise counterbalance that can help refocus discussions on the humanistic repercussions of such operations. And to be sure, I would not exclude archives from that scrutiny.

Disk Imaging Workflow at BitCurator.net

Early in January I attended the first-ever BitCurator Users Forum in Chapel Hill. This was a fantastic day with a group of folks interested in the BitCurator project and digital forensics in an archive setting — definitely one of the most information-packed and directly applicable conferences or forums I’ve attended. I’m very much looking forward to next year’s.

I have a post on the BitCurator site on the disk imaging workflow I’m using with students presently, and there’s a great wrap-up of the day as well.

Checksumming till the cows come home

Jon Ippolito, from an interview with Trevor Owens at The Signal:

Two files with different passages of 1s and 0s automatically have different checksums but may still offer the same experience; for example, two copies of a digitized film may differ by a few frames but look identical to the human eye. The point of digitizing a Stanley Kubrick film isn’t to create a new mathematical artifact with its own unchanging properties, but to capture for future generations the experience us old timers had of watching his cinematic genius in celluloid. As a custodian of culture, my job isn’t to ensure my DVD of A Clockwork Orange is faithful to some technician’s choices when digitizing the film; it’s to ensure it’s faithful to Kubrick’s choices as a filmmaker.

Further:

As in nearly all storage-based solutions, fixity does little to help capture context.  We can run checksums on the Riverside “King Lear” till the cows come home, and it still won’t tell us that boys played women’s parts, or that Elizabethan actors spoke with rounded vowels that sound more like a contemporary American accent than the King’s English, or how each generation of performers has drawn on the previous for inspiration. Even on a manuscript level, a checksum will only validate one of many variations of a text that was in reality constantly mutating and evolving.

In my own preoccupation with disk imaging, generating checksums and storing them on servers, I forget that this is, at best, the very beginning of preservation, not an incontestable “ground truth” of the artifact.
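Ippolito's point about checksums is easy to see in miniature. The sketch below assumes SHA-256 (the post names no algorithm) and uses hypothetical function names: it records a digest for a file, then re-verifies it later. Changing a single byte makes verification fail, even where no reader or viewer would notice a difference, and a passing check says nothing about context.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's SHA-256 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """True if the file's current digest matches the one recorded at ingest."""
    return sha256_of(path) == recorded.strip().lower()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "floppy.img"  # hypothetical disk image
        image.write_bytes(b"\x00" * 1024)
        recorded = sha256_of(image)  # stored alongside the image at ingest
        print(verify_fixity(image, recorded))  # True
        image.write_bytes(b"\x00" * 1023 + b"\x01")  # change a single byte
        print(verify_fixity(image, recorded))  # False
```

Keeping the recorded digest apart from the image (for instance in a manifest on another server) is what makes the later comparison meaningful, yet all the check confirms is that the bits are unchanged.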

Simulation Fever

From Persuasive Games by Ian Bogost:

Previously, I have argued that videogames represent in the gap between procedural representation and individual subjectivity. The disparity between the simulation and the player’s understanding of the source system it models creates a crisis in the player; I named this crisis simulation fever, a madness through which an interrogation of the rules that drive both systems begins. The vertigo of this fever — one gets simsick as he might get seasick — motivates criticism.

Procedural rhetoric also produces simulation fever. It motivates a player to address the logic of a situation in general, and the point at which it breaks and gives way to a new situation in particular.

Born Digital and Probably Died that Way: Content Loss from Yesteryear

Greetings from QBasic - Wish You Were Here!


I’m often asked – in the course of my job or by an acquaintance – to explain ‘digital preservation’ and what I mean by it. And as I’m sure others in this field know, a frequent first guess is scanning – you’re scanning stuff, right?

It’s a reasonable and valid guess – digitization can be and is used as a preservation strategy – but it’s a reply that leaves me stumbling, “Yes, but…”, since born-digital content is what a newcomer is most likely to overlook.

I’m often tongue-tied, though, when explaining why born-digital material matters to an individual on a personal level. To some it seems immediately frivolous – perhaps resulting from a notion that the digital enterprise is inherently ephemeral, or that the ‘information superhighway’ – a dated term but one still with a legacy – is just a media-carrying superstructure over the real stuff.

Not having someone immediately agree with your assumptions startles you into explanation mode. So I reach for a personal example of born-digital vitality. But the truth is that in my recent past I’ve done a pretty good job of preserving the digital materials that are important to me. Setting up a reasonably safe (and this is key: automated) backup routine and checking media health every once in a while goes a long way. So I have no woeful narrative to relate there about personal digital material becoming lost (yet).

And as I’ve mentioned elsewhere in this blog, I find myself agreeing with David Rosenthal’s research suggesting that file format obsolescence in a post-Internet world is not a major risk for the majority of digital materials. So it doesn’t feel terribly relevant to spook someone with the scenario of their Microsoft Word files becoming obsolete in a few years. They are far more likely to be lost through neglect before approaching obsolescence.

So I searched back through my own personal history for born-digital content I have lost to time. Not just any old content that happened to be lost, but something that meant a lot to me and is simply no more.

I’ve already written about a near-loss and partial recovery involving a high school art website, so here I recall a complete loss of content. Nothing remains but the recollection, and the loss still smarts today: the code for my QBasic games. Hear my tale of woe, as I recreate here what little is left of those projects.

My kingdom for some GOTO code

When my family first purchased a computer, it took a few years for me to learn the ropes. I recall some unintended directory deletions while I was learning DOS, and at one point I thought I had truly broken the system through one of these errant deletes. The incident turned out to be only a mistakenly relocated set of files that broke a start-up routine, but not before a few moments of vertigo in which I was sure I had broken the family machine.

Eventually I got to understand command line customs, along with the basics of programming in the QBasic IDE, which came standard with MS-DOS and Windows for approximately nine years. Once I got the hang of basic user input and variable handling, I figured it was time to make games in QBasic.

Ah, to be young and just dive in! None of them were ever completed, though this does not bother me. I still believe just diving in is a handy practice.

Lend an ear and I’ll tell you about them.


Kingdom of Kroz (1987)


ZZT (1991)

The first effort was a fantastical text adventure with ANSI-style art inspired by the psychedelic landscapes of Kingdom of Kroz and Epic Megagames’ ZZT, but featuring the simple rules of a Choose Your Own Adventure novel. I got pretty far along before the tedium of hand drawing scenes row by row with the extended character set wore me down. I was still learning a lot.


The Terminator (1990)


Drugwars (1984)

The second game was identical in form, but took some less tasteful tones from Bethesda’s The Terminator title – an early stab for that studio at their now famous open-world design – as well as the Drugwars DOS game. I got even less far along than with the first game – just a couple of sequences before the player was abruptly dumped back into the sharp blue of QBasic’s IDE. I recall becoming bored and directionless at the monotone grimness the setting required, as well as at the tedious, screen-by-screen gameplay.


Legend of the Red Dragon (1989)

The third game, and the most involved, was an RPG collaboration with an elementary school friend, very much modeled after the BBS classic Legend of the Red Dragon – but a single-player affair. We had races, classes, a town, shops, NPCs, and had begun modeling the wilderness areas where the player would encounter whatever had to be fought there. However, school crowded in, my friend moved away, and our work stopped there.

I would give my right arm for the source code to any of these projects, but that last one hurts the most. My friend and I spent many hours and long nights developing the RPG – and never got very far – but this piece of digital content represents a huge investment of my enthusiasm and passion at that time. That it is utterly lost is painful. I’m not sure what foresight could have saved it, short of keeping the floppies around through sheer neglect. If this were a project nowadays, perhaps a forgotten email attachment could have dredged it up from the bog. Alas, at that time the only network we had was carting floppies between our houses.

There are other losses, such as my old MySpace page (which captured some of my disposition and contacts in my early college years), an embarrassing old fan site for a band I loved in high school, and a lost DOOM level .wad – but the absence of this QBasic code hits hardest. This is simply how things get lost, alas – though I sigh wistfully when hearing of old game code being discovered. That someone, amazingly, has managed to create a modern game coded entirely in QBasic just makes me all the more wistful.

Citizens of tomorrow, your digital content – even if, like myself, you are not a heavy user of social media – can be profoundly important to you and very likely to others. Keep an eye on it, as I wish I had.

Speaking at NAGARA e-Records Conference 2013

Just a little post to say I’ll be speaking at the NAGARA e-Records Conference this year in Austin, Texas. I’ll be describing the efforts of CoSA’s State Electronic Records Initiative (SERI) over the past few years – specifically our educative efforts, and the upcoming electronic records training workshops this year and next. These workshops will collectively be attended by every state and territorial archives and records program in the country.

It’s quite an epic project, and one that aims to address a worrying gap in archives’ management of born-digital records – the “gap between the authority to act and the ability to act effectively” (Report of the Blue Ribbon Panel, a supplement to The State of State Records, 2007, p. 5.).

e-Records 2013 Presentation Slides

The slide links to the presentation, which is also available at NAGARA’s site.

This will be my third time attending the conference – the forum is always timely and interesting, and Austin is just icing on the conference cake.

Help the Digital Preservation Q&A at StackExchange

Stack Exchange Q&A site proposal: Digital Preservation

Join us.

I’ve recently committed to the Digital Preservation Q&A proposal at StackExchange. This is a resource I really hope comes to fruition, as there’s a lack of sites that support the exchange of strategies and advice among people involved in digital preservation, or that field questions from people familiarizing themselves with the practice.

This latter audience has been on my mind particularly since leaving the DPOE program last year. Although we have fielded questions over an email listserv, this venue has a few significant weaknesses:

  • It’s difficult to bookmark or reference back to advice or information within a thread.
  • Email bodies and threads are not friendly to rich text, links, and other formatting that would make information more readable, digestible and inclusive.
  • The information is unstructured: one cannot apply tags, select a topic as a favorite, vote up a discussion, or track edits in any systematic way.

By contrast, the StackExchange approach is a mix between a question-and-answer site and Wikipedia, with some reward elements to provide incentive for good contributions. There are a host of topics covered under the network, from gardening to LEGOs to electrical engineering. The network hosts an Area 51 site, which lists all the currently proposed topics that have drawn user interest but are not yet formal sites. There’s a lot there, and you’d likely be interested in a few.

Why StackExchange? It features all the methods to structure information I described above. I really can’t imagine a better format (at least, not one already set up and sorted out) for building up a knowledge base in digital preservation, and one that can adjust as the practice itself changes, which it will, immensely. There will be an assortment of questions and procedures, ranging from obscure rescue efforts to large-scale contemporary migration processes.

As part of the state archives here in Mississippi, I do a good bit of training to state employees on electronic records management and preservation. Required retention periods for born-digital objects can range from three to fifteen or more years, while many records are marked for permanent retention and will be deposited here at the archives. The need for considered planning of digital content comes up repeatedly, and a single good resource to point people to would be very welcome.

Consider committing if the topic interests you. It’s especially helpful if you’re already engaged in other StackExchange sites, and as noted there are a whole lot of topics to join, so there’s ample opportunity to get involved with StackExchange. Any interest does help!

DPOE National Calendar

I want to give a brief shout out to the DPOE National Calendar, brand spanking new as of June 2011.

The idea is to have a single, general purpose calendar that covers digital preservation workshops, talks, etc., across the country. If you’re giving a talk or workshop, no matter how small the audience, consider submitting it here. And of course you can check the calendar to attend events, whether online or local to your area.

A longer post on DPOE is still forthcoming.