Stop Internet Censorship

November 18, 2011

Near-instant takedown of sites without any due process is a scary, scary thought. Please contact Congress to stop the “Stop Online Piracy Act.”

http://americancensorship.org/

The Age of Electronicus record cover

In the window of a rare vinyl store in Hollywood Boulevard, on flickr, by Rich_Lem

In October I ventured to three locations in Mississippi with a coworker to deliver records management training to municipal clerks. My portion of the training addressed electronic records in the state. Here I discuss strategies I used and share some thoughts on teaching what is frequently dry material for an (often reluctant) audience.

Background

A little context on government records in Mississippi: for local government, all electronic records are managed and maintained by the originating agency. If electronic records are scheduled as permanent, they’re kept with that agency forever — they don’t go to the state archives.

By contrast, there are two primary supporting resources for state agencies. The first is a tape backup service offered by us, as well as the ability to take their permanent electronic records into the state archives. The second is the counsel, services, and guidelines of the state IT department. Local government of course has our counsel with any of their records concerns, but we don’t offer any services to them.

Because few municipalities (if any) have the resources to employ a records manager, it’s not atypical for electronic records management to be distributed among all municipal employees in an ad-hoc and uncoordinated manner. Professional document or records management software is out of scope for most, since such packages are too expensive and the volume of electronic records produced is typically too low to consider the purchase. The same is true of email archiving services. Open source would appear to be ideal but those solutions really do require dedicated IT administration, which is limited for many municipal projects.

My portion of the workshop lasts an hour, and the goal is to give attendees the knowledge to manage their electronic records better than they do now. Beyond the constraint that all records management has to occur within the agency, there are a few other hurdles to teaching effectively in this hour:

  • Little foreknowledge of each municipality’s specific tech setup or electronic records management strategy.
  • Little foreknowledge of each attendee’s computer literacy.
  • No foreknowledge of attendees’ specific jobs or the records they regularly handle.

Unfortunately these constraints were outside of my control. However, as I hope to show, this doesn’t mean the hour can’t be successful.

Method

I broke the material down into steps, such as setting public records apart from transitory or personal documents, basic storage strategies, security best practices, and so on. For the steps where it’s applicable, I made a list of computer skills one would need to execute that step. For example in setting public records apart from non-records, necessary skills are creating folders, naming and renaming folders, and moving files and folders, among others.

I didn’t create these skill lists until the second presentation, when it struck me that attendees may be overwhelmed or discouraged by a long hour of “you should do this” and “you need to do this.” Some of the skills I’ve listed may seem basic, but I felt it was important to try to empower the audience by naming the specific skill needed (or as specific as I can be without knowing their setup).

Records on trees

Records grow on trees, on flickr, by erikadotnet

When I asked for a show of hands on these skills, the response ranged from a handful of attendees to a majority confirming they knew how to perform the task. There’s a risk that those who don’t know how to do something will feel bad and decide to reject or tune out the material. I always state, however, that it’s okay if one doesn’t have these skills. They don’t just fall out of heaven and into your head, so it’s alright to ask someone.

Still, I’m actively reconsidering whether I should ask for a show of hands. I want to engage the audience but I don’t actually do anything with that information, so it’s not like anything is hinging on the feedback.

After the first workshop I decided my goal was to have attendees feel like comprehensive and impressive electronic records management was within their grasp, with essentially all the knowledge and tools they already have right at hand. Even if they don’t have a listed computer skill (such as saving emails from their email client or service to a specific folder, or creating a shared drive or folder), they know specifically what they should learn in order to achieve records management. This leads to some presentation aspects.

Presenting

This is dry subject matter for most, so keeping audience attention is really the top priority and the most difficult task, especially since I present in the third hour. At the final workshop I opened by saying “My goal this hour is to bore you to death with electronic records management.” It actually got some laughs.

Establishing some kind of rapport with the audience is critical, and something I’m still working on. I ask for stories of hard drives crashing, and I tell my own. I try to tell a good story about a failed security measure. For example, the (likely) Stuxnet attack on the Iranian Natanz nuclear facilities in 2010 is speculated to have been delivered by a USB drive. This emphasizes that basic measures like good passwords and ensuring appropriate physical access are invaluable in protecting data and systems. I’m still fishing around for something more local.

Amy Rudersdorf at NCDCR.gov, who I met at the Digital Preservation Outreach and Education (DPOE) beta workshop, mentioned passing around a bag of dead electronic media. We have an 8″ floppy we show, and I’d like to collect other media for the next round.

I also emphasize that a lot of the material covered, like creating a backup plan and basic data protection, is just as applicable to their personal data as to their work data: protecting their digital photos, their email and their documents. While there are different issues in managing electronic public records versus electronic personal data, especially when it comes to cloud storage, there is more alike than not in my opinion.

On that note, one activity may be to ask audience members who has the oldest media at home. Does anyone have music on audio cassette or vinyl? Store data on floppies or Zip disks? Video or print on any kind of film? None of that is bad of course, but it could help to get them to consider how long that material can continue to be used, and whether it’s at all possible to make a copy.

broken cassettes

obsolete media, on flickr, by EssG

Another activity is to have the presentation laptop shared with a volunteer. Minimize the presentation software and have an example file or two on the desktop. Let the volunteer make a folder on the desktop and copy a file into the folder. Now, have the volunteer move a second file into the folder (thus maintaining only one copy of the file), then rename the folder, and finally put the renamed folder in a new folder they create. On the face of it this seems awfully pedestrian, but it allows someone to demonstrate their ability, and if there are others in the crowd who are unable to do these tasks, it will illustrate how they are done.

Just to emphasize, other audiences may be more technically advanced than what I’m covering here. They may have access to, or be able to consider, professional management software, or they may have access to a considerable IT force.

The intention for my audience is to have attendees know that organized use of basic computer abilities is all it takes to competently manage electronic records in full compliance with state law. That is, minimum requirements will be met. I do not go into hashes and I only touch briefly on audit trails (in the context of a legal hold, or in advising use of the author and organization fields in Word, Adobe, etc.). There is not enough time to cover more detailed topics, as this is a 101 session.

Conclusion

Electronic records training can be delivered to a novice audience, even if you don’t know as much about them as you would like. My three main strategies here were:

  • Breaking down electronic records management into steps (setting records aside, creating a storage strategy, writing a policy, etc.).
  • Providing specific computer skills for each step where applicable.
  • Demonstrating these skills for the audience where possible.

DPOE National Calendar

October 4, 2011

I want to give a brief shout out to the DPOE National Calendar, brand spanking new as of June 2011.

The idea is to have a single, general purpose calendar that covers digital preservation workshops, talks, etc., across the country. If you’re giving a talk or workshop, no matter how small the audience, consider submitting it here. And of course you can check the calendar to attend events, whether online or local to your area.

A longer post on DPOE is still forthcoming.

Next week I’ll be attending a train-the-trainer workshop hosted by the Library of Congress in D.C. I’m thrilled to be attending and I’m really looking forward to meeting the other participants.

The Digital Preservation Outreach and Education (DPOE) program is a recent initiative by LOC to “foster national outreach and education to encourage individuals and organizations to actively preserve their digital content.”

Since attendees are coming from a variety of institutions, it’s going to be really interesting to discuss the different contexts in which digital preservation can be introduced. Audiences and clients can make a big difference in how you articulate a subject – and identifying the core issues within those variations is a (perhaps lofty) goal of mine for this workshop.

That, along with feedback on training and workshop execution, which my position requires a good deal of, could not be more welcome!

I hope to have a post or two on the workshop during or shortly after.

Lulu glares out at you from a sea of compression artifacts.

During my last two years of high school (c. 1998 to mid-2000), a friend and I began an “art website.” Our intention was to have a place to post our writings and visual work, and to solicit similar submissions from others around the Internet. Our mascot was the enraged chimp pictured above, Lulu. The site received some modest interest from various users around the Web, and from a few of our friends at school. All in all, not an unsuccessful project.

However, as high school came to a close, we grew tired of maintaining the site. We tossed around the idea of maintaining it while we went to our respective colleges, but eventually decided to shut it down. We would be too busy with better endeavors, and no one wanted to log into our hosting service to keep the old high school art site afloat. We posted the EOL announcement on the site and applied a bullet wound to old Lulu with MS Paint. It was most certainly over.

Read the rest of this entry »

Dwarf Fortress screen

Before the week is out I wanted to post a link to the NYT interview with the Adams brothers, who design and build the incredible labor of love that is Dwarf Fortress.

I had the opportunity to interview Tarn Adams (audio and transcript available), who programs the game, for the game preservation project I worked on in school (all interviews are here at the Center for American History). Tarn is a standout guy, who is awfully generous with his time considering the colossal task ahead of him and his brother. He gave a great interview that illuminated important parts of their game-making, which is in keeping with the idiosyncratic and singular quality of Dwarf Fortress.

Check out the NYT interview — Tarn has some thoughtful, provoking comments on playing games these days.

And, if you haven’t tried Dwarf Fortress, give it a go, eh? I played it for a year on and off – one day I’ll make a return to it. It’s not as hard as all that, really, especially when you jettison the ideas of “winning” … or “succeeding.”

As some may know, I have recently been hired at the Mississippi Department of Archives and History. It’s the end of my third week here and it’s been great: great coworkers, interesting materials (I’m working with electronic records in the government section) – and I’ve been able to overhear the back-and-forth on a recently completed project which saw us transitioning from one online catalog to another. It was successful – do not worry.

But anyone who has ever worked in IT, or just had to approach a problematic system and put out the fire, may appreciate the elegance of this statement.

Systems that are both tightly coupled and highly complex, Perrow argues in Normal Accidents (1984), are inherently dangerous. Crudely put, high complexity in a system means that if something goes wrong it takes time to work out what has happened and to act appropriately. Tight coupling means that one doesn’t have that time. Moreover, he suggests, a tightly coupled system needs centralised management, but a highly complex system can’t be managed effectively in a centralised way because we simply don’t understand it well enough; therefore its organisation must be decentralised. Systems that combine tight coupling with high complexity are an organisational contradiction, Perrow argues: they are ‘a kind of Pushmepullyou out of the Doctor Dolittle stories (a beast with heads at both ends that wanted to go in both directions at once)’.

That is poetry. From Donald MacKenzie in the London Review of Books (link added). Itself from a post covering Amazon’s April 2011 cloud outage at David Rosenthal’s always-excellent digital preservation blog.

Green Fire

Below is a review of Derrida’s Archive Fever. The idea was to relate the lecture to practicing archivists and record managers. This was a really engaging read, and I think Derrida successfully articulates the archive impulse, with all its attendant richness and strangeness.

Archive Fever: A Freudian Impression. Jacques Derrida. Chicago: University of Chicago Press, 1998. Translated by Eric Prenowitz. 113 pages. ISBN 0-226-14367-8 paper. $14.98.

French philosopher Jacques Derrida (1930-2004) is most commonly known as the founder of deconstruction, an investigative thinking that identifies contradictions in a subject and demonstrates the essentialness of this contradiction to the meaning of the subject. For a thinker so adept at analyzing the valences of meaning in language, Derrida was unsurprisingly hesitant about the broad appeal and use of the deconstruction term, and no doubt would find fault with an overly mechanistic summation as perhaps written here. In Archive Fever, Derrida applies his intensely critical thought and evaluation to the notion of the archive as it is manifested in Sigmund Freud’s oeuvre.

Archive Fever: A Freudian Impression is a translation from the French of a published lecture Derrida delivered in 1994, and is divided into six parts: an opening note, an exergue, a preamble, foreword, theses and postscript. Derrida delivered this lecture to an international colloquium entitled “Memory: The Question of the Archives.” This leads to two caveats for the interested reader. Although blurbs on the paperback reference Derrida’s discussion of electronic media and more broadly the role of inscription technology in the psyche and in the archives, this is not the focus of his discussion, but is only part of a larger examination of the archive notion in Freud’s works. The reader should also know that this is a later work of Derrida, and as such references ideas and investigations discussed in earlier works, particularly the essay Freud and the Scene of Writing (1972). This means some of Derrida’s passages can be disorienting if the reader is not familiar with the works of Derrida and Freud. Thankfully Derrida takes pains to convey his meaning through multiple expressions, so the reader has many opportunities to understand the ideas at play.

Read the rest of this entry »

Lately I’ve been working to put in more development time with the Fedora repository at the Goodwill Computer Museum.

A PHP ingest interface we’ve set up is certainly the most developed of our repository’s services, but there’s a strong need to relate one object to another as it is being ingested. To do this I want to provide the user with a drop-down menu of objects in the repository which fulfill some criteria (say, the object represents a donator or creator). The user can select one during the ingest phase, relating the ingested object to this other object. That relationship would be recorded in the RELS-EXT datastream as RDF/XML, creating a triple. The predicate of that triple will come from either Fedora’s own ontology [RDF schema] or another appropriate namespace.
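To make the RELS-EXT idea concrete, here is a minimal sketch of how such a triple might be serialized with PHP’s DOM extension. It is an illustration under assumptions, not the ingest interface’s actual code: buildRelsExt() is a hypothetical helper, the PIDs are made up, and isPartOf is just one predicate from Fedora’s relationship ontology picked as an example.

```php
<?php
// Sketch only: build a RELS-EXT document holding a single relationship.
// buildRelsExt(), the PIDs and the chosen predicate are hypothetical examples.
function buildRelsExt(string $subjectPid, string $predicate, string $objectPid): string
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $relNs = 'info:fedora/fedora-system:def/relations-external#';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rdf = $doc->createElementNS($rdfNs, 'rdf:RDF');
    $doc->appendChild($rdf);

    // Subject of the triple: the object being ingested.
    $description = $doc->createElementNS($rdfNs, 'rdf:Description');
    $description->setAttributeNS($rdfNs, 'rdf:about', 'info:fedora/' . $subjectPid);
    $rdf->appendChild($description);

    // Predicate and object: the relationship and the object picked in the menu.
    $relation = $doc->createElementNS($relNs, 'rel:' . $predicate);
    $relation->setAttributeNS($rdfNs, 'rdf:resource', 'info:fedora/' . $objectPid);
    $description->appendChild($relation);

    return $doc->saveXML();
}

// Example: the ingested object demo:100 is part of the donator object demo:7.
echo buildRelsExt('demo:100', 'isPartOf', 'demo:7');
```

Building the XML with DOM rather than string concatenation means the PIDs are escaped for us, which matters once values come from a form.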

Below is PHP code using the cURL client library to call Fedora’s REST API and get this list of relevant objects. I encountered a few stumbling blocks putting this together, so I thought I’d share in case others were curious or looking at a similar problem.

The first step is to compose your query, and then initiate a cURL session with the query.


<?php
$request = "http://your.address.domain:port/fedora/objects?query=yourQuery&resultFormat=xml";
$session = curl_init($request);

curl_setopt($session, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($session);
$responseResult = simplexml_load_string($response);
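$resultsArray = array();

// Collect every field value from the first page of results (findObjects).
foreach ($responseResult->{'resultList'} as $result) {
     foreach ($result->{'objectFields'} as $entry) {
          foreach ($entry as $value) {
               $resultsArray[] = $value;
          }
     }
}
// A token is present only when more results remain; cast it so that
// a missing token becomes an empty string.
$token = (string) $responseResult->{'listSession'}->{'token'};
curl_close($session);

// Keep calling resumeFindObjects until Fedora stops returning a token.
while (!empty($token)) {
     $nextQuery = "http://your.address.domain:port/fedora/objects?sessionToken=" . urlencode($token) . "&query=yourQuery&resultFormat=xml";
     $nextSession = curl_init($nextQuery);

     curl_setopt($nextSession, CURLOPT_RETURNTRANSFER, true);

     $nextResponse = curl_exec($nextSession);
     $nextResponseResult = simplexml_load_string($nextResponse);

     foreach ($nextResponseResult->{'resultList'} as $result) {
          foreach ($result->{'objectFields'} as $entry) {
               foreach ($entry as $value) {
                    $resultsArray[] = $value;
               }
          }
     }
     // Pick up the token for the next page, if there is one.
     $token = (string) $nextResponseResult->{'listSession'}->{'token'};
     print "$token<br />\n"; // debug output: show each token as it arrives

     curl_close($nextSession);

} //while
?>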

On line 2 I’ve specified my query results to be returned as XML and not HTML (resultFormat=xml). This is because I don’t want a simple browser view of the results; I want to work with them first, so XML is appropriate.

On line 5 the cURL option CURLOPT_RETURNTRANSFER is set to ‘true’. This directs cURL to hand back the response to its Fedora query as the string return value of curl_exec(), stored here in $response, rather than printing it directly.

On line 8 $response, a string of XML, is parsed into $responseResult as a PHP5 SimpleXMLElement object. The object is a tree structure containing the result list, the entries, and each entry’s values, all of which we can work through to get to the record values of interest. The specific contents will depend on your query. You can get a good look at the object with print_r():

print_r($responseResult);
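Both curl_exec() and simplexml_load_string() return false on failure, so it can help to check the response before walking it. Below is a minimal sketch of one way to do that; parseFedoraResponse() is a hypothetical helper name of my own, not something Fedora or cURL provides.

```php
<?php
// Sketch only: turn a curl_exec() return value into a SimpleXMLElement,
// failing loudly instead of passing false or malformed XML along.
// parseFedoraResponse() is a hypothetical helper, not a Fedora or cURL call.
function parseFedoraResponse($body): SimpleXMLElement
{
    if ($body === false || $body === '') {
        throw new RuntimeException('No response body; check curl_error() on the session.');
    }

    // Collect libxml errors quietly rather than emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $first = $errors ? trim($errors[0]->message) : 'unknown parse error';
        throw new RuntimeException('Response was not valid XML: ' . $first);
    }
    return $xml;
}

// Example with a tiny stand-in document in place of a real Fedora response.
$xml = parseFedoraResponse('<result><resultList/></result>');
echo $xml->getName(), "\n";
```

In the code above, this would take the place of the bare simplexml_load_string($response) call.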

The two Fedora REST commands used are findObjects and resumeFindObjects. We need both of these commands because findObjects will not return more than 100 results, regardless of the value you set on maxResults.

Instead it returns the results along with a token. This token is a long-ish string you can then supply to resumeFindObjects, which will continue retrieving your results for you. Just like findObjects, resumeFindObjects will never return more than 100 results, instead giving you another unique token. Once again, you can supply that token to a new resumeFindObjects command to continue getting your results.

Between them, the loops for these two commands should fill $resultsArray with all the results available in the repository.
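The token-following pattern can also be written as a single loop. The sketch below is one possible arrangement, not the code I actually run: collectAllObjects() and its $fetchPage callable are hypothetical names, and the sample pages are simplified stand-ins for real Fedora responses (no XML namespace, made-up PIDs and tokens). Taking the fetch step as a callable lets the loop be tried without a live repository.

```php
<?php
// Sketch only: gather every page of results by following session tokens.
// $fetchPage is a hypothetical callable that takes a token ('' for the first
// request) and returns that page's XML as a string, for example via cURL.
function collectAllObjects(callable $fetchPage): array
{
    $objects = array();
    $token = '';

    do {
        $xml = simplexml_load_string($fetchPage($token));
        if ($xml === false) {
            throw new RuntimeException('Could not parse a page of results.');
        }
        foreach ($xml->resultList as $list) {
            foreach ($list->objectFields as $entry) {
                $fields = array();
                foreach ($entry as $name => $value) {
                    $fields[$name] = (string) $value;
                }
                $objects[] = $fields;
            }
        }
        // An absent token casts to '' and ends the loop.
        $token = (string) $xml->listSession->token;
    } while ($token !== '');

    return $objects;
}

// Simplified stand-in pages, keyed by the token that requests them.
$pages = array(
    ''   => '<result><listSession><token>t1</token></listSession>'
          . '<resultList><objectFields><pid>demo:1</pid><title>First</title></objectFields></resultList></result>',
    't1' => '<result><resultList><objectFields><pid>demo:2</pid><title>Second</title></objectFields></resultList></result>',
);
$all = collectAllObjects(function ($token) use ($pages) {
    return $pages[$token];
});
print_r($all);
```

Each entry is kept as a small array keyed by field name, which keeps a pid and its title together instead of flattening them into one list.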

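You can then build an HTML drop-down from the results. The example below reads directly from $responseResult, so it covers only the first page of results: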


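<?php
echo "<select name=\"donators\">";
foreach ($responseResult->{'resultList'} as $result) {
	foreach ($result->{'objectFields'} as $entry) {
		// Escape the values so stray quotes or angle brackets cannot break the markup.
		$pid    = htmlspecialchars((string) $entry->pid, ENT_QUOTES);
		$title  = htmlspecialchars((string) $entry->title, ENT_QUOTES);
		echo "<option value=\"$pid\">$title</option>";
	}
}
echo "</select>";
?>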

Keep in mind that values like $entry->pid and $entry->title are only going to be in the results if those fields have been requested in your queries.
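As a rough illustration of requesting those fields, the sketch below assembles a findObjects URL with pid=true and title=true. buildFindObjectsUrl() is a hypothetical helper, and the base URL and query condition are the same placeholders used earlier.

```php
<?php
// Sketch only: assemble a findObjects URL that asks for the pid and title fields.
// buildFindObjectsUrl() is a hypothetical helper; base URL and query are placeholders.
function buildFindObjectsUrl(string $base, string $query, int $maxResults = 20): string
{
    $params = array(
        // Strings rather than booleans: http_build_query() would turn true into 1.
        'pid'          => 'true',
        'title'        => 'true',
        'query'        => $query,
        'maxResults'   => $maxResults,
        'resultFormat' => 'xml',
    );
    // Pass '&' explicitly so the separator does not depend on php.ini settings.
    return $base . '/objects?' . http_build_query($params, '', '&');
}

echo buildFindObjectsUrl('http://your.address.domain:port/fedora', 'yourQuery'), "\n";
```

Letting http_build_query() do the encoding also takes care of spaces and operators in the query condition.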

This approach has given me a good understanding of calling and manipulating objects in Fedora through PHP. I have found that setting maxResults to a smaller number (say 5, 10, or 20) is faster than setting it to its maximum 100. And of course, if you are going to be fetching hundreds or thousands of objects, it’s best not to dump them all in a drop down or to fetch them all at once.
