Migrating Data from MySQL

by wsampson

This is a simple PHP script to write data from a CSV file into a new FOXML file for ingest into the repository.  The process has been straightforward thus far. MySQL provides a quick way to export data from a table into a makeshift CSV file, and PHP’s fgetcsv function helps parse this file into a series of arrays which can be stepped through, value by value. This script has parsed the ~1000 rows of tabular data from the Hardware table into FOXML files. These files have been ingested into Fedora through the batch ingest tool available on the Java administration client.

Having this amount of records present in the repository has helped clarify how much functionality the base instance of Fedora possesses. The default search tool is fairly capable, and runs through several prominent Dublin Core fields specified in the FOXML file. Except for some notable absences, such as model number (for which there is no analogous DC term), the hardware records are as searchable and discoverable as they are on the MySQL/PHP interface. The major deficiency thus far is the lack of services associated with the RELS-EXT datastreams defined in the FOXML files. Associated these records with images and other parts is key, so this will continue to need to be developed.

Still, I have to say it feels good to migrate some plain CSV data into something a little more marked-up and meaningful!

<?php

/**
 * Hardware CSV-to-FOXML script
 *
 * @author		Walker Sampson
 */

$row = 1;

if (($handle = fopen("hardware.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, ",")) !== FALSE) {

        // Replace special characters with HTML escape strings
        foreach($data as &$string) {
        	$string = htmlspecialchars($string);
        }

    	// count through the CSV array
    	$num = count($data);

        // Take old HardwareID value and use as record number
        $recordNumber = $data[0];

        // Create xml file based according row number
        $fh = fopen("/Users/user/Sites/testXML/" . $recordNumber . ".xml", 'w') or die("can't open file");

        // Write header info
        fwrite($fh, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" . "\n");
        fwrite($fh, "<foxml:digitalObject VERSION=\"1.1\" PID=\"changeme:h" . $recordNumber . "\" xmlns:foxml=\"info:fedora/fedora-system:def/foxml#\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd\">" . "\n");

		// Write Fedora object properties
		fwrite($fh, "<foxml:objectProperties>" . "\n");
    	fwrite($fh, "<foxml:property NAME=\"info:fedora/fedora-system:def/model#state\" VALUE=\"A\"/>" . "\n");
    	fwrite($fh, "<foxml:property NAME=\"info:fedora/fedora-system:def/model#label\" VALUE=\"FOXML Hardware Record for h" . $recordNumber . "\"/>" . "\n");
    	fwrite($fh, "<foxml:property NAME=\"info:fedora/fedora-system:def/model#ownerId\" VALUE=\"fedoraAdmin\"/>" . "\n");
        fwrite($fh, "</foxml:objectProperties>" . "\n");

        // Write the DC datastream
        fwrite($fh, "<foxml:datastream ID=\"DC\" STATE=\"A\" CONTROL_GROUP=\"X\" VERSIONABLE=\"true\">" . "\n");
    	fwrite($fh, "<foxml:datastreamVersion FORMAT_URI=\"http://www.openarchives.org/OAI/2.0/oai_dc/\" ID=\"DC.0\" MIMETYPE=\"text/xml\" LABEL=\"Dublin Core Record for h" . $recordNumber . "\">" . "\n");
        fwrite($fh, "<foxml:xmlContent>" . "\n");

        // TODO: confirm this line is alright
        fwrite($fh, "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\" xmlns:dcterms=\"http://purl.org/dc/terms/\" xmlns:gcm=\"http://xx.xx.xx.xx/terms/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">" . "\n");

        echo "<p> Wrote record number $row with $num fields <br /></p>\n";
        $row++;

            // Write certain array values into the new DC datastream of the FOXML file
       	    fwrite($fh, "<dcterms:type xsi:type=\"dcterms:DCMIType\">PhysicalObject</dcterms:type>" . "\n");

       		// If it was set to 'system' before, it is marked as a system now
       		if (strcmp($data[2], "system") == 0 || strcmp($data[2], "System") == 0) {
       			fwrite($fh, "<gcm:system>Y</gcm:system>" . "\n");
       		} else {
       			fwrite($fh, "<gcm:system>N</gcm:system>" . "\n");
       		}

       		// Do not need this since Fedora applies its own upon ingest
       		// fwrite($fh, "<dcterms:identifier>h"   . $recordNumber . "</dcterms:identifier>" . "\n");

       		// 'Title' is taken from the former 'make' field
       		fwrite($fh, "<dcterms:title>"         . $data[4]     . "</dcterms:title>"      . "\n");
       	    // From the former 'manufacturer' field. Ideally, this value will be reaplced by a RELS-EXT relationship to some Agent object (a company, e.g.)
       	    fwrite($fh, "<dcterms:creator>"       . $data[3]     . "</dcterms:creator>"    . "\n");
       	    fwrite($fh, "<dcterms:alternative></dcterms:alternative>". "\n");
       		// TODO: lots of scrubbing to normalize all the entries we did on the fly to the values of the 'easydeposit_metadata_hardwarecat' array in config/easydeposit.php
       		fwrite($fh, "<dcterms:type>"          . $data[2]     . "</dcterms:type>"       . "\n");
       		fwrite($fh, "<dcterms:identifier>"    . $data[6]     . "</dcterms:identifier>" . "\n");
       		fwrite($fh, "<gcm:model>"             . $data[5]     . "</gcm:model>"          . "\n");
       		fwrite($fh, "<dcterms:created>"       . $data[7]     . "</dcterms:created>"    . "\n");
		    fwrite($fh, "<dcterms:temporal xsi:type=\"dcterms:Period\"> name=\"Manufacture span\"; start=0000-00-00; end=0000-00-00;</dcterms:temporal>" . "\n");
			fwrite($fh, "<gcm:retail>"            . $data[8]     . "</gcm:retail>"         . "\n");
		    fwrite($fh, "<dcterms:format></dcterms:format>"                                . "\n");
 			fwrite($fh, "<dcterms:provenance>"    . $data[12]    . "</dcterms:provenance>" . "\n");

 			// all the Hardware record are "Gift" if it is filled in at all
 			if (strcmp($data[14], "Gift") == 0) {
 				fwrite($fh, "<dcterms:accrualMethod xsi:type=\"dcterms:Accrual\">Donation</dcterms:accrualMethod>" . "\n");
 			} else {
 				fwrite($fh, "<dcterms:accrualMethod xsi:type=\"dcterms:MethodOfAccrual\"></dcterms:accrualMethod>" . "\n");
 			}

 			// fwrite($fh, "<dcterms:accrualPolicy xsi:type=\"dcterms:URI\">" . someValue . "</dcterms:accrualPolicy>" . "\n");
 			fwrite($fh, "<dcterms:dateSubmitted>" . $data[24]    . "</dcterms:dateSubmitted>" . "\n");
 			fwrite($fh, "<dcterms:dateAccepted>"  . $data[27]    . "</dcterms:dateAccepted>"  . "\n");
			fwrite($fh, "<gcm:disposition>"       . $data[15]    . "</gcm:disposition>"       . "\n");
		    fwrite($fh, "<gcm:dispositionDate></gcm:dispositionDate>"                         . "\n");
		    fwrite($fh, "<dcterms:medium></dcterms:medium>"                                   . "\n");
  			fwrite($fh, "<gcm:restoration>"       . $data[25]    . "</gcm:restoration>"       . "\n");
  			fwrite($fh, "<gcm:functioning>"       . $data[19]    . "</gcm:functioning>"       . "\n");
  			fwrite($fh, "<gcm:functioningDate>"   . $data[20]    . "</gcm:functioningDate>"   . "\n");
  			fwrite($fh, "<gcm:storage>"           . $data[23]    . " at "
  											      . $data[22]    . "</gcm:storage>"           . "\n");
			fwrite($fh, "<gcm:partNumber></gcm:partNumber">                                   . "\n");
			fwrite($fh, "<gcm:otherNumber></gcm:otherNumber">                                 . "\n");

  			fwrite($fh, "<dcterms:description>Functioning: " . $data[21] . "</dcterms:description>" . "\n");
  			fwrite($fh, "<dcterms:description>Technical: "   . $data[10] . "</dcterms:description>" . "\n");
  		    fwrite($fh, "<dcterms:description>Appearance: </dcterms:description>"                   . "\n");
  			fwrite($fh, "<dcterms:description>Condition: "   . $data[17] . "</dcterms:description>" . "\n");

 		fwrite($fh, "</oai_dc:dc>" . "\n");
      	fwrite($fh, "</foxml:xmlContent>" . "\n");
      	fwrite($fh, "</foxml:datastreamVersion>" . "\n");
      	fwrite($fh, "</foxml:datastream>" . "\n");

        // Write the RELS-EXT datastream to point to collection:hardware
        fwrite($fh, "<foxml:datastream ID=\"RELS-EXT\" STATE=\"A\" CONTROL_GROUP=\"X\" VERSIONABLE=\"true\">" . "\n");
        fwrite($fh, "<foxml:datastreamVersion ID=\"RELS-EXT.0\" LABEL=\"Relationships to other objects\" MIMETYPE=\"text/xml\">" . "\n");
        fwrite($fh, "<foxml:xmlContent>" . "\n");
        	fwrite($fh, "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">" . "\n");
        	fwrite($fh, "<rdf:Description rdf:about=\"info:fedora/changeme:h" . $recordNumber . "\">" . "\n");
        	fwrite($fh, "<rel:isMemberOf xmlns:rel=\"info:fedora/fedora-system:def/relations-external#\" rdf:resource=\"info:fedora/collection:hardware\"></rel:isMemberOf>" . "\n");
        	fwrite($fh, "</rdf:Description>" . "\n");
        	fwrite($fh, "</rdf:RDF>" . "\n");
        fwrite($fh, "</foxml:xmlContent>" . "\n");
      	fwrite($fh, "</foxml:datastreamVersion>" . "\n");
      	fwrite($fh, "</foxml:datastream>" . "\n");

        // Write to close the root element
        fwrite($fh, "</foxml:digitalObject>");

        fclose($fh);

    } // while

    fclose($handle);

} // if
?>

Note: This script could use some cleaning up I know. I assume no one will really ever need it but once.

Advertisements