Handling Results in Fedora’s REST API

by wsampson

Lately I’ve been working to put in more development time with the Fedora repository at the Goodwill Computer Museum.

A PHP ingest interface we’ve set up is certainly the most developed of our repository’s services, but there’s a strong need to relate one object to another as it is being ingested. To do this I want to provide the user with a drop-down menu of objects in the repository which fulfill some criteria (say, the object represents a donor or creator). The user can select one during the ingest phase, relating the ingested object to this other object. That relationship would be recorded in the RELS-EXT datastream as RDF/XML, creating a triple. The predicate of that triple will come from either Fedora’s own ontology [RDF schema] or another appropriate namespace.
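As a rough sketch of what such a RELS-EXT triple could look like, the snippet below builds the RDF/XML with PHP’s DOM extension. The function name buildRelsExt, the PIDs, and the choice of isMemberOf as predicate are placeholders for illustration only; the namespace is the one used by Fedora’s relationship ontology.

```php
<?php
// Sketch: build a RELS-EXT body relating one object to another.
// The PIDs and the predicate passed in below are hypothetical examples.
function buildRelsExt(string $subjectPid, string $predicate, string $objectPid): string
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $relNs = 'info:fedora/fedora-system:def/relations-external#';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rdf = $doc->createElementNS($rdfNs, 'rdf:RDF');
    $doc->appendChild($rdf);

    // Subject of the triple: the object being ingested.
    $description = $doc->createElementNS($rdfNs, 'rdf:Description');
    $description->setAttributeNS($rdfNs, 'rdf:about', 'info:fedora/' . $subjectPid);
    $rdf->appendChild($description);

    // Predicate element whose rdf:resource points at the related object.
    $relation = $doc->createElementNS($relNs, 'rel:' . $predicate);
    $relation->setAttributeNS($rdfNs, 'rdf:resource', 'info:fedora/' . $objectPid);
    $description->appendChild($relation);

    return $doc->saveXML();
}

echo buildRelsExt('demo:item1', 'isMemberOf', 'demo:donor1');
```

The object chosen in the drop-down would supply the third argument.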

Below is PHP code using the cURL client library to call Fedora’s REST API and get this list of relevant objects. I encountered a few stumbling blocks putting this together, so I thought I’d share in case others were curious or looking at a similar problem.

The first step is to compose your query, and then initiate a cURL session with the query.


<?php
$request = "http://your.address.domain:port/fedora/objects?query=yourQuery&resultFormat=xml";
$session = curl_init($request);

curl_setopt($session, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($session);
$responseResult = simplexml_load_string($response);
$resultsArray = array();

foreach ($responseResult->{'resultList'} as $result) {
     foreach ($result->{'objectFields'} as $entry) {
          foreach ($entry as $value) {
               $resultsArray[] = $value;
          }
     }
}
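curl_close($session);

// Keep the session token from the first page so the loop below can resume the search.
$token = (string) $responseResult->{'listSession'}->{'token'};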

while (!empty($token)) {
     $nextQuery = "http://your.address.domain:port/fedora/objects?sessionToken=" . urlencode($token) . "&query=yourQuery&resultFormat=xml";
     $nextSession = curl_init($nextQuery);

     curl_setopt($nextSession, CURLOPT_RETURNTRANSFER, true);

     $nextResponse = curl_exec($nextSession);
     $nextResponseResult = simplexml_load_string($nextResponse);

     foreach ($nextResponseResult->{'resultList'} as $result) {
          foreach ($result->{'objectFields'} as $entry) {
               foreach ($entry as $value) {
                    $resultsArray[] = $value;
               }
          }
     }
     // Cast to a string so the loop ends once no further token is returned.
     $token = (string) $nextResponseResult->{'listSession'}->{'token'};
     print "$token<br />\n";

     curl_close($nextSession);

} //while
?>

On line 2 I’ve asked for the query results as XML rather than HTML (resultFormat=xml). I don’t want a simple browser view of the results; I want to process them first, so XML is appropriate.
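If you would rather not assemble the query string by hand, something like the following could do it. This is only a sketch: the function name, the base URL, the sample query, and the default of 25 results are my own placeholders, and the field names assume you want pid and title back.

```php
<?php
// Sketch: assemble a findObjects URL. The base URL and field list are
// placeholders; adjust them for your own repository.
function buildFindObjectsUrl(string $baseUrl, string $query, array $fields, int $maxResults = 25): string
{
    $params = [
        'query'        => $query,
        'maxResults'   => $maxResults,
        'resultFormat' => 'xml',
    ];
    // Each requested field is switched on with "<field>=true".
    foreach ($fields as $field) {
        $params[$field] = 'true';
    }
    // http_build_query() URL-encodes special characters in the values.
    return rtrim($baseUrl, '/') . '/objects?' . http_build_query($params);
}

echo buildFindObjectsUrl('http://localhost:8080/fedora', 'title~donor*', ['pid', 'title']);
```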

On line 5 the cURL option CURLOPT_RETURNTRANSFER is set to true. This tells cURL to hand back the response to its Fedora query as the string return value of curl_exec(), stored here in $response, instead of printing it directly.

On line 8 $response, an XML string, is parsed into $responseResult as a SimpleXMLElement object. The object is a tree structure containing the result list, its entries, and each entry’s values, all of which we can work through to get to the record values of interest. The specific contents will depend on your query. You can get a good look at the object with print_r():

print_r($responseResult);
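To make the shape of that tree concrete, here is a small self-contained sketch that parses a hand-written sample. The XML is a simplified stand-in modelled on a findObjects response (a real one carries a namespace and whichever fields your query requested), and the token and PIDs are made up.

```php
<?php
// Simplified sample modelled on a findObjects XML response.
$sampleXml = <<<XML
<result>
  <listSession>
    <token>abc123</token>
  </listSession>
  <resultList>
    <objectFields>
      <pid>demo:1</pid>
      <title>First donor</title>
    </objectFields>
    <objectFields>
      <pid>demo:2</pid>
      <title>Second donor</title>
    </objectFields>
  </resultList>
</result>
XML;

$parsed = simplexml_load_string($sampleXml);

// Cast SimpleXMLElement nodes to strings before storing or comparing them.
$token = (string) $parsed->listSession->token;
$pids  = [];
foreach ($parsed->resultList->objectFields as $fields) {
    $pids[] = (string) $fields->pid;
}

print_r($pids);
echo "token: $token\n";
```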

The two Fedora REST commands used are findObjects and resumeFindObjects. We need both of these commands because findObjects will not return more than 100 results, regardless of the value you set for maxResults.

Instead it returns the results along with a token. This token is a long-ish string you can then supply to resumeFindObjects, which will continue retrieving your results for you. Just like findObjects, resumeFindObjects will never return more than 100 results, instead giving you another unique token. Once again, you can supply that token to a new resumeFindObjects command to continue getting your results.

The two loops, one for each of these commands, should fill $resultsArray with all the matching results available in the repository.
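The token handling can also be folded into a single loop. The sketch below assumes a hypothetical callable, $fetchPage, which stands in for the cURL calls: it takes the current token (null on the first request) and returns the response body. The canned pages are simplified, made-up responses used only to show the flow.

```php
<?php
// Sketch: follow session tokens until the repository stops returning one.
function collectAllPids(callable $fetchPage): array
{
    $pids  = [];
    $token = null;
    do {
        $page = simplexml_load_string($fetchPage($token));
        if ($page === false) {
            throw new RuntimeException('Response was not valid XML');
        }
        foreach ($page->resultList->objectFields as $fields) {
            $pids[] = (string) $fields->pid;
        }
        // No <token> element means there are no further pages.
        $token = isset($page->listSession->token)
            ? (string) $page->listSession->token
            : '';
    } while ($token !== '');
    return $pids;
}

// Stand-in for the cURL calls: two canned pages keyed by token.
$pages = [
    ''   => '<result><listSession><token>t1</token></listSession>'
          . '<resultList><objectFields><pid>demo:1</pid></objectFields></resultList></result>',
    't1' => '<result><resultList><objectFields><pid>demo:2</pid></objectFields></resultList></result>',
];
$allPids = collectAllPids(function ($token) use ($pages) {
    return $pages[(string) $token];
});
print_r($allPids);
```

In real use the callable would build the findObjects or resumeFindObjects URL and run it through cURL.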

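You can then use the results in an HTML drop-down. The example below reads straight from $responseResult, so it covers only the first page of results: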


<?php
echo "<select name=\"donators\">";
foreach ($responseResult->{'resultList'} as $result) {
	foreach ($result->{'objectFields'} as $entry) {
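		// Escape the values so characters like & or " cannot break the markup.
		$pid    = htmlspecialchars((string) $entry->pid);
		$title  = htmlspecialchars((string) $entry->title);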
		echo "<option value=\"$pid\">$title</option>";
	}
}
echo "</select>";
?>

Keep in mind that values like $entry->pid and $entry->title are only going to be in the results if those fields have been requested in your queries.
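If you have gathered PIDs and titles from every page, a helper along these lines could render the menu from a plain array instead. The function name renderSelect and the sample data are illustrative, and it assumes you stored the results as PID => title pairs.

```php
<?php
// Sketch: build a <select> from PID => title pairs collected across all pages.
function renderSelect(string $name, array $options): string
{
    $html = '<select name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">';
    foreach ($options as $pid => $title) {
        // Escape both values so they are safe inside attributes and text.
        $html .= sprintf(
            '<option value="%s">%s</option>',
            htmlspecialchars((string) $pid, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $title, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . '</select>';
}

echo renderSelect('donators', ['demo:1' => 'Smith & Sons', 'demo:2' => 'Jane Doe']);
```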

This approach has given me a good understanding of calling and manipulating objects in Fedora through PHP. I have found that setting maxResults to a smaller number (say 5, 10, or 20) is faster than setting it to its maximum of 100. And of course, if you are going to be fetching hundreds or thousands of objects, it’s best not to dump them all in a drop-down or to fetch them all at once.
