Wf4Ever Research Object Bundle

Working Draft

This version:
http://purl.org/wf4ever/ro-bundle/2013-05-21/
Previous version:
http://purl.org/wf4ever/ro-bundle/2013-05-10/
Latest editor's draft:
https://w3id.org/bundle/draft/
Latest recommendation:
https://w3id.org/bundle/
Editor:
Stian Soiland-Reyes, University of Manchester

Wf4Ever Working Draft

Abstract

This specification defines a file format for storage and distribution of Research Objects as a ZIP archive, called a Research Object Bundle (RO Bundle). RO Bundles allow capturing a Research Object to a single file or byte-stream by including its manifest, annotations and some or all of its aggregated resources for the purposes of exporting, archiving, publishing and transferring research objects.

Status of This Document

This document is merely a public working draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

This document is a Working Draft published by the Wf4Ever project. This document is currently a work in progress and should not be used as a basis for implementations. Questions, feedback and comments are kindly requested to be sent to the wf4ever-public mailing list/forum.

1. Introduction

This section is non-normative.

The Wf4Ever Research Object model [RO] defines a model for aggregating the resources that contribute to a scientific work, including domain-specific annotations and provenance traces. The unit that collects these resources is called a Research Object (RO) and is described in an RDF-based manifest according to the Wf4Ever OWL ontologies. The RO model has been formed in particular for the purpose of preserving scientific workflows, but is also applicable in a general sense for capturing resources that are related to each other and which together form a trackable whole. The Research Object primer [ROPrimer] provides further details and examples of using the RO model.

The specification for the RO model does not mandate any particular form for the representation of Research Objects. The Wf4Ever RO Storage and Retrieval Service API [ROSRS] defines how research objects can be accessed and maintained on the web through a RESTful web service exposing RDF/XML and Turtle representations. Practical use of the RO model has, however, shown that it is also beneficial to represent a research object as a single ZIP archive or as file system folders for the purposes of downloading, editing and archiving a research object.

For instance a scientific workflow system can export a workflow run by saving the workflow definition, runtime provenance trace and generated results to a set of files. A research object that represents the workflow run can aggregate and relate these resources. However, at the time of running the workflow (e.g. on a desktop computer) it is often not known where or if the user would choose to publish the RO; thus the direct use of a ROSRS service or minting public URIs is problematic in this situation.

A Research Object Bundle, as specified by this document, provides a way to collect the resources that are aggregated in a research object, represented as files in a ZIP archive, in addition to their metadata and annotations. The ZIP archive thus becomes a single representation of a research object that can be exported, archived, published and transferred like a regular file or resource.

2. Container

A Research Object Bundle is a structured [ZIP] archive, specializing the Adobe Universal Container Format [UCF]. UCF is based on the EPUB [OCF] format, but generalized to be any kind of container. The following section gives an informal introduction to the UCF format. For the complete, normative details, see the [UCF] specification.

2.1 Universal Container Format (UCF)

This section is non-normative.

A UCF container is based on the ZIP compression file format [ZIP], enforcing additional restrictions. The most important restrictions are:

UCF says about mimetype:

The first file in the Zip container MUST be a file with the ASCII name of mimetype, which holds the MIME type for the Zip container (application/epub+zip as an ASCII string; no padding, white-space, or case change).

The actual media type to include in mimetype depends on the specific container type (the above quote uses ePub as an example). See section 2.2 RO bundle container.

Best Practice 1: Use zip -0 -X

To add the mimetype file correctly on a UNIX/Linux installation with InfoZip, use echo -n and zip -0 -X. Below is an example which adds mimetype correctly as the first, uncompressed file, then the remaining files (excluding mimetype) with the default compression:

Example 1
stain@ahtissuntu:~/test$ echo -n application/vnd.wf4ever.robundle+zip > mimetype 

stain@ahtissuntu:~/test$ zip -0 -X ../example.robundle mimetype
  adding: mimetype (stored 0%)

stain@ahtissuntu:~/test$ zip -X -r ../example.robundle . -x mimetype
  adding: META-INF/ (stored 0%)
  adding: META-INF/container.xml (stored 0%)
  adding: .ro/ (stored 0%)
  adding: .ro/manifest.json (stored 0%)
  adding: helloworld.txt (stored 0%)
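
The same layout can be produced without the zip command line tool. The following is a minimal sketch using Python's standard zipfile module; the function name write_bundle and the in-memory example content are assumptions made for illustration and are not defined by this specification.

```python
import io
import zipfile

RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"

def write_bundle(target, files):
    """Write files (a dict of archive path -> bytes) to a new bundle.

    The mimetype entry is written first and stored uncompressed; the
    remaining entries use the default deflate compression.
    """
    with zipfile.ZipFile(target, "w") as bundle:
        # A bare ZipInfo carries no extra fields, similar to zip -X.
        bundle.writestr(
            zipfile.ZipInfo("mimetype"),
            RO_BUNDLE_MIMETYPE.encode("ascii"),  # no padding or newline
            compress_type=zipfile.ZIP_STORED,
        )
        for path, data in files.items():
            if path == "mimetype":
                continue  # already written as the first entry
            bundle.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)

# Hypothetical example content, written to memory rather than to disk.
buffer = io.BytesIO()
write_bundle(buffer, {
    ".ro/manifest.json": b"{}",
    "helloworld.txt": b"Hello, world\n",
})
```

Passing a file name instead of the BytesIO object would write the bundle to disk.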

2.1.1 Rootfile

This section is non-normative.

A root file is the entry-point for a UCF container, playing a similar role to index.html on web servers.

UCF says about META-INF/container.xml and rootfiles:

A UCF Container MAY include a file named container.xml in the META-INF directory at the root level of the container file system. If present, the container.xml file MAY identify the MIME type of, and path to, the root file for the container and any OPTIONAL alternative renditions included in the container.

An example of META-INF/container.xml which defines the rootfile as .ro/manifest.json:

Example 2
<?xml version="1.0"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path=".ro/manifest.json" media-type="application/ld+json" />
    </rootfiles>
</container>
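
As an illustration of how a consumer might locate the root file, the sketch below parses the container.xml document from Example 2 with Python's xml.etree.ElementTree. The function name find_rootfiles is an assumption made for this example; it is not part of UCF or of this specification.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

def find_rootfiles(container_xml):
    """Return (full-path, media-type) pairs, one per <rootfile> element."""
    root = ET.fromstring(container_xml)
    return [
        (rootfile.get("full-path"), rootfile.get("media-type"))
        for rootfile in root.iter("{%s}rootfile" % CONTAINER_NS)
    ]

# The document from Example 2, as it would be read from the ZIP archive.
EXAMPLE = b"""<?xml version="1.0"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path=".ro/manifest.json" media-type="application/ld+json" />
    </rootfiles>
</container>
"""

rootfiles = find_rootfiles(EXAMPLE)
```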

2.2 RO bundle container

The RO Bundle container is a specialization of a [UCF] container, with the following additions:

Applications that specialize RO Bundles MAY specify a different mimetype, for instance because the bundle is used to distribute application-specific data. It is RECOMMENDED for such extensions that their media type end with +zip according to [RFC6839] unless it is not considered meaningful for a user to treat such bundles as a general ZIP archive.

2.2.1 Resource media type

If an application requires a media-type for a resource, for instance because it is exposing the RO bundle over HTTP, it SHOULD resolve the media type of the resource according to this section.

In order of preference:

  1. A resource which is a root file is assumed to have the media type given by the mimetype of the corresponding (or implied) <rootfile> entry.
  2. If a resource is an external reference (e.g. referenced with an absolute http:// URI), then its media type is given by the HTTP Content-Type, which may involve content negotiation.
  3. If the resource is aggregated in the manifest, applications SHOULD use the mediatype (dc:format in RDF manifests), if present.
  4. Failing the above, the media type of a resource MAY be resolved according to the following table by case-insensitive matching of its extension (suffix):
    Extension   Media type
    .txt        text/plain; charset="utf-8"
    .ttl        text/turtle; charset="utf-8"
    .rdf        application/rdf+xml
    .json       application/json
    .jsonld     application/ld+json
    .xml        application/xml
  5. In the absence of a resolved media type, the media type application/octet-stream MAY be assumed.
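
The order of preference above can be sketched as a small lookup function. This is only an illustration: the function name resolve_media_type and its two dictionary parameters are assumptions, both dictionaries are assumed to be keyed by the same path form as the path argument, and step 2 (external references) is left out because it requires an HTTP request.

```python
import posixpath

# Step 4: the extension table from this section.
EXTENSION_MEDIA_TYPES = {
    ".txt": 'text/plain; charset="utf-8"',
    ".ttl": 'text/turtle; charset="utf-8"',
    ".rdf": "application/rdf+xml",
    ".json": "application/json",
    ".jsonld": "application/ld+json",
    ".xml": "application/xml",
}

def resolve_media_type(path, rootfiles=None, manifest_types=None):
    """Resolve the media type of a bundled resource.

    rootfiles maps root file paths to the media type of their <rootfile>
    entry; manifest_types maps paths to the mediatype given in the
    manifest. Both are hypothetical inputs prepared by the caller.
    """
    rootfiles = rootfiles or {}
    manifest_types = manifest_types or {}
    if path in rootfiles:            # step 1: root file
        return rootfiles[path]
    if path in manifest_types:       # step 3: mediatype from the manifest
        return manifest_types[path]
    extension = posixpath.splitext(path)[1].lower()
    if extension in EXTENSION_MEDIA_TYPES:   # step 4: case-insensitive suffix
        return EXTENSION_MEDIA_TYPES[extension]
    return "application/octet-stream"        # step 5: fallback
```

For instance, resolve_media_type("/README.TXT") falls through to the extension table, while a path with an unlisted extension such as /folder/soup.jpeg ends at the step 5 fallback.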

2.2.2 META-INF/manifest.xml

To avoid confusion with the somewhat overlapping RO manifest it is NOT RECOMMENDED to include the ODF manifest (META-INF/manifest.xml) in RO Bundles or to use the ODF manifest for resolving media types.

3. Manifest

The research object SHOULD be described in the file .ro/manifest.json as specified below. Alternative manifests MAY also be present.

3.1 .ro/manifest.json

The file .ro/manifest.json, if present, MUST contain the [ORE] manifest for the research object according to this section. The file MUST be in JSON format [RFC4627], and SHOULD be valid [JSON-LD].

Identifiers used below are either:

  1. Meta-resources, path relative to .ro/ directory, which MUST NOT contain the : character. For instance manifest.json or annotations/ann2. Depending on how meta-resources are used, the ZIP might or might not include a corresponding entry for the given path.
  2. Bundled resources. The path SHOULD start with / to indicate the root of the bundle, for instance /hello.txt or /folder2/. Folders SHOULD have a path terminating with /. The resource identified by the path SHOULD be included as a corresponding file or directory in the ZIP file.
  3. Absolute URIs (contains :), external to the bundle. For instance http://example.com/external.

The structure of the JSON manifest is given by a JSON object with the keys:

@context
JSON-LD context. SHOULD be present, in which case it MUST be valid according to the JSON-LD @context keyword. The RO bundle context SHOULD be a list, and SHOULD include the value "https://w3id.org/bundle/context". This value SHOULD be the last item of the list.
id
RO identifier. SHOULD be present, in which case it SHOULD have the fixed value / indicating the relative top-level folder as the identifier. (See section 4. Identifiers.)
manifest
ORE manifests describing this RO, relative to the .ro/ directory. SHOULD be literal "manifest.json", but MAY be a list, in which case the list MUST contain "manifest.json"
createdOn
The time the RO was serialized as this RO bundle. SHOULD be present, in which case it MUST be an xsd:dateTime formatted timestamp (ISO 8601), and SHOULD include the time zone.
createdBy
The creator of the RO bundle. This MAY be different from the person forming the research object, which SHOULD be indicated with authoredBy. The creator SHOULD be an object with the following keys:
uri
A URI identifying the agent. The URI SHOULD be present, and SHOULD be a valid WebID, for instance http://example.com/fred#fred
orcid
An ORCID identifier for this agent. For instance, http://orcid.org/0000-0001-9842-9718. An ORCID MAY be present if known, and MUST be a URI.
name
The full name of the agent. The name SHOULD be present. Examples: "John Doe" or "University of Manchester"
Additional foaf: properties (such as foaf:homepage) MAY be added to the top-level @graph according to section 3.1.2 Custom JSON-LD by using a @id equal to the creator uri.
authoredOn
The time the Research Object was conceptually formed. The author time SHOULD be present if different from createdOn. The value MUST be an xsd:dateTime formatted timestamp (ISO 8601), and SHOULD include the time zone.
authoredBy
The author of the Research Object, the agent(s) that conceptually formed the RO. The author SHOULD be present if different from createdBy. SHOULD be an object with the same keys and requirements as for createdBy, but MAY be a list to indicate multiple authors.

Additional authorship information (curation, contribution, etc) MAY be added using the pav: namespace within the top-level @graph key according to section 3.1.2 Custom JSON-LD by using an @id value equal to the bundle id, e.g. "/".

history
Provenance trace of the life of this RO, relative to the .ro/ directory. This property MAY be present, in which case it SHOULD be "evolution.ttl", indicating that the file .ro/evolution.ttl contains the provenance trace. This value MAY be a URI. The property MAY give a list if several provenance traces are known, in which case the list SHOULD include "evolution.ttl".

The file .ro/evolution.ttl, if present, SHOULD include a provenance trace of this research object according to the roevo ontology.

aggregates
This property SHOULD be present, in which case it MUST be a list of the resources aggregated by this RO. The values in the list MUST be either:
  • A path relative to the root of the bundle, prefixed with /
  • An absolute URI
  • An object, which SHOULD be uniquely identified by either file or uri. Its members are:
    file
    A path within the bundle. The path SHOULD be prefixed with /
    uri
    An absolute URI. The key uri MUST NOT be provided at the same time as file.
    mediatype
    The IANA media type of the resource (typically one identified by file). This SHOULD be specified for a resource identified by file, unless its media type is correctly identified according to section 2.2.1 Resource media type.
    createdOn
    createdBy
    File creation date and creator, as specified above for the research object.
    bundledAs
    Details of how this resource has been bundled. This object SHOULD be present for resources aggregated with uri, and MAY be present for other resources. Its members are:
    proxy
    The identifier for an ORE proxy [ORE] for the resource as aggregated by this RO. This property is intended for the purpose of referring to "resource X as aggregated in research object Y" within annotations and in external documents. This property SHOULD be given for external resources aggregated by uri references, as they could be aggregated in multiple ROs, and MAY be given for other resources.

    The proxy identifier SHOULD consist of the prefix urn:uuid: and a lowercased UUID string [RFC4122]. For example: urn:uuid:d4f09040-272e-467f-9250-59593bd4ac8f

    folder
    A folder this resource (typically identified by uri) belongs to, relative to the root of the bundle. The path SHOULD be prefixed with / and SHOULD end with /, for instance /folder2/. The folder SHOULD be a directory in the zip archive.
    filename
    The filename the resource (typically identified by uri) is given within the given folder. The key filename SHOULD be present if folder is given. If filename is given, the folder MUST be present. The filename should not contain the characters /, : or \, but MAY contain spaces and international characters.
    aggregatedBy
    OPTIONAL property, the agent that aggregated this resource to this research object, especially when different from the createdBy of the research object. SHOULD be an object with the same keys and requirements as for createdBy above, but MAY be a list to indicate multiple aggregators.
    aggregatedOn
    OPTIONAL property, when this resource was aggregated, especially when different from the createdOn for the research object. The value, if present, MUST be an xsd:dateTime formatted timestamp (ISO 8601), and SHOULD include the time zone.
Additional metadata about a resource, if present, SHOULD be added as an annotation (see below).

The order of the values in the aggregates list is insignificant; however, the list MUST NOT contain duplicate entries. An entry is considered a duplicate by comparing literal values and the members file and uri uniformly as URIs [URI].

annotations
Annotations. MAY be present, in which case it MUST be a list. An annotation provides additional metadata or descriptions which are somewhat about or related to the research object or some of its aggregated resources.

An annotation is specified as an object, which has the following members:

annotation
The identifier for this annotation. The identifier SHOULD be present, and SHOULD consist of the prefix urn:uuid: and a lowercased UUID string [RFC4122]. For example: urn:uuid:1a876f9e-4ffe-4c99-a05d-cd9d0cbd4cbb
about
The identifier for the annotated resource, MUST be present. This is considered the target of the annotation, that is the resource the annotation content is "somewhat about". The "about" identifier SHOULD be one of these types:
  • The research object itself, which SHOULD match the value of its id, e.g. "/"
  • A bundled resource, starting with /, which SHOULD be listed under aggregates if that key is present
  • A proxy for an aggregated resource, starting with urn:uuid:, which MUST be defined under aggregates with a matching value for proxy
  • Another annotation, starting with urn:uuid:, which MUST be defined under annotations
  • An absolute URI, which may or may not be aggregated by the RO
  • A JSON list, containing any of the above. This indicates that the annotation is about each of the listed resources, for instance because the annotation content is describing their relationship.
content
The identifier for a resource that contains the body of the annotation, SHOULD be present. The content identifier SHOULD be one of these types:
  • A (non-aggregated) meta-resource (typically an RDF graph), starting with annotations/, which SHOULD exist in the .ro/annotations/ directory
  • An aggregated bundled resource, starting with /, which SHOULD be listed under aggregates and MUST be included in the ZIP archive
  • An absolute URI, which may or may not be aggregated by the RO
Additional properties describing the annotation using the oa: namespace MAY be added to the top-level @graph according to section 3.1.2 Custom JSON-LD by using a @id matching the annotation identifier.
@graph
A list of additional [JSON-LD] statements according to section 3.1.2 Custom JSON-LD.

An example of a manifest which is valid JSON-LD is included below:

Example 3
{
    "@context":  [
        "https://w3id.org/bundle/context"
    ],
    "id": "/",
    "manifest":  "manifest.json",
    "createdOn": "2013-03-05T17:29:03Z",
    "createdBy": {
        "uri":     "http://example.com/foaf#alice",
        "orcid":   "http://orcid.org/0000-0002-1825-0097",
        "name":    "Alice W. Land" },
    "history":   "evolution.ttl",
    "aggregates": [
       "/folder/soup.jpeg",
       "http://example.com/blog/",
       { "file":      "/README.txt",
         "mediatype": "text/plain",
         "createdBy": {
             "uri":     "http://example.com/foaf#bob",
             "name":    "Bob Builder" },
         "createdOn": "2013-02-12T19:37:32.939Z" },
       { "uri":    "http://example.com/comments.txt",
         "bundledAs": { 
            "proxy":    "urn:uuid:a0cf8616-bee4-4a71-b21e-c60e6499a644",
            "folder":   "/folder/",
            "filename": "external.txt" }
       }
    ],
    "annotations": [
      { "annotation": "urn:uuid:d67466b4-3aeb-4855-8203-90febe71abdf",
        "about":      "/folder/soup.jpeg",
        "content":    "annotations/soup-properties.ttl" },
  
      { "about":   "urn:uuid:a0cf8616-bee4-4a71-b21e-c60e6499a644",
        "content": "http://example.com/blog/they-aggregated-our-file" },
  
      { "about":   [ "/", "urn:uuid:d67466b4-3aeb-4855-8203-90febe71abdf" ],
        "content": "annotations/a-meta-annotation-in-this-ro.txt" }
    ]
}
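
A manifest of this shape can be assembled with any JSON library. The sketch below builds a minimal manifest in Python; the function name new_manifest and its parameters are assumptions made for illustration, and only a subset of the keys described above is populated.

```python
import json
import uuid
from datetime import datetime, timezone

BUNDLE_CONTEXT = "https://w3id.org/bundle/context"

def new_manifest(files, external_uris=()):
    """Build a minimal manifest for bundled files and external URIs."""
    for path in files:
        if not path.startswith("/"):
            raise ValueError("bundled paths should start with /: %r" % path)
    aggregates = list(files)
    for uri in external_uris:
        # External resources get a fresh urn:uuid: proxy identifier.
        aggregates.append({"uri": uri, "bundledAs": {"proxy": uuid.uuid4().urn}})
    return {
        "@context": [BUNDLE_CONTEXT],
        "id": "/",
        "manifest": "manifest.json",
        # Time zone aware UTC timestamp in ISO 8601 form.
        "createdOn": datetime.now(timezone.utc).replace(microsecond=0).isoformat(),
        "aggregates": aggregates,
    }

manifest = new_manifest(["/folder/soup.jpeg"], ["http://example.com/blog/"])
manifest_json = json.dumps(manifest, indent=4)
```

The resulting manifest_json string is what would be stored as .ro/manifest.json in the bundle.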

3.1.1 JSON-LD and mapping to RO model

Manifests following the JSON structure defined in section 3.1 .ro/manifest.json with a "@context": [ "https://w3id.org/bundle/context" ] are intended to be valid [JSON-LD] without any additional modifications. Mapping .ro/manifest.json to the ORE and [RO] models in RDF SHOULD be performed according to the algorithm for conversion from JSON to RDF, as specified in the JSON-LD API [JSON-LD].

In order to generate the RDF implied by the manifest, a base URI SHOULD be assumed according to section 4.1 Absolute URIs for bundle resources with a path of /.ro/ -- e.g. relative to .ro/manifest.json, in order to ensure that paths starting with / don't "climb out" of the bundle root.

Example 4
{
  "@context": {
    "ao": "http://purl.org/ao/",
    "oa": "http://www.w3.org/ns/oa#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "ro": "http://purl.org/wf4ever/ro#",
    "roterms": "http://purl.org/wf4ever/roterms#",
    "bundle": "http://purl.org/wf4ever/bundle#",
    "prov": "http://www.w3.org/ns/prov#",
    "pav": "http://purl.org/pav/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",

    "id": "@id",
    "file": "@id",
    "uri": "@id",
    "annotation": "@id",

    "manifest": {
        "@id": "ore:isDescribedBy",
        "@type": "@id"
    },

    "createdOn": {
        "@id": "pav:createdOn",
        "@type": "xsd:dateTime"
    },
    "createdBy": {
        "@id": "pav:createdBy",
        "@type": "@id"
    },
    "authoredOn": {
        "@id": "pav:authoredOn",
        "@type": "xsd:dateTime"
    },
    "authoredBy": {
        "@id": "pav:authoredBy",
        "@type": "@id"
    },
    "curatedOn": {
        "@id": "pav:curatedOn",
        "@type": "xsd:dateTime"
    },
    "curatedBy": {
        "@id": "pav:curatedBy",
        "@type": "@id"
    },
    "contributedOn": {
        "@id": "pav:contributedOn",
        "@type": "xsd:dateTime"
    },
    "contributedBy": {
        "@id": "pav:contributedBy",
        "@type": "@id"
    },
    "name": {
        "@id": "foaf:name"
    },
    "orcid": {
        "@id": "roterms:orcid",
        "@type": "@id"
    },

    "history": {
        "@id": "prov:has_provenance",
        "@type": "@id"
    },
    "aggregates": {
      "@id": "ore:aggregates",
      "@type": "@id"
    },
    "mediatype": {
        "@id": "dc:format"
    },
    "folder": {
      "@id": "ore:proxyIn",
      "@type": "@id"
    },
    "filename": { 
        "@id": "ro:entryName"
    },
    "proxy": {
      "@id": "bundle:hasProxy",
      "@type": "@id"
    },

    "annotations": {
      "@id": "bundle:hasAnnotation",
      "@type": "@id"
    },
    "content": {
       "@id": "oa:hasBody",
       "@type": "@id"
    },
    "about": {
       "@id": "oa:hasTarget",
       "@type": "@id" 
    }

  }
}

As an example of this processing, below is a Turtle representation after processing the .ro/manifest.json shown as an example in section 3.1 .ro/manifest.json assuming a base URI of app://2b9486f0-54d8-4274-b241-7669538b0d2f/.ro/

Example 5
@base <app://2b9486f0-54d8-4274-b241-7669538b0d2f/.ro/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix pav: <http://purl.org/pav/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix ro: <http://purl.org/wf4ever/ro#> .
@prefix bundle: <http://purl.org/wf4ever/bundle#> .
@prefix roterms: <http://purl.org/wf4ever/roterms#> .

</> pav:createdBy <http://example.com/foaf#alice> ;
    pav:createdOn "2013-03-05T17:29:03Z"^^xsd:dateTime ;
    ore:aggregates 
        </README.txt>,
        </folder/soup.jpeg>,
        <http://example.com/blog/>,
        <http://example.com/comments.txt> ;
    ore:isDescribedBy <manifest.json> ;
    prov:has_provenance <evolution.ttl> ;
    bundle:hasAnnotation
        <urn:uuid:d67466b4-3aeb-4855-8203-90febe71abdf>,
        [ oa:hasBody <annotations/a-meta-annotation-in-this-ro.txt> ;
          oa:hasTarget </>, <urn:uuid:d67466b4-3aeb-4855-8203-90febe71abdf> ],
        [ oa:hasBody <http://example.com/blog/they-aggregated-our-file> ;
          oa:hasTarget <urn:uuid:a0cf8616-bee4-4a71-b21e-c60e6499a644> ] .

</README.txt>
    dc:format "text/plain";
    pav:createdBy <http://example.com/foaf#bob> ;
    pav:createdOn "2013-02-12T19:37:32.939Z"^^xsd:dateTime .

<http://example.com/foaf#alice>
    roterms:orcid <http://orcid.org/0000-0002-1825-0097> ;
    foaf:name "Alice W. Land" .

<http://example.com/foaf#bob> foaf:name "Bob Builder" .

<urn:uuid:d67466b4-3aeb-4855-8203-90febe71abdf> oa:hasBody <annotations/soup-properties.ttl> ;
    oa:hasTarget </folder/soup.jpeg> .

3.1.2 Custom JSON-LD

Applications that support JSON-LD (rather than just JSON) MAY choose to parse and generate statements in .ro/manifest.json according to the [JSON-LD] specifications.

Applications generating JSON-LD MAY provide additional items in the @context list, but SHOULD include https://w3id.org/bundle/context as the last item, to indicate to JSON parsers that the manifest can be parsed as plain JSON according to section 3.1 .ro/manifest.json. Applications SHOULD NOT use @context at deeper nesting levels.

Applications SHOULD NOT write additional properties directly to JSON-LD nodes defined from section 3.1 .ro/manifest.json. Instead, additional statements SHOULD be made within an additional @graph node according to JSON-LD Named Graphs. @graph SHOULD only be added to the top-level object. For example:

{
    "@context":  [
        "https://w3id.org/bundle/context"
    ],
    "id": "/",
    "manifest": "manifest.json",
    "aggregates": [
       "http://example.com/blog/2012",
       "http://example.com/blog/2013"
    ],
    "@graph": [
      { "@id": "http://example.com/blog/2013",
        "dct:replaces": "http://example.com/blog/2012" },
      { "@id": "http://example.com/blog/2012",
        "dct:isReplacedBy": "http://example.com/blog/2013" }
    ]
}
    

Note that rather than using the above extension mechanism, it is generally RECOMMENDED to instead store such additional statements in an annotation body for purposes of provenance and separation of concerns. Although technically valid, it is NOT RECOMMENDED to use the member @graph to embed semantic annotation bodies within annotation nodes, as it would duplicate the content of the annotation body in the bundle and may lead to inconsistencies.

3.2 Alternative manifest representations

In addition to the .ro/manifest.json representation specified in section 3.1 .ro/manifest.json, a Research Object Bundle MAY include the ORE manifest in alternative representations like RDF/XML [RDF-SYNTAX-GRAMMAR] and Turtle [TURTLE], for instance by generating them using the conversion from JSON to RDF algorithm in JSON-LD API [JSON-LD].

If an application is modifying a research object bundle which contains manifests it can't handle (and thus can't update), the application SHOULD remove the rootfile entry for those unsupported manifests, and MAY delete those manifests from the archive.

4. Identifiers

This section is non-normative.

Objects in a research object bundle are identified within the JSON manifest relative to .ro/manifest.json, with / resolving to the root of the ZIP archive.

Prefix Interpretation
/ Path relative to root of ZIP archive
urn:uuid: UUID according to [RFC4122]
(containing :) Absolute URI
(no prefix) Path relative to .ro/
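
The prefix table can be read as an ordered set of checks. The sketch below shows one possible reading in Python; the function name classify_identifier and the returned labels are assumptions made for illustration. Note that urn:uuid: has to be tested before the general check for :, since a UUID URN is itself an absolute URI.

```python
def classify_identifier(identifier):
    """Classify a manifest identifier according to the prefix table."""
    if identifier.startswith("/"):
        return "bundle-path"      # path relative to root of ZIP archive
    if identifier.startswith("urn:uuid:"):
        return "uuid"             # UUID according to RFC 4122
    if ":" in identifier:
        return "absolute-uri"     # any other absolute URI
    return "meta-resource"        # path relative to .ro/
```

With the identifiers used in Example 3, /folder/soup.jpeg is a bundle path, annotations/soup-properties.ttl is a meta-resource and http://example.com/blog/ is an absolute URI.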

Due to their nature as ZIP files, Research Object Bundles might be downloaded, copied, moved and republished. In order to avoid ambiguity about RO identity and evolution, each Research Object Bundle serialization is considered to represent a unique Research Object. Thus any of the prefixes above describing resources within the bundle are relative to the root of the ZIP file, and the id identifying the Research Object is set to "/", meaning the root represents the RO itself.

4.1 Absolute URIs for bundle resources

This section is non-normative.

Applications which require an absolute URI for identifying a resource within a Research Object Bundle may choose to use the approach presented in this section in combination with resolving against the prefix table above.

The app: URI scheme [APP-URI] proposes how a URI can be formed for the purposes of accessing resources within a ZIP file as if the resources were retrieved from an HTTP server. While this is primarily intended for sandboxing HTML applications, it is equally applicable to Research Object bundles for the purposes of sandboxing and for generating a URI independent of the location of the ZIP archive.

The app: URI scheme recommends generating a UUID string [RFC4122] for minting the authority, forming the base URI for the RO bundle. For instance, if:

http://example.com/example1.robundle
contains the file /folder/helloworld.txt, then we generate a new UUID 8191dee8-0b8e-452d-8d64-7706a140185e and refer to the Research Object as
app://8191dee8-0b8e-452d-8d64-7706a140185e/
and can refer to its bundled file /folder/helloworld.txt as:
app://8191dee8-0b8e-452d-8d64-7706a140185e/folder/helloworld.txt
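
Combining an authority with the identifier conventions of the manifest could look like the following Python sketch. The function name to_app_uri is an assumption made for illustration; the sketch also assumes that identifiers are already valid URI references and does not handle . or .. path segments.

```python
def to_app_uri(authority, identifier):
    """Resolve a manifest identifier to an absolute URI.

    Paths starting with / resolve against the bundle root, absolute
    URIs are returned unchanged, and any other path resolves against
    the .ro/ directory.
    """
    if identifier.startswith("/"):
        return "app://%s/%s" % (authority, identifier[1:])
    if ":" in identifier:
        return identifier
    return "app://%s/.ro/%s" % (authority, identifier)

AUTHORITY = "8191dee8-0b8e-452d-8d64-7706a140185e"
hello_uri = to_app_uri(AUTHORITY, "/folder/helloworld.txt")
manifest_uri = to_app_uri(AUTHORITY, "manifest.json")
```

Plain string handling is used here so that the sketch does not depend on a URI library recognising the app: scheme.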

The type of authority to generate depends on the purpose of the absolute URI:

  1. For security/sandboxing when interpreting RO bundles, the authority should be a v4 UUID from random numbers. Thus the URI is guaranteed to be unique for each interpretation, and can't (reasonably) be pre-guessed.
    Applications exposing such URIs might want to record the equivalent provenance of a pav:retrievedFrom relation to indicate where the bundle was retrieved from. For instance (in Turtle):
    Example 6
    @prefix pav: <http://purl.org/pav/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    <app://15259726-dcbb-42ff-8fc6-36282c98d4e6/>
        pav:retrievedFrom <http://example.com/example1.robundle> ;
        pav:retrievedOn "2013-05-21T14:24:19Z"^^xsd:dateTime .
  2. For describing/referencing content of an RO bundle accessed at a given URL, the authority should be generated as a name-based UUID using v5 (SHA-1 hashing) of the concatenation of the URL namespace 6ba7b811-9dad-11d1-80b4-00c04fd430c8 (as UUID bytes) and the ASCII-escaped version of the URL. This approach gives a predictable UUID for a particular URL, even if the content at the URL might later change.
    Applications using this approach might want to declare the equivalent of an owl:sameAs relation between the accessed URI and the generated app: URI in order to record the original URI. For instance:
    Example 7
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    <app://7878e885-327c-5ad4-9868-7338f1f13b3b/> owl:sameAs  
            <http://example.com/example1.robundle> .
                        
  3. For describing the content of an RO bundle as a bytestream independent of its location (for instance on a USB stick), the authority should be the hexadecimal SHA-256 checksum of the ZIP archive (not a UUID). This ensures a predictable identifier for the same physical representation, where any change to the bundle generates a new identifier.
    Applications exposing such URIs might want to record the equivalent provenance of a pav:retrievedFrom relation to indicate where the bundle was retrieved from, including the time of retrieval using the equivalent of pav:retrievedOn. (See example above).
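
The three kinds of authority listed above can each be generated with Python's standard library, as in the sketch below. The function names are assumptions made for illustration, and the URL passed to location_authority is assumed to be ASCII-escaped already.

```python
import hashlib
import uuid

def sandbox_authority():
    """Case 1: a random (v4) UUID, unique for each interpretation."""
    return str(uuid.uuid4())

def location_authority(url):
    """Case 2: a name-based (v5, SHA-1) UUID in the URL namespace."""
    # uuid.NAMESPACE_URL is 6ba7b811-9dad-11d1-80b4-00c04fd430c8
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))

def content_authority(bundle_bytes):
    """Case 3: the hexadecimal SHA-256 checksum of the ZIP archive bytes."""
    return hashlib.sha256(bundle_bytes).hexdigest()

def app_base_uri(authority):
    """Form the app: base URI for the research object."""
    return "app://%s/" % authority

base = app_base_uri(location_authority("http://example.com/example1.robundle"))
```

Only the first function gives a different result on each call; the other two are deterministic for the same URL or the same bytes.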

Example app base URIs:

5. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].

A. Acknowledgements

Thanks to Khalid Belhajjame, Graham Klyne and Piotr Hołubowicz for reviewing this specification. The underlying work has been funded as part of the Wf4Ever project, funded by the European Commission's FP7 programme (FP7-ICT-2007-6 270192). Many thanks to Robin Berjon for making ReSpec.js which generated this page.

B. References

B.1 Normative references

[JSON-LD]
Manu Sporny; Gregg Kellogg; Markus Lanthaler. JSON-LD 1.0. 11 April 2013. W3C Working Draft. URL: http://www.w3.org/TR/json-ld/
[ORE]
Open Archives Initiative Object Reuse and Exchange. ORE Specifications and User Guides. URL: http://www.openarchives.org/ore/1.0/toc.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC4627]
D. Crockford. The application/json Media Type for JavaScript Object Notation (JSON) (RFC 4627). July 2006. RFC. URL: http://www.ietf.org/rfc/rfc4627.txt
[RO]
Stian Soiland-Reyes; Sean Bechhofer et al.: Wf4Ever Research Object Model. 30 November 2012. URL: http://purl.org/wf4ever/model
[UCF]
Adobe: Universal Container Format. URL: https://learn.adobe.com/wiki/display/PDFNAV/Universal+Container+Format accessed 2013-02-28.

B.2 Informative references

[APP-URI]
Marcos Cáceres: The app: URI scheme. W3C First Public Working Draft 16 May 2013. URL: http://www.w3.org/TR/2013/WD-app-uri-20130516/
[OCF]
James Pritchett; Markus Gylling: EPUB Open Container Format (OCF) 3.0. Recommended Specification 11 October 2011. International Digital Publishing Forum (IDPF). URL: http://idpf.org/epub/30/spec/epub30-ocf-20111011.html
[RDF-SYNTAX-GRAMMAR]
Dave Beckett. RDF/XML Syntax Specification (Revised). 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210
[RFC4122]
P. Leach; M. Mealling; R. Salz. A Universally Unique IDentifier (UUID) URN Namespace (RFC 4122). July 2005. RFC. URL: http://www.ietf.org/rfc/rfc4122.txt
[RFC6839]
T. Hansen; A. Melnikov. Additional Media Type Structured Syntax Suffixes. January 2013. RFC 6839. URL: http://tools.ietf.org/html/rfc6839
[ROPrimer]
Jun Zhao et al: Wf4Ever Research Object Ontologies and Vocabularies Primer. 13 April 2012. URL: http://purl.org/wf4ever/primer accessed 2013-02-27
[ROSRS]
Piotr Holubowicz; Graham Klyne et al.: RO SRS interface 6. 21 December 2012. URL: http://www.wf4ever-project.org/wiki/display/docs/RO+SRS+interface+6
[TURTLE]
David Beckett; Tim Berners-Lee. Turtle: Terse RDF Triple Language. January 2008. W3C Team Submission. URL: http://www.w3.org/TeamSubmission/turtle/
[URI]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt
[ZIP]
PKWare Inc: .ZIP File Format Specification version 6.3.3 (2012-09-01). URL: http://www.pkware.com/documents/casestudies/APPNOTE.TXT