Best Practices-URIs

Recommendations and discussions of best practices for Uniform Resource Identifiers

This document summarizes some recommendations for best practices when using URIs, or Uniform Resource Identifiers.

Each practice is marked with '[proposed]' until it is finalized and agreed upon by the team. Then the marker is removed.

Most of these analyses come from an in-progress "Ontology Provider's Guide" at MMI. The analyses may not be complete, or may be overcome by events in a relatively short period of time.

This document addresses three main topics: whether to use URLs or URNs; how to construct the URI of your choice for various purposes (e.g., representing a term or an ontology); and how to solve some typical needs.

About URIs: Uniform Resource Locators and Uniform Resource Names

By way of background, we briefly discuss the difference between URLs and URNs, the two main forms of URIs. (We assume the reader appreciates the difference in appearance between the two; that is not presented here.) It is widely believed that "URLs are resolvable, but not so permanent", while "URNs are permanent, but not necessarily resolvable." We find both of these distinctions to be false: URLs do not have to be resolvable (although they usually are); and there is no inherent reason URNs are inevitably more persistent than URLs, other than possibly cultural effects associated with the difficulty of issuing them, and the stability required in the associated management of URNs. (That is, the same stability could be created by other policies of the generating organization.)

The OGC has chosen URNs as its primary URI form for use in its specifications, has obtained a namespace for that purpose, and runs a URN resolver (one of the first) that can accept and resolve URNs. This is a strong argument for the use of URNs. However, we identified several countervailing issues:

  • The OGC URN allocation process does not appear to scale. For example, MMI is dealing with close to 100 ontologies, with many more to come. The OGC URN creation process does not seem designed for accepting the contents of these ontologies, or the changes to them that will ensue. (The lack of actual URNs in the registry is perhaps suggestive on this point. Yet, the existence of the registry, and of comprehensive guidance on URNs, is extremely encouraging and helpful.)
  • The OGC URN scheme is not yet fully tested and accepted. (For example, a recent issue on versioning is in the process of being resolved.)
  • It isn't possible for an organization to easily make (semantically coherent) URNs on its own -- a long lead time is required to get the namespace. This is a strength of URNs because it effects stability, but is a limitation with regard to current needs.
  • In the OGC URN guidance, implicit reference is made to relatively few existing vocabulary sources, as if those sources are potentially or actually authoritative. Yet MMI experience suggests the opposite is true -- no source will be authoritative, at least for decades. Many 'points of light' will emerge and be useful, and we have to be able to interact with multiple ontological sources.

If few of the intended URIs will be resolvable (i.e., there is no service on the web where you can enter a URI representing the term and get more information on the term), we would normally suggest the use of URNs. But at the time of this writing, no URN namespaces are defined that are explicitly for the purpose of defining large numbers of semantically transparent term URNs, while at the same time providing a good set of practices for creating them. We therefore do not yet recommend the regular use of URNs to represent terms on the semantic web.

Best Practice [proposed]: If URNs are mandated by applicable organizations, and if the URNs can be constructed to meet the needs of the user, then a URN is an appropriate choice for specifying terms and related resources. For example, terms that are already represented in the OGC URN resolver are appropriate to use as URIs.

Best Practice [proposed]: Use URLs for situations where it is desirable and intended to provide a "resolving service" to offer additional metadata about terms (resources), or where the conditions in the preceding Best Practice are not met.

Constructing URLs

One of the difficulties in using URIs consistently is establishing a practice of their construction that supports all their likely uses. After considerable iteration, we recommend the following forms.

These forms may also be used in URNs, with some modifications.

These forms may be compared with forms suggested for URNs by OGC (see a summary of OGC URN recommendations). See the section "For Ontology Terms" for more details of this comparison.

For Ontology Files

Best Practice [proposed]: Include the file extension, since this will help search engines to access the ontology files. For example, typing “buoy filetype:owl” in Google will query ontologies with the term ‘buoy’. So, if the ontology is expressed in OWL, use “.owl”; if it is in RDF, use “.rdf” (even though these are not approved MIME types). Example:
http://mmisw.org/ont/mmi/200807/platforms.owl

Best Practice [proposed]: The file name should encode information about the resource type (please see the Resource Types section below for more details) or another distinguishing characteristic, and can include the authority. The name should not contain spaces and, if it contains an authority, should contain at least one underscore “_” that separates the authority from the object type. The use of “-” inside the name is not recommended, since it sometimes confuses search engines. (The character “-” means exclusion, and even if the string is searched as a quoted string (“word1-word2”), you are not guaranteed to get pages with that exact string. For example, searching for “moored-buoy” as a quoted string in Google returns not only pages containing “moored-buoy”, but also pages containing “moored buoy” and “moored. Buoy”. This doesn’t occur if we use the underscore character “_”.) Example:
http://mmisw.org/ont/mmi/200807/platforms.owl
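
The naming rules above can be checked mechanically. The following is only a sketch: the function name `check_ontology_file_name` and its `includes_authority` flag are hypothetical, introduced here for illustration, and are not part of any MMI tool.

```python
import re


def check_ontology_file_name(name, includes_authority=False):
    """Return a list of problems found in an ontology file name (empty if none).

    Hypothetical helper: it encodes only the three rules stated in the text.
    """
    problems = []
    if re.search(r"\s", name):
        problems.append("contains whitespace")
    if "-" in name:
        problems.append("contains '-' (prefer '_')")
    # Look at the name without its file extension when checking for the
    # underscore that separates the authority from the object type.
    stem = name.rsplit(".", 1)[0]
    if includes_authority and "_" not in stem:
        problems.append("authority given but no '_' separator")
    return problems


print(check_ontology_file_name("platforms.owl"))
print(check_ontology_file_name("moored-buoy.owl"))
```

A name such as "platforms.owl" yields an empty list, while "moored-buoy.owl" is flagged for the hyphen.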

Best Practice [proposed]: Include a version identifier as a date for the version release, using the “YYYYMM” (year and month) pattern. Even though it is usually suggested to place the most significant elements leftmost, we recommend placing the “YYYYMM” just before the ontology name. This is because the name of the ontology will be placed at the end of the path, which matches the file name of the ontology (and allows the ontology name to gracefully end in .owl, as recommended above). It also follows the W3C pattern. (If for any reason year and month are not sufficient to identify the file version, it is acceptable to extend the pattern through components of DD.hhmmss as needed.) Example:
http://mmisw.org/ont/mmi/200807/platforms.owl
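
As an illustration, a version identifier of this form can be derived from a release date. This is a minimal sketch: the function name `version_id` and the `precise` flag are hypothetical, and the extended form assumes that "DD.hhmmss" is appended directly to "YYYYMM", which is one reading of the guidance above.

```python
from datetime import datetime


def version_id(released, precise=False):
    """Format a release timestamp as a version identifier.

    Default is YYYYMM; precise=True gives YYYYMMDD.hhmmss
    (an assumed reading of the "DD.hhmmss" extension).
    """
    if precise:
        return released.strftime("%Y%m%d.%H%M%S")
    return released.strftime("%Y%m")


release = datetime(2008, 7, 15, 9, 30, 0)
print(version_id(release))
print(version_id(release, precise=True))
```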

Best Practice [proposed]: Include an authority to identify the organization that developed this version of the ontology. This helps recognize ontology owners in the URLs, helps group ontologies from the same authority, and helps distinguish similar ontologies (covering similar resources) from different authorities. Example:
http://mmisw.org/ont/mmi/200807/platforms.owl

Best Practice [proposed]: Include a root term representing the location or concept of ontologies ('ont' here), definitions ('def' in OGC URNs), or other category of similar information. This enables the ontology server to provide other services at other 'root paths', and helps identify the purpose of the URL to someone who is seeing it out of context. Example:
http://mmisw.org/ont/mmi/200807/platforms.owl
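
Taken together, the practices in this section amount to assembling the URL from five components. The sketch below shows one way to do that; the function `ontology_file_url` and its parameter names are assumptions made for illustration, not an existing MMI API.

```python
def ontology_file_url(host, root, authority, version, name, extension="owl"):
    """Assemble an ontology file URL from its components.

    Order follows the recommendations above:
    host / root term / authority / version / file name with extension.
    """
    path = "/".join([root, authority, version, f"{name}.{extension}"])
    return f"http://{host}/{path}"


print(ontology_file_url("mmisw.org", "ont", "mmi", "200807", "platforms"))
```

With the values used in the examples above, this reproduces http://mmisw.org/ont/mmi/200807/platforms.owl.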

Comparison with OGC URNs

We have reversed the 'object type' (or 'resource type') and 'authority' fields, because we believe the 'object type' will have many different concepts, often varying according to local usage by the authority. So the authority field is a slightly more coherent organizational concept. It also is a much more useful concept for the maintainers of ontologies from multiple organizations.

Version precedes the object type field because the version is often (though not always) going to be assigned at the level of the ontology file, and in any case is needed to describe or tag a particular ontology file. Whether the version of an ontology file is inherited by every term in that release of the file is not the subject of a recommendation; either approach could be implemented with useful effect by a given repository or vocabulary.

The 'term' in these Best Practices corresponds to the OGC code in every way. We favor a semantic term because we feel it increases usability, corresponds directly to the actual existence of a unique concept (a string) in the "real world", and does not make any permanent representation as to the future representation of that or similar concepts. There are many who argue that putting semantics into the term, or indeed into the URL, creates the risk of tying meaning to the URL in ways that will eventually grow inconsistent. (For example, the same word may in common usage come to mean something entirely different, as has happened many times in the past; hence the use of the URL with an English string representation of the term may someday imply something not originally desired.) While we acknowledge that analysis, we believe the evolution of terms supported by these URL practices supports appropriate correspondence of URLs to meanings over time in almost all cases.

Resource Types

In the OGC and MMI URL recommendations, the concept of a Resource Type or Object Type appears. This helps characterize the group of terms that is in a given ontology; roughly, it could represent a class of object for which each term is an instance.

Examples of possible types of objects include:

ISO MD_Keywords:

  • Discipline
  • Place
  • Stratum
  • Temporal
  • Theme

OGC Object Types

  • axis
  • axisDirection
  • coordinateOperation
  • crs
  • cs
  • datum
  • dataType
  • derivedCRSType
  • documentType
  • ellipsoid
  • featureType
  • group
  • meaning
  • meridian
  • method
  • nil
  • parameter
  • phenomenon
  • pixelInCell
  • rangeMeaning
  • referenceSystem
  • uom
  • verticalDatumType

Other Concepts for Object Type

  • keyword
  • parameter
  • units
  • organization
  • platform
  • sensor
  • process
  • missingflag
  • qualityflag
  • coordinateReference
  • datum
  • protocol
  • metadataStandard
  • featureType and featureName (e.g. gazetteer)
  • speciesTypes and speciesNames (e.g. as used in OBIS)
  • discipline
  • place
  • theme
  • role of contact
  • general metadata attribute

For Controlled Vocabulary Terms

Each term in a Controlled Vocabulary should be represented by a URI. For the reasons specified earlier, we recommend these terms primarily be URLs. The following guidelines may be carried out for URNs and URLs equally, however.

Best Practice [proposed]: Form the URL of a term by deleting the file type suffix from the URL of the containing ontology or controlled vocabulary, and replacing it with '/' and the name of the term. (The selection of '/' is favored over '#' because a URL that ends in '/' and the term name can be resolved by the server to a single term before a result is returned, whereas the fragment after '#' is not sent to the server in an HTTP request, so the entire ontology is downloaded.) Thus, the URL representing a term should use the following scheme:
http://{hostdomain}/{ontologiesRoot}/{authority}/{version}/{resourceType}/{shortName}


Example:
http://mmisw.org/ont/mmi/200807/platforms/moored_buoy
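
The derivation of a term URL from an ontology URL can be sketched with Python's standard `urllib.parse` and `posixpath` modules. The function name `term_url` is hypothetical. The last lines also show why '#' behaves differently: the fragment is separate from the path, and only the path is part of the request sent to the server.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def term_url(ontology_url, short_name):
    """Drop the file type suffix from an ontology URL and append '/' + term name."""
    parts = urlsplit(ontology_url)
    stem, _extension = posixpath.splitext(parts.path)
    return urlunsplit((parts.scheme, parts.netloc, f"{stem}/{short_name}", "", ""))


print(term_url("http://mmisw.org/ont/mmi/200807/platforms.owl", "moored_buoy"))

# With '#', the term name is a fragment and is not part of the requested path.
hash_form = urlsplit("http://mmisw.org/ont/mmi/200807/platforms.owl#moored_buoy")
print(hash_form.path)
print(hash_form.fragment)
```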

For Most Recent Version of a File or Term

Best Practice [proposed]: Use a version-less URL to obtain the most recent ontology or term with the given ID. It is often the case that a user wants to obtain the most recent version of a given ontology or term. In this case, the version is omitted, or specified as '0' if the resourceType begins with a digit. Thus, these HTTP URL requests should each obtain the most recent version of the corresponding ontology or term, if the service can support that capability:

  • http://mmisw.org/ont/mmi/platforms/moored_buoy
  • http://mmisw.org/ont/mmi/platforms.owl
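
A client that holds a versioned URL could derive the version-less form as sketched below. This is an illustration under stated assumptions: the function `unversioned_url` is hypothetical, the version is assumed to be the third path segment as in the scheme above, and the pattern accepts only the YYYYMM form and its assumed YYYYMMDD and YYYYMMDD.hhmmss extensions. The resource name `2dgrids` in the usage lines is invented to show the '0' rule.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed version shapes: YYYYMM, YYYYMMDD, or YYYYMMDD.hhmmss
VERSION_SEGMENT = re.compile(r"^\d{6}(\d{2}(\.\d{6})?)?$")


def unversioned_url(url):
    """Remove the version segment, or replace it with '0' when the
    following resource type begins with a digit."""
    parts = urlsplit(url)
    # For /root/authority/version/..., segments is ['', root, authority, version, ...]
    segments = parts.path.split("/")
    if len(segments) > 4 and VERSION_SEGMENT.match(segments[3]):
        if segments[4][:1].isdigit():
            segments[3] = "0"
        else:
            del segments[3]
    return urlunsplit(
        (parts.scheme, parts.netloc, "/".join(segments), parts.query, parts.fragment)
    )


print(unversioned_url("http://mmisw.org/ont/mmi/200807/platforms/moored_buoy"))
print(unversioned_url("http://mmisw.org/ont/mmi/200807/platforms.owl"))
print(unversioned_url("http://mmisw.org/ont/mmi/200807/2dgrids.owl"))
```

Whether the server actually returns the most recent version for such a URL depends on the service, as noted above.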

NOTE: This usage risks leading directly to the problem alluded to in the earlier discussion of semantic URLs. For example, assume I have used and defined the term 'SST' in my vocabulary to be sea surface temperature, and so I expect 'http://mmisw.org/ont/myorg/seaterms/sst' to give me the most recent definition for sea surface temperature. But over time, 'sst' has come to mean "saline setpoint temperature", for reasons no one can foresee. I may be quite surprised someday if I ask for http://mmisw.org/ont/myorg/seaterms/sst and get the temperature of the setpoint of my saline solution. At the same time, this is not an unpredictable or inappropriate result; just as a user requesting the most recent version of the nytimes.com site may be surprised someday when that site is taken over by a different organization, an unversioned URL should not be guaranteed to resolve in the same way over time. In contrast, a versioned URL or URI can be guaranteed to be unchanged, at least for as long as the group that agrees to that policy supports the URL.

References

  • [1] http://marinemetadata.org/apguides/ontprovidersguide
  • [2] http://www.opengeospatial.org/ogcUrnPolicy#Governance (OGC URN governance discussions)
  • [3] http://urn.opengis.net/ (Resolver service)
  • [4] http://portal.opengeospatial.org/files/?artifact_id=24045 ("Definition identifier URNs in OGC namespace", OGC Best Practices document)
  • [5] Various documents on the web, as summarized at http://mmi.mbari.org:8200/marinemetadata/arch/refs/urirefs/ (that's the old MMI space)
  • [6] Other references, as summarized at the end of Reference [1] above