Vocabulary Registry data

As noted on the page Getting started with the Vocabulary Registry API, the Vocabulary Registry works with structured data, sent back and forth between the Registry and its clients.

This page explains the details you need to know in order to work with data sent between the Vocabulary Registry and its clients.

Structured data is encoded in one of two supported transfer formats: XML and JSON. In the first section below, we look at general principles that apply irrespective of the transfer format used. Subsequent sections look at principles that apply to particular transfer formats.

The Swagger UI is a very convenient tool for working with the API and for seeing examples of data as it is actually encoded in either of the supported transfer formats. See Accessing the Vocabulary Registry API for details.

General principles

Parameters, request and response headers and bodies, and status codes

The following dot points contain information that's important to bear in mind if you will be accessing API methods directly using the advertised method URLs, rather than with an API client library.

Some API methods have one or more parameters that are provided as URL query parameters. Such parameters always have a simple (scalar) type, such as integer, boolean, or string.
Some API methods have a parameter that is specified as the HTTP request body. Such parameters are typically provided as structured data (i.e., as either XML data or a JSON object). If so, you will need to provide an appropriate setting for the Content-Type request header.
Depending on the API method, one or more parameters may be required; one or more parameters may be optional. If a method parameter is optional, it is always expected as a URL query parameter, never as part of the HTTP request body.
Some API methods return a result as the HTTP response body. Such responses are typically provided as structured data (i.e., as either XML data or a JSON object). You will need to provide an appropriate setting for the Accept request header.
API methods always return an HTTP status code that indicates a success or failure outcome. The status codes that are used vary according to the API method, and are consistent with the traditional semantics of HTTP status codes. For example, don't assume that success always means the status code will be 200; some API methods return a success status code of 201 or 204.
As the HTTP status code returned from an API method may vary with the outcome of the method, so also the type of structured data contained in the response body may vary. In particular, a response that has an error HTTP status code may include a response body that is structured data of a corresponding error type. You can confidently use the HTTP status code to predict the type of structured data contained in the response body.
Some API methods return additional HTTP response headers.
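
For direct access, the points above translate into two things. First, a request that sets Accept, and also Content-Type when there is a request body. Second, a response check that accepts any 2xx status code, not only 200. The following is a minimal Python sketch that uses only the standard library. BASE_URL, the helper names build_request and is_success, and the overall path layout are hypothetical. The only path detail taken from this page is the vocabularies/3 component described under "Resources, and resource IDs" below.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: substitute the real Registry API base URL
# (see Accessing the Vocabulary Registry API).
BASE_URL = "https://registry.example.org/api"


def build_request(method, path, query=None, body=None):
    """Build (but do not send) a request to a Registry API method."""
    url = f"{BASE_URL}/{path}"
    if query:
        # Optional parameters always travel as URL query parameters.
        url += "?" + urllib.parse.urlencode(query)
    # Ask for a JSON response body.
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        # A structured request body needs a matching Content-Type header.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def is_success(status):
    """Treat any 2xx code (200, 201, 204, ...) as success, not just 200."""
    return 200 <= status < 300
```

When you send such a request with urllib.request.urlopen, it raises urllib.error.HTTPError for 4xx and 5xx status codes. The error object's code attribute gives the status, and its read() method returns the response body. For an error status, that body may contain the structured error data mentioned above.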

If you are using an API client library, you typically don't need to worry about the technical distinction between a parameter specified as a query parameter and a parameter specified as the request body: the library methods handle that for you. (You may have to take into account a distinction between required and optional parameters; that will depend on the programming language you are using.)

Here are some examples of parameters and responses:

The API method getVocabularyById consumes a parameter that is an integer value (the vocabulary's resource ID, which forms part of the method URL), and produces a result that has the structure of a vocabulary. If there is no vocabulary corresponding to the parameter value, an error HTTP status code (400) will be returned.
The API method updateVocabulary consumes a parameter that is an integer value (again the vocabulary's resource ID, as part of the method URL) and a (request body) parameter that is a vocabulary, and produces an empty result. If the vocabulary data has validation errors, an error HTTP status code (400) will be returned, and there will be a structured data response that includes a list of the validation errors.

ARDC recommends that you explore the API using the Swagger UI, to see details of parameters and responses for each method. More specific information about some methods is provided at the page Vocabulary Registry API methods.

Resources, and resource IDs

The concepts of resource and resource ID are fundamental to the Registry, and this is reflected in the URLs of API methods and in the structured data generated by and accepted by API methods.

Resource IDs are very important

The presence or absence of resource IDs is of particular importance when sending vocabulary metadata to the Registry API. If you will be using the Registry API to create or update vocabularies, you need to understand exactly when resource IDs must and must not be provided.

These are the key points to be observed:

Many Registry model entities are presented as resources, and resources have types corresponding to the underlying entity classes. This is reflected in the URL structure of the various API methods. For example, the API methods that get metadata for vocabulary instances include the URL path component vocabularies.
Many Registry model entities have a resource ID. The section "Resource IDs" at the page Vocabulary Registry model entities lists the entity classes that have resource IDs and what they look like.
Resource IDs are also included in the URL structure of the various API methods. For example, to get the metadata for the vocabulary instance that has resource ID 3, the API method URLs include the component vocabularies/3.
All resource IDs are assigned by the Registry. The Registry does not accept requests to create resources with resource IDs of your choice; therefore, when creating a resource, you must not specify a resource ID for the resource.
Conversely, when you use an API method to ask the Registry to create resources, it will always tell you the resource ID of every such resource it created, as part of the response to the API method.
All "getter" API methods always provide all resource IDs for the resources contained in the responses.
When making an update to a top-level resource (i.e., a vocabulary or a related entity), you must provide in the request body the full details, including the resource IDs, for all of the existing sub-resources (e.g., for a vocabulary, its versions and user-specified access points) you want the Registry to keep (whether or not you are changing any of the details of those resources). If you omit mention of an existing sub-resource, that is understood by the Registry as a request to delete that sub-resource.
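
Omitting an existing sub-resource from an update deletes it. It can therefore be worth checking an update body before you send it. The Python sketch below is a hypothetical helper; check_update is not part of the Registry API. It compares the sub-resource list you fetched (for example, a vocabulary's versions) with the list you are about to send, and looks only at an "id" key in each item. The sample data in the test is illustrative. Treating entries that have no "id" as new sub-resources to be created is an assumption of this sketch.

```python
def check_update(existing, proposed):
    """Compare existing sub-resources with those in a proposed update.

    Both arguments are lists of dicts; only each item's "id" key is used.
    Returns a summary of what the update would ask the Registry to do.
    """
    existing_ids = {item["id"] for item in existing}
    proposed_ids = {item["id"] for item in proposed if "id" in item}

    # Resource IDs are assigned by the Registry, so an ID that is not
    # already present cannot legitimately appear in the update.
    unknown = proposed_ids - existing_ids
    if unknown:
        raise ValueError(f"IDs not assigned by the Registry: {sorted(unknown)}")

    return {
        "kept": proposed_ids,
        # Omitting an existing sub-resource is a request to delete it.
        "deleted": existing_ids - proposed_ids,
        # Assumption: entries without an "id" are new sub-resources.
        "created": sum(1 for item in proposed if "id" not in item),
    }
```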

More precise details for individual API methods are provided at the page Vocabulary Registry API methods.

Slugs

There is a "slug" associated with each vocabulary, and with each version of each vocabulary. (See, e.g., Wikipedia: Clean URL (Slug) for a description of slugs.) Slugs are used within the URLs of some access points. For example, within the URL http://vocabs.ardc.edu.au/repository/api/lda/aodn/aodn-instrument-vocabulary/version-2-1/concept there is an owner slug aodn, a vocabulary slug aodn-instrument-vocabulary, and a version slug version-2-1. By default, slugs are generated from the underlying owner and title data. For example, the vocabulary slug is generated from the vocabulary title. See also the references to slugs on the pages SPARQL endpoint and Linked Data API.

When adding a vocabulary or version, you don't need to specify slug values; the Registry creates them for you. But you can use the API method generateSlug to see what slug will be generated from any particular string. If you want a different slug from the one automatically generated from the title, you can specify it when you create the vocabulary/version; this "overrides" the default title-based generation process. (Note: this is a "bonus" feature of the API that is not currently made available through the Portal user interface.) If you specify your own slug during vocabulary/version creation, it must have the correct format for a slug. See below for a way to check this.

For computer scientists: slug generation is an idempotent operation (see, e.g., Wikipedia: Idempotence (Computer science meaning)). That means:

Let S be any string. Then generateSlug(generateSlug(S)) = generateSlug(S).
You can see if a particular string is valid as a slug by running it through the generateSlug method. If the output is exactly the same as the input, the string is already a valid slug. And this is indeed how the Registry checks if a given slug is in the correct format.
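
The idempotence property and the validity check are easier to see with a concrete function. The Python slug function below is only a simplified stand-in, chosen because it is idempotent. It happens to reproduce the slugs in the example URL above, but it is not the Registry's algorithm. To check a real slug, call the generateSlug API method itself.

```python
import re


def illustrative_slug(text):
    """A stand-in slug generator, NOT the Registry's generateSlug algorithm.

    Lower-cases the text, replaces each run of characters other than a-z
    and 0-9 with a single hyphen, and trims hyphens from both ends.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def is_valid_slug(candidate):
    """A string is already a valid slug exactly when generating a slug
    from it leaves it unchanged (the check described above)."""
    return illustrative_slug(candidate) == candidate
```

Because the output contains only a-z, 0-9 and single interior hyphens, a second pass changes nothing. That is what makes the "output equals input" test meaningful.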

Licence

One of the optional metadata fields of a vocabulary entity is licence. The value of this field is free text; however, the Registry translates certain recommended values into groups during the indexing process. The following table sets out the recommended values, and how they are indexed for searching. (The second column shows the only values that are included in search results; the values in the first column do not appear in search results.)

Value in Registry Schema	As indexed for searching
CC-BY	Open Licence
CC-BY-SA	Open Licence
CC-BY-ND	Non-Derivative Licence
CC-BY-NC	Non-Commercial Licence
CC-BY-NC-SA	Non-Commercial Licence
CC-BY-NC-ND	Non-Derivative Licence
CC0	Open Licence
ODC-By	Open Licence
GPL	Open Licence
CSIRODataLicence	Non-Commercial Licence
AusGoalRestrictive	Restrictive Licence
NoLicence	No Licence
Unknown/Other	Unknown/Other
any other value	Unknown/Other
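
A client that wants to predict how a licence value will be grouped in search results could encode the table directly. The following Python sketch does that. Exact, case-sensitive matching is an assumption, and the table does not say how an absent licence is treated.

```python
# Recommended licence values and their search groups, from the table above.
LICENCE_GROUPS = {
    "CC-BY": "Open Licence",
    "CC-BY-SA": "Open Licence",
    "CC-BY-ND": "Non-Derivative Licence",
    "CC-BY-NC": "Non-Commercial Licence",
    "CC-BY-NC-SA": "Non-Commercial Licence",
    "CC-BY-NC-ND": "Non-Derivative Licence",
    "CC0": "Open Licence",
    "ODC-By": "Open Licence",
    "GPL": "Open Licence",
    "CSIRODataLicence": "Non-Commercial Licence",
    "AusGoalRestrictive": "Restrictive Licence",
    "NoLicence": "No Licence",
    "Unknown/Other": "Unknown/Other",
}


def licence_search_group(licence):
    """Return the search group for a licence value; any value not in the
    table is indexed as Unknown/Other (exact matching assumed)."""
    return LICENCE_GROUPS.get(licence, "Unknown/Other")
```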

Fields that contain HTML data

Most of the fields of vocabulary and version metadata are "plain text". But there are three fields which, if values are supplied, are required to be HTML content. Those fields are:

vocabulary note
vocabulary description
version note

A value for one of these fields must be provided as a well-formed HTML fragment, and there are limitations on the included tags. In practice, that means that only the following tags are allowed:

a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, span, strike, strong, sub, sup, u, ul, and appropriate attributes.
Links (a elements) can use the http, https, ftp, and mailto schemes. The Registry applies the attributes rel="nofollow" target="_blank" to all link elements in incoming data.

In particular, no html, head, or body tags are expected or allowed. Please note that the requirement to be well-formed is intended in the XML sense, i.e., matching end tags must be provided. For example, this is valid:

XML

This line has a <b>bold</b> word.<br />This is another line.

but this is not:

XML

<p>This is the first line.<br>This is the second which ends with <b>bold.

The first example above shows that there is no need to use a tag at the top level of the content; you may begin and end the content with text. Indeed, no HTML tags are required; the content may be just "plain text", as long as the usual HTML escaping has been applied; for example:

XML

This is a valid description which has an ampersand (&amp;) and a less-than sign (&lt;).

(Note: if you are entering content in the Portal user interface, HTML fields are presented using an embedded HTML editor that does all this for you. It's only if you are using the Registry API that you would need to pay attention to this.)
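
Before sending such a field through the API, you might want to pre-check it locally. The following Python sketch parses the fragment with the standard-library XML parser, which enforces XML-style well-formedness. It then compares tags and link schemes against the lists above. The function name is hypothetical, and the Registry's own validation remains authoritative. Because it uses an XML parser, the sketch would also reject HTML named entities such as &nbsp;, which may be stricter than the Registry requires.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Tags listed above as allowed in HTML fields.
ALLOWED_TAGS = {
    "a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em",
    "i", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong",
    "sub", "sup", "u", "ul",
}
ALLOWED_LINK_SCHEMES = {"http", "https", "ftp", "mailto"}


def check_html_fragment(fragment):
    """Return a list of problems with an HTML fragment (empty if none found).

    Only tags and link targets are checked; other attributes are not.
    """
    try:
        # A dummy wrapper element lets the fragment begin and end with text
        # and contain several top-level elements.
        root = ET.fromstring(f"<fragment>{fragment}</fragment>")
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    for elem in root.iter():
        if elem is root:
            continue
        if elem.tag not in ALLOWED_TAGS:
            problems.append(f"tag not allowed: {elem.tag}")
        href = elem.get("href") if elem.tag == "a" else None
        if href is not None and urlparse(href).scheme not in ALLOWED_LINK_SCHEMES:
            problems.append(f"link scheme not allowed: {href}")
    return problems
```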

When providing HTML data to an API method, content must be encoded according to the rules of the transfer format being used. For example, when data is sent in XML, these fields are provided as attribute values; therefore, HTML data encoded in XML must be "double-encoded". Thus, if the two valid examples above were being provided as a vocabulary description and note, they would be specified in XML as:

XML

<vocabulary ... description="This is a valid description which has an ampersand (&amp;amp;) and a less-than sign (&amp;lt;)."
   note="This line has a &lt;b&gt;bold&lt;/b&gt; word.&lt;br /&gt;This is another line."
 ...>

If you are using an API client library, you typically don't need to worry about this double escaping (or the corresponding unescaping) of HTML data: the library handles it for you, and you only need to pay attention to the first level of escaping.

When working with HTML data encoded in JSON, no escaping is needed beyond the normal JSON string rules; for example, every \ must be escaped as \\, and every " as \".
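
You do not need to apply the second level of escaping by hand, because a standard XML serialiser escapes attribute values as it writes them. In this Python sketch, the two valid fragments serialise to the double-encoded attribute values shown earlier. JSON output, by contrast, applies only ordinary JSON string escaping. The helper names are hypothetical. The bare, un-namespaced vocabulary element with no other attributes is a simplification and is not a valid Registry instance.

```python
import json
import xml.etree.ElementTree as ET

# The two valid HTML fragments from the examples above
# (first level of escaping already applied).
DESCRIPTION = ("This is a valid description which has an ampersand (&amp;) "
               "and a less-than sign (&lt;).")
NOTE = "This line has a <b>bold</b> word.<br />This is another line."


def to_xml_attributes(description, note):
    # ElementTree escapes attribute values on output, which supplies the
    # second level of escaping needed for XML transfer.
    elem = ET.Element("vocabulary", {"description": description, "note": note})
    return ET.tostring(elem, encoding="unicode")


def to_json(description, note):
    # json.dumps applies only the normal JSON string escaping.
    return json.dumps({"description": description, "note": note})
```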

XML data

The Registry's API methods work with structures described using XML Schema. (See, e.g., Wikipedia: XML Schema (W3C).) In practice, this means:

XML is the "canonical" data format used by the Registry's API methods;
a description of the API data structures is accessible in XML Schema format;
if desired, the Schema can be used to validate data before sending it to API methods; conversely, API clients can rely on the structure of data returned by API methods.

XML namespace

The elements of the Registry Schema that correspond to model entities are defined within an XML namespace. The XML namespace is currently http://vocabs.ands.org.au/registry/schema/2017/01/vocabulary. (Please note: the XML namespace continues to use the "legacy" hostname vocabs.ands.org.au, and you must use this hostname in the XML namespace, even though this is no longer the canonical hostname of the RVA production server. As noted below, the schema files can now only be accessed with the new hostname vocabs.ardc.edu.au.) This means:

when sending model entity data to the Registry in XML format, you must qualify the elements using the Schema's namespace;
when processing model entity data returned from the Registry in XML format, you should expect the elements to be qualified using the Schema's namespace.

There is also a separate XML namespace defined in the common data schema, but as that schema only defines enumerated types that are used in attribute values, the URL of that namespace does not appear in instances of Registry structured data.

It is a very common error when working with XML data to fail to pay attention to XML namespaces. For example, if you attempt to process the output of an API method using XSLT templates, but fail to include the namespace in the template match patterns, your templates will not match any of the returned content.

However, not all XML data returned from the Registry API uses the Registry Schema. The key exception is error content. Such content uses elements that are not defined in the Registry Schema, and those elements are not in any namespace.
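
The following Python sketch illustrates the namespace pitfall with the standard-library ElementTree module. It uses a trimmed, namespace-qualified version of the related-entity example that appears later on this page; the sample is illustrative and not necessarily schema-valid. A path that names the element without its namespace matches nothing, while a namespace-aware path finds it. The helper urls_of is hypothetical. Elements that are in no namespace, such as those in error content, keep their plain tag names in ElementTree.

```python
import xml.etree.ElementTree as ET

# The Registry Schema namespace (note the legacy ands.org.au hostname).
REGISTRY_NS = "http://vocabs.ands.org.au/registry/schema/2017/01/vocabulary"
NSMAP = {"r": REGISTRY_NS}

# A trimmed, illustrative instance; not necessarily schema-valid.
SAMPLE = (
    f'<related-entity xmlns="{REGISTRY_NS}" id="1" type="party" '
    'owner="ANDS-Curated" title="Australian National Data Service">'
    "<url>http://ands.org.au/</url>"
    "</related-entity>"
)


def urls_of(xml_text):
    root = ET.fromstring(xml_text)
    # ElementTree names the child "{http://vocabs.ands.org.au/...}url", so a
    # bare "url" path would find nothing; the path must use the namespace.
    return [u.text for u in root.findall("r:url", NSMAP)]
```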

The Registry Schema files can be downloaded here:

Common data (definition of enumerated types used in the Registry Schema): https://vocabs.ardc.edu.au/registry/schema/2017/01/common-types.xsd
Registry Schema: https://vocabs.ardc.edu.au/registry/schema/2017/01/registry-schema.xsd

Further documentation about the Registry Schema, including a convenient means of navigating the elements and attributes, is available here: https://vocabs.ardc.edu.au/registry/schema/2017/01/vocabulary.

General principles of XML data

This section outlines principles that apply to constructing XML data to pass to the Registry, and to the interpretation of data returned from the Registry.

XML Schema constraints

The XML Schema definition of the Registry schema includes many constraints that apply to instances of structured data.

To find out more, you can download and read the XML Schema description "as is", but there are also a number of software tools that provide a convenient user interface to navigate an XML Schema file.

As an example of these constraints, the definition of the <vocabulary> element includes these fragments:

XML

<xs:attribute name="id" type="id-type"> </xs:attribute>
<xs:attribute name="status" use="required" type="common:vocabulary-status"> </xs:attribute>
...
<xs:attribute name="title" use="required" type="xs:token"/>
<xs:attribute name="acronym" type="xs:token"/>
<xs:attribute name="description" type="xs:token">
    <xs:annotation>
        <xs:documentation>The content must be a well-formed HTML fragment.</xs:documentation>
    </xs:annotation>
</xs:attribute>

This means that every <vocabulary> element:

may have an id attribute, and, if provided, its value must be of type id-type (which is defined in the schema to be the set of integers greater than or equal to 1);
must have a status attribute, and its value must be one of the values of the enumerated type common:vocabulary-status (defined in the common data schema to be the values published, draft, and deprecated);
must have a title attribute;
may have an acronym attribute; and
may have a description attribute, and, if provided, its value must be a well-formed HTML fragment.

Attributes of type xs:token (in the list above, the title, acronym, and description attributes) have string values that are "normalised" during processing according to the whitespace rules that XML Schema defines for that type. That means:

all leading and trailing whitespace is removed;
line feeds, carriage returns, and tabs are converted into spaces; and then
multiple contiguous spaces are coalesced into one.

This example shows both the power and the limitations of XML Schema. In particular, the requirement that the value of the description attribute be a well-formed HTML fragment can only be specified here as a documentation annotation. In fact, there are additional constraints on the format of values of the description attribute; those are explained in the section Fields that contain HTML data above.
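
You could mirror the <vocabulary> constraints and the xs:token normalisation in a client-side pre-check. This Python sketch handles only the attributes discussed above; the schema fragment elides others with "...". It does not check the HTML content of description, and its function names are hypothetical. Validation against registry-schema.xsd, or by the Registry itself, remains the authority.

```python
import re

TOKEN_ATTRIBUTES = {"title", "acronym", "description"}
VOCABULARY_STATUSES = {"published", "draft", "deprecated"}


def collapse_token(value):
    """Apply xs:token whitespace normalisation to a string."""
    # Line feeds, carriage returns and tabs become spaces ...
    value = re.sub(r"[\t\n\r]", " ", value)
    # ... then runs of spaces are coalesced and the ends are trimmed.
    return re.sub(r" +", " ", value).strip(" ")


def check_vocabulary_attributes(attrs):
    """Normalise and check <vocabulary> attribute values (a dict of strings).

    Returns (normalised_attrs, problems); problems is empty if none were found.
    """
    attrs = {
        name: collapse_token(value) if name in TOKEN_ATTRIBUTES else value
        for name, value in attrs.items()
    }
    problems = []
    if "id" in attrs:
        try:
            valid_id = int(attrs["id"]) >= 1  # id-type: integers >= 1
        except (TypeError, ValueError):
            valid_id = False
        if not valid_id:
            problems.append("id must be an integer >= 1")
    status = attrs.get("status")
    if status is None:
        problems.append("status is required")
    elif status not in VOCABULARY_STATUSES:
        problems.append("status must be published, draft, or deprecated")
    if "title" not in attrs:
        problems.append("title is required")
    return attrs, problems
```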

JSON data

The Registry's API methods can also accept and produce data in JSON format. The Registry's JSON support is based directly on the XML Schema definition. Because there is no direct equivalent of the XML Schema files for JSON, ARDC recommends that you use the Swagger UI to explore and confirm the structure of JSON data.

The following example illustrates the correspondences between XML and JSON data:

XML

<related-entity id="1" type="party" owner="ANDS-Curated"
                title="Australian National Data Service"
                email="services@ands.org.au">
  <related-entity-identifier id="1" identifier-type="auAnlPeau"
                             identifier-value="nla.party-1508909"/>
  <url>http://ands.org.au/</url>
</related-entity>

JS

{
  "id": 1,
  "type": "party",
  "owner": "ANDS-Curated",
  "title": "Australian National Data Service",
  "email": "services@ands.org.au",
  "related-entity-identifier": [
    {
      "id": 1,
      "identifier-type": "auAnlPeau",
      "identifier-value": "nla.party-1508909"
    }
  ],
  "url": [ "http://ands.org.au/" ] 
}

Observe these points:

Where the XML data includes an element, this is represented in JSON as an object with key/value pairs. Each of the element's attribute and nested element types is represented as one key/value pair.
In the JSON data, the data types of values follow their corresponding definitions in the XML Schema, where this is possible. This is not always obvious in the XML version of the data. For example:
- The XML Schema specifies that the values of the two id attributes are integers; therefore, in the JSON, the values are specified as integers (i.e., without surrounding double quotes).
- The XML Schema specifies that the value of the title attribute is a string; therefore, in the JSON, the value is specified as a string (i.e., with surrounding double quotes).
Where the XML Schema specifies that a nested element may have multiplicity greater than one, but the data contains only one such nested element, the data must nevertheless be represented in JSON as an array. For example:
- The XML Schema specifies that there can be more than one nested related-entity-identifier element; therefore, in the JSON, the value corresponding to the key related-entity-identifier must be an array of objects (each of which has the structure of a related-entity-identifier).
- The XML Schema specifies that there can be more than one nested url element; therefore, in the JSON, the value corresponding to the key url must be an array of strings.
It is a requirement of any XML data that there be a top-level element. JSON does not have such a requirement, and therefore, the JSON data does not mention the name of the top-level element.
- The example XML has a top-level element related-entity. In the JSON, the element name related-entity does not occur; the entire object value is the related-entity.
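
These correspondences can be expressed as a small conversion function, shown here as a Python sketch for illustration only. A real converter would derive from the XML Schema which attributes are integers and which child elements may repeat. Here those facts are hard-coded assumptions that cover just this example. The sketch also does not handle elements that carry both attributes and text. Applied to the related-entity XML above, it produces a value equal to the JSON shown above.

```python
import xml.etree.ElementTree as ET

# Assumptions for this example only; a real converter would read these
# facts from the Registry Schema rather than hard-code them.
INTEGER_ATTRIBUTES = {"id"}
REPEATABLE_ELEMENTS = {"related-entity-identifier", "url"}


def local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def element_to_json(elem):
    """Convert an element to a JSON-ready value. The element's own name is
    not included, just as the top-level element name is absent in JSON."""
    # An element with no attributes and no children (e.g. <url>) becomes
    # its text content.
    if not elem.attrib and len(elem) == 0:
        return elem.text or ""
    obj = {}
    # Each attribute becomes one key/value pair, typed as the schema says.
    for name, value in elem.attrib.items():
        obj[name] = int(value) if name in INTEGER_ATTRIBUTES else value
    # Each nested element type becomes one key; repeatable ones are always
    # arrays, even when only one such element is present.
    for child in elem:
        key = local_name(child.tag)
        value = element_to_json(child)
        if key in REPEATABLE_ELEMENTS:
            obj.setdefault(key, []).append(value)
        else:
            obj[key] = value
    return obj
```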

Response type: Result

A number of API methods return a simple, scalar value: either an integer, Boolean, or string. For these methods, the response type is specified as Result. In XML data, this is an element <result> with a child element, whose text content is the result. (NB: neither this element nor its child elements are in the XML namespace used for the Registry Schema.) For JSON data, the response is a JSON object, with a key/value pair, with the value given as the appropriate type. The child element/key is either integerValue, booleanValue, or stringValue. Here is an example of a string result in XML:

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<result>
  <stringValue>This is a sample string result</stringValue>
</result>

The corresponding JSON would be:

JS

{
  "stringValue": "This is a sample string result"
}

The integer value 17 would be returned in XML as:

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<result>
  <integerValue>17</integerValue>
</result>

and in JSON as:

JS

{
  "integerValue": 17
}

In almost all cases, there will be exactly one child element or key-value pair. However, one API method (createUpload) returns both an integer and a string value within the same Result. The documentation for each method explains what you should expect, for example, "Result/stringValue".
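
A client that supports both transfer formats might normalise a Result into a dictionary keyed by integerValue, booleanValue, or stringValue. In this Python sketch, the function names are hypothetical. The JSON parser already yields the right value types; for XML, the type is taken from the child element name. Because <result> is not in the Registry Schema namespace, plain tag names are matched. Reading booleans as the xs:boolean forms "true"/"false"/"1"/"0" is an assumption. The loop also copes with a Result that contains more than one value, such as the one returned by createUpload.

```python
import json
import xml.etree.ElementTree as ET


def parse_result_xml(xml_text):
    """Turn an XML Result into a dict such as {"integerValue": 17}."""
    root = ET.fromstring(xml_text)
    # <result> and its children are not in the Registry Schema namespace,
    # so plain tag names are compared.
    if root.tag != "result":
        raise ValueError(f"expected <result>, got <{root.tag}>")
    values = {}
    for child in root:
        text = child.text or ""
        if child.tag == "integerValue":
            values[child.tag] = int(text)
        elif child.tag == "booleanValue":
            # Assumption: xs:boolean lexical forms ("true"/"false", "1"/"0").
            values[child.tag] = text.strip() in ("true", "1")
        else:
            values[child.tag] = text
    return values


def parse_result_json(json_text):
    """JSON already carries the value types, so parsing is enough."""
    return json.loads(json_text)
```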