Validation of Registry model entity data

A number of Registry API methods accept input containing metadata that describes model entities. Such methods always apply validation to that metadata: there is a certain amount of "sanity checking", and a number of "business rules" are checked.

If you create or update vocabulary metadata using the Vocabulary Portal, the Portal does the work of ensuring that you enter metadata that will pass the validation checks. When using the Registry API directly, you (and/or your program) must take care to provide valid metadata. This page gives the precise details of the validation process to enable you to do that.

Please note: the basic philosophy of the validation process can be summed up as "what is not explicitly forbidden, is permitted; whether it is a good idea is another matter".

Validation of vocabulary metadata

Validation mode enumerated type

The Registry API methods that apply validation to vocabulary metadata operate in one of two modes. Within the Registry, there is an enumerated type ValidationMode, which has two values:

CREATE: used by API methods (such as createVocabulary and createRelatedEntity) that create a new top-level model entity;
UPDATE: used by API methods (such as updateVocabulary and updateRelatedEntity) that update an existing model entity.

The particular set of validation checks applied to incoming metadata is determined by the type of the metadata and the validation mode of the API method. If an API method that applies validation is listed at Vocabulary Registry API methods, the method description indicates which validation mode is applied.

Vocabulary metadata validation in detail

The validation process attempts to check as many things as possible, and to collect a full list of constraint violations, rather than bailing out at the first violation found. The result of this is that there may, in some cases, be a "cascade" of violations. However, in practice, such a cascade tends to be quite limited, as the various checks have a high level of independence.
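As a rough illustration of this collect-everything approach, the following Python sketch runs every check and accumulates the violations in a list instead of stopping at the first failure. The function name, the dictionary representation of the metadata, and the message texts are assumptions made for this example; they are not the Registry's actual implementation.

```python
def validate_vocabulary(metadata: dict) -> list[str]:
    """Run every check and return all violations found (hypothetical sketch)."""
    violations = []
    # Each check is independent: a failure is recorded, and checking continues.
    if not metadata.get("title"):
        violations.append("title: must be specified and non-empty")
    if not metadata.get("owner"):
        violations.append("owner: must be specified and non-empty")
    description = metadata.get("description")
    if not description:
        violations.append("description: must be specified and non-empty")
    elif len(description) > 10000:
        violations.append("description: must not exceed 10000 characters")
    return violations


# Two independent problems (empty title, missing owner) are both reported.
result = validate_vocabulary({"title": "", "description": "<p>Rocks</p>"})
```

A caller can then treat an empty list as "valid" and otherwise report every entry at once.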

When the validation mode is UPDATE, the registry's database is consulted to get the existing vocabulary metadata. A first attempt is made to fetch a currently-valid vocabulary instance (i.e., with status either "published" or "deprecated"). If this fails, an attempt is made to fetch a draft vocabulary instance. If this is unsuccessful, a violation is created.
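The lookup order described above can be sketched as follows. This is only an illustration: the in-memory list of instances, the field names id and status, the status value "draft", and the message text are assumptions for the example, and the real Registry queries its database.

```python
CURRENT_STATUSES = ("published", "deprecated")


def find_existing(instances, vocabulary_id):
    """Return (instance, violation); exactly one of the two is None."""
    matching = [i for i in instances if i["id"] == vocabulary_id]
    # First preference: a currently-valid instance.
    for instance in matching:
        if instance["status"] in CURRENT_STATUSES:
            return instance, None
    # Fallback: a draft instance.
    for instance in matching:
        if instance["status"] == "draft":
            return instance, None
    # Neither exists: report a violation instead of raising an error.
    return None, f"id: no existing vocabulary with id {vocabulary_id}"


rows = [
    {"id": 1, "status": "draft"},
    {"id": 1, "status": "published"},
    {"id": 2, "status": "draft"},
]
```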

A general note about maximum field lengths: some length checks are enforced, in order to prevent attempts to write "too much" data to a column of the database (i.e., that would exceed its maximum length). For now, such checks are done on the fields that contain HTML data; each is limited to 10000 characters.

Where reference is made to an enumerated type, please refer to the Registry Schema to see exactly which enumerated type is used, and what the type's allowed values are. (See the section "XML data" of Vocabulary Registry data for more information.)

The following sections detail the checks applied during validation.

Checked at all levels of metadata

These are checked at all levels of metadata:

id attributes are checked. In general, an id attribute must not be specified when creating something; it must be specified when updating something. (When updating, the absence of an id is interpreted as a request to create a new entity.)

Checked at the vocabulary level

At the top, vocabulary level:

The status attribute must be specified. Its value must be one of the values of the enumerated type.
The owner attribute must be specified, and be non-empty. The API method will check that the caller of the method is authorized to specify the particular owner value.
The slug attribute is checked.
- If mode is CREATE, the slug attribute is optional. If it is specified:
  - The specified slug must be a valid slug. This means:
    - It must be non-empty.
    - It must have the format of a slug: i.e., when passed through the slug generator, the output exactly equals the input. (See the section "Slugs" at Vocabulary Registry data.)
  - The slug must not be the same as the slug of any other vocabulary in the system.
- If mode is UPDATE, the slug must be specified, and it must be the same as the slug of the existing database entry. I.e., for now, changing the slug is not supported through the API.
The title attribute must be specified, and be non-empty.
If the slug attribute was not specified, but the title attribute was, then the title attribute is passed through the slug generator; the result must not be a slug that is either an empty string, or a value that is already in use by another vocabulary.
The acronym attribute is optional. It is not examined.
The description attribute must be specified, non-empty, and be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not be greater than 10000 characters.
The note attribute is optional. If specified, it must be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not be greater than 10000 characters.
The revision-cycle attribute is optional. It is not examined.
The creation-date attribute is required, and its value must be in one of the permitted formats (YYYY, YYYY-MM, YYYY-MM-DD).
The primary-language attribute must be specified, and its value must be a valid language code. Here, "valid" means an IETF BCP 47 language tag, such as en, de-AT, etc.
The licence attribute is optional. It is not examined.
There must be at least one subject element that draws on the ANZSRC-FOR vocabulary. Each subject element is validated as specified below.
The other-language element is optional; if specified, it is validated as specified below.
The poolparty-project element is optional; if specified, it is validated as specified below.
The top-concept element is optional; if specified, it is validated as specified below.
The related-entity-ref element is required; it is validated as specified below.
There must be at least one valid related-entity-ref element that includes publishedBy as the value of one of its relation elements.
The related-vocabulary-ref element is optional; if specified, it is validated as specified below.
The version element is optional; if specified, it is validated as specified below.
- However, the slug attributes of any version elements are collected and validated here. No two versions may have the same slug. For the purposes of this duplicate check, if any version does not have a slug attribute, a slug is generated from the value of the title attribute.
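To check a creation-date value before submitting it, a client could use something like the following Python sketch of the three permitted formats (YYYY, YYYY-MM, YYYY-MM-DD). The function name is hypothetical. The sketch also rejects impossible calendar values such as month 13, which is an assumption: this page states only the permitted formats.

```python
import re
from datetime import date

# Year, optionally followed by -MM, optionally followed by -DD.
DATE_PATTERN = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_valid_registry_date(value: str) -> bool:
    """Return True if value has the form YYYY, YYYY-MM, or YYYY-MM-DD."""
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 so date() can range-check the rest.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```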

For each subject element (there may be more than one):

The source attribute must be specified, and its value must be one of the permitted values. The permitted values are: anzsrc-for, anzsrc-seo, gcmd, local.
The label attribute must be specified, and be non-empty.
If the value of the source attribute is such that an IRI must be specified (anzsrc-for, anzsrc-seo, gcmd), then the iri attribute must be specified, and be one of the IRIs for that source.
If the source attribute has the value local, the value of the label attribute must not be a duplicate of the value of the label attribute of any other subject element whose source attribute has the value local.
If the source attribute does not have the value local, the value of the iri attribute must not be a duplicate of the value of the iri attribute of any other subject element whose source attribute has a value not equal to local.
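The two duplicate rules above use different keys: local subjects are compared by label, and all other subjects are compared by IRI, regardless of source. A minimal Python sketch, assuming each subject is represented as a dictionary (the function name, message texts, and example.org IRIs are placeholders for illustration):

```python
def subject_duplicate_violations(subjects):
    """Report duplicate local labels and duplicate non-local IRIs."""
    violations = []
    seen_local_labels = set()
    seen_iris = set()
    for index, subject in enumerate(subjects):
        if subject.get("source") == "local":
            key, seen, field = subject.get("label"), seen_local_labels, "label"
        else:
            key, seen, field = subject.get("iri"), seen_iris, "iri"
        if key is None:
            continue  # Missing values are reported by other checks.
        if key in seen:
            violations.append(f"subject[{index}]: duplicate {field} {key!r}")
        seen.add(key)
    return violations


subjects = [
    {"source": "anzsrc-for", "label": "Earth Sciences", "iri": "http://example.org/for/04"},
    {"source": "gcmd", "label": "Earth Science", "iri": "http://example.org/for/04"},
    {"source": "local", "label": "Rocks"},
    {"source": "local", "label": "Rocks"},
]
```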

For each other-language element (there may be more than one):

The value of the element must be a valid language code. Here, "valid" means an IETF BCP 47 language tag, such as en, de-AT, etc.
The value of the element must not be the same as the value of the primary-language attribute.
The value of the element must not be the same as the value of any other other-language element.

If a poolparty-project element is specified:

The server-id attribute must be specified as the value 1.
The project-id attribute must be specified, and be non-empty.

For each top-concept element (there may be more than one):

The value of the element must be non-empty.
The value of the element must not be the same as the value of any other top-concept element.

For each related-entity-ref element (there may be more than one):

The id attribute must be specified, and its value must correspond to that of a current related entity in the database.
The value of the id attribute must not match the value of the id attribute of any other related-entity-ref element.
There must be at least one relation element.
For each relation element:
- The value of the element must be one of the values of the enumerated type. Validity is further constrained based on the type of the related entity.
  - If the related entity is a party, the value of the relation element may be any of: consumerOf, hasAuthor, hasContributor, implementedBy, pointOfContact, publishedBy.
  - If the related entity is a service, the value of the relation element may be any of: hasAssociationWith, isUsedBy, isPresentedBy.
  - If the related entity is a vocabulary, the value of the relation element may be any of: enriches, hasAssociationWith, isDerivedFrom, isPartOf.
- The value of the element must not be the same as the value of any other relation element contained within this related-entity-ref element.
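The relation rules above lend themselves to a lookup table keyed by the type of the related entity. The following Python sketch shows one way a client might pre-check a related-entity-ref. The relation values are taken from the lists above; the function name, the lowercase type keys, and the message texts are assumptions for the example.

```python
ALLOWED_RELATIONS = {
    "party": {"consumerOf", "hasAuthor", "hasContributor",
              "implementedBy", "pointOfContact", "publishedBy"},
    "service": {"hasAssociationWith", "isUsedBy", "isPresentedBy"},
    "vocabulary": {"enriches", "hasAssociationWith", "isDerivedFrom", "isPartOf"},
}


def relation_violations(entity_type, relations):
    """Check the relation values of one related-entity-ref element."""
    violations = []
    if not relations:
        violations.append("there must be at least one relation element")
    allowed = ALLOWED_RELATIONS.get(entity_type, set())
    seen = set()
    for relation in relations:
        if relation not in allowed:
            violations.append(f"relation {relation!r} is not allowed for a {entity_type}")
        if relation in seen:
            violations.append(f"relation {relation!r} is duplicated")
        seen.add(relation)
    return violations
```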

For each related-vocabulary-ref element (there may be more than one):

The id attribute must be specified, and its value must correspond to that of a current vocabulary in the database.
If mode is UPDATE – i.e., this vocabulary already exists, and the top-level vocabulary element has an id attribute – then the value of this element's id attribute must not equal the value of the top-level vocabulary element's id attribute. In other words, it is not permitted to create a self-reference. (Cycles in the chain of related vocabularies are permitted.)
The value of the id attribute must not match the value of the id attribute of any other related-vocabulary-ref element.
There must be at least one relation element.
For each relation element:
- The value of the element must be one of the values of the enumerated type.
- The value of the element must not be the same as the value of any other relation element contained within this related-vocabulary-ref element.

Checked at the version level

For each version element (there may be more than one):

The status attribute must be specified. Its value must be one of the values of the enumerated type.
The title attribute must be specified, and be non-empty.
The slug attribute is optional. If it is specified:
- The specified slug must be a valid slug. This means:
  - It must be non-empty.
  - It must have the format of a slug: i.e., when passed through the slug generator, the output exactly equals the input. (See the section "Slugs" at Vocabulary Registry data.)
The note attribute is optional. If specified, it must be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not be greater than 10000 characters.
The release-date attribute is required, and its value must be in one of the permitted formats (YYYY, YYYY-MM, YYYY-MM-DD).
The do-poolparty-harvest attribute is optional. If specified, its value must be either true or false, and the value may only be true if the top-level vocabulary metadata contains a poolparty-project element.
The do-import attribute is optional. If specified, its value must be either true or false, and the value may only be true if the version element specifies something to import. For now, that means that at least one of these must be satisfied:
- the value of the do-poolparty-harvest attribute is true, or
- there is at least one access-point element, the value of whose discriminator attribute is file.
The do-publish attribute is optional. If specified, its value must be either true or false, and the value may only be true if the value of the do-import attribute is also true.
The version element must specify the creation of at least one access point. For now, that means that at least one of these must be satisfied:
- the value of the do-poolparty-harvest attribute is true, and the value of the do-import attribute is true, or
- there is at least one access-point element, the value of whose source attribute is user.
The version element may specify one or more browse-flag elements. If any are provided, they are validated individually, and together as a set of flags, as follows:
- Each value must be one of the values of the enumerated type.
- If the defaultSortByNotation flag is specified, then the maySortByNotation flag must also be specified.
- If any of the notation... flags (i.e., notationAlpha, notationFloat, or notationDotted) is specified, then the maySortByNotation flag must also be specified.
- If the maySortByNotation flag is specified, then exactly one of the notation... flags (i.e., notationAlpha, notationFloat, or notationDotted) must be specified.
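The dependencies among the version-level flags are easier to follow as code. The Python sketch below covers the browse-flag rules and the do-poolparty-harvest, do-import, and do-publish rules. It assumes a version is represented as a dictionary with an "access-points" list, and that an absent optional attribute is treated as false. The function names and message texts are hypothetical. Membership in the enumerated type is not checked, because the full list of values is given in the Registry Schema rather than on this page.

```python
NOTATION_FLAGS = {"notationAlpha", "notationFloat", "notationDotted"}


def browse_flag_violations(flags):
    """Check the dependencies among a version's browse flags."""
    flags = set(flags)
    notation = flags & NOTATION_FLAGS
    may_sort = "maySortByNotation" in flags
    violations = []
    if "defaultSortByNotation" in flags and not may_sort:
        violations.append("defaultSortByNotation requires maySortByNotation")
    if notation and not may_sort:
        violations.append("a notation flag requires maySortByNotation")
    if may_sort and len(notation) != 1:
        violations.append("maySortByNotation requires exactly one notation flag")
    return violations


def workflow_violations(version, has_poolparty_project):
    """Check the do-* attributes and the access point requirement."""
    harvest = version.get("do-poolparty-harvest", False)
    do_import = version.get("do-import", False)
    publish = version.get("do-publish", False)
    access_points = version.get("access-points", [])
    has_file = any(ap.get("discriminator") == "file" for ap in access_points)
    has_user = any(ap.get("source") == "user" for ap in access_points)
    violations = []
    if harvest and not has_poolparty_project:
        violations.append("do-poolparty-harvest requires a poolparty-project element")
    if do_import and not (harvest or has_file):
        violations.append("do-import requires something to import")
    if publish and not do_import:
        violations.append("do-publish requires do-import")
    if not ((harvest and do_import) or has_user):
        violations.append("version must specify the creation of at least one access point")
    return violations
```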

Checked at the access point level

For each access-point element (there may be more than one):

The source attribute must be specified. Its value must be one of the values of the enumerated type.
The discriminator attribute must be specified. Its value must be one of the values of the enumerated type.
There must be an element corresponding to the value of the discriminator attribute. For example, if the value of the discriminator attribute is apiSparql, there must be an ap-api-sparql element.
There is additional type-specific validation:
- If the value of the discriminator attribute is apiSparql:
  - If the mode is CREATE, the value of the source attribute must be user.
  - The url attribute must be specified, and its value must be a valid URL.
- If the value of the discriminator attribute is file:
  - The value of the source attribute must be user.
  - The upload-id attribute must be specified, and its value must be an integer greater than zero. Additional authorization checks are applied by the API methods, including a check that the value of the upload-id attribute corresponds to an existing upload that is available to the owner specified at the top-level vocabulary metadata.
- If the value of the discriminator attribute is sesameDownload:
  - The mode must be UPDATE, and the value of the source attribute must be system.
  - The url-prefix attribute must be specified, and its value must be a valid URL.
- If the value of the discriminator attribute is sissvoc:
  - If the mode is CREATE, the value of the source attribute must be user.
  - The url-prefix attribute must be specified, and its value must be a valid URL.
- If the value of the discriminator attribute is webPage:
  - The url attribute must be specified, and its value must be a valid URL.
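The type-specific rules above can be summarised in a short Python sketch. Several simplifications are assumed. The access point is a flat dictionary, whereas the real metadata nests the type-specific attributes in an element such as ap-api-sparql. Only the five discriminator values named above are treated as known. "Valid URL" is taken to mean an http or https URL with a host, which this page does not specify. The function names and message texts are hypothetical, and the authorization checks on upload-id are not modelled.

```python
from urllib.parse import urlsplit

# discriminator -> attribute that must hold a valid URL (None if there is none)
URL_ATTRIBUTE = {
    "apiSparql": "url",
    "file": None,
    "sesameDownload": "url-prefix",
    "sissvoc": "url-prefix",
    "webPage": "url",
}


def looks_like_url(value):
    """Assumed notion of 'valid URL': an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def access_point_violations(ap, mode):
    """Apply the type-specific checks to one access point."""
    disc, source = ap.get("discriminator"), ap.get("source")
    if disc not in URL_ATTRIBUTE:
        return [f"discriminator: unknown value {disc!r}"]
    violations = []
    # Constraints on source and mode.
    if disc in ("apiSparql", "sissvoc") and mode == "CREATE" and source != "user":
        violations.append(f"{disc}: source must be user when creating")
    if disc == "file" and source != "user":
        violations.append("file: source must be user")
    if disc == "sesameDownload" and (mode != "UPDATE" or source != "system"):
        violations.append("sesameDownload: mode must be UPDATE and source must be system")
    # Constraints on type-specific attributes.
    url_attribute = URL_ATTRIBUTE[disc]
    if url_attribute is not None and not looks_like_url(ap.get(url_attribute)):
        violations.append(f"{disc}: {url_attribute} must be a valid URL")
    if disc == "file":
        upload_id = ap.get("upload-id")
        if not (type(upload_id) is int and upload_id > 0):
            violations.append("file: upload-id must be an integer greater than zero")
    return violations
```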

Validation of related entity metadata

The validation process attempts to check as many things as possible, and to collect a full list of constraint violations, rather than bailing out at the first violation found. The various checks have a high level of independence, and there is minimal chance of a "cascade" of violations.

The following sections detail what is checked.

Checked at the related entity level

These are checked:

The id attribute is checked.
- If mode is CREATE, the id attribute must not be provided.
- If mode is UPDATE, the id attribute must be provided.
The type attribute must be specified. Its value must be one of the values of the enumerated type.
The owner attribute must be specified, and be non-empty. The API method will check that the caller of the method is authorized to specify the particular owner value.
The title attribute must be specified, and be non-empty.
The email attribute is optional. If it is specified:
- The specified value must be a valid email address. That is, it must have the format of an email address.
The phone attribute is optional. If it is specified, it must be non-empty.

Checked at the url level

The url child element is optional. For each one that is provided, this is checked:

The body of the element must be non-empty, and its value must have the form of a valid URL.

Checked at the related entity identifier level

The related-entity-identifier child element is optional. For each one that is provided, this is checked:

The id attribute is checked.
- If mode is CREATE, the id attribute must not be provided.
The identifier-type attribute must be specified. Its value must be one of the values of the enumerated type.
The identifier-value attribute must be specified, and be non-empty. Its value must be consistent with the value of identifier-type.

There is an additional check that the same identifier has not been included more than once. For the purposes of this duplicate check, two identifiers are considered to be identical if they have both the same identifier-type and identifier-value.
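Because identity is defined by the pair (identifier-type, identifier-value), the duplicate check can be sketched in Python by counting tuples. The dictionary representation, the function name, and the type values typeA and typeB are placeholders for illustration; the real identifier types are defined in the Registry Schema.

```python
from collections import Counter


def duplicate_identifiers(identifiers):
    """Return the (type, value) pairs that occur more than once, sorted."""
    counts = Counter(
        (i["identifier-type"], i["identifier-value"]) for i in identifiers
    )
    return sorted(pair for pair, count in counts.items() if count > 1)


identifiers = [
    {"identifier-type": "typeA", "identifier-value": "X1"},
    {"identifier-type": "typeB", "identifier-value": "X1"},  # same value, other type
    {"identifier-type": "typeA", "identifier-value": "X1"},  # true duplicate
]
```

The second entry is not a duplicate of the first: the values match, but the types differ.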