Validation of Registry model entity data
A number of Registry API methods accept input containing metadata that describes model entities. Such methods always apply validation to that metadata: there is a certain amount of "sanity checking", and a number of "business rules" are checked.
If you create or update vocabulary metadata using the Vocabulary Portal, the Portal does the work of ensuring that you enter metadata that will pass the validation checks. When using the Registry API directly, you (and/or your program) must take care to provide valid metadata. This page gives the precise details of the validation process to enable you to do that.
Please note: the basic philosophy of the validation process can be summed up as: "what is not explicitly forbidden, is permitted; whether it is a good idea is another matter".
Validation of vocabulary metadata
Validation mode enumerated type
The Registry API methods that apply validation to vocabulary metadata operate in one of two modes. Within the Registry, there is an enumerated type ValidationMode, which has two values:
CREATE: used by API methods (such as createVocabulary and createRelatedEntity) that create a new top-level model entity;
UPDATE: used by API methods (such as updateVocabulary and updateRelatedEntity) that update an existing model entity.
The particular set of validation checks applied to incoming metadata is determined by the type of the metadata and the validation mode of the API method. If an API method that applies validation is listed at Vocabulary Registry API methods, the method description indicates which validation mode is applied.
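Client code that mirrors these checks before calling the API may find it useful to model the two modes explicitly. The following Python sketch does this. The class name copies the Registry's ValidationMode. METHOD_MODES and mode_for are hypothetical client-side helpers, not part of the Registry API, and they cover only the four methods named above.

```python
from enum import Enum


class ValidationMode(Enum):
    """Client-side mirror of the Registry's two validation modes."""
    CREATE = "CREATE"  # creating a new top-level model entity
    UPDATE = "UPDATE"  # updating an existing model entity


# The four methods named in the text, mapped to the mode each applies.
# Other methods are deliberately left out: check their descriptions in
# the API documentation rather than guessing.
METHOD_MODES = {
    "createVocabulary": ValidationMode.CREATE,
    "createRelatedEntity": ValidationMode.CREATE,
    "updateVocabulary": ValidationMode.UPDATE,
    "updateRelatedEntity": ValidationMode.UPDATE,
}


def mode_for(method_name: str) -> ValidationMode:
    """Return the validation mode for a known method (KeyError otherwise)."""
    return METHOD_MODES[method_name]
```

Raising KeyError for unlisted methods is intentional. It is safer than assuming a default mode for a method whose documentation has not been checked.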
Vocabulary metadata validation in detail
The validation process attempts to check as many things as possible, and to collect a full list of constraint violations, rather than bailing out at the first violation found. As a result, there may, in some cases, be a "cascade" of violations, where one underlying problem triggers several related violations. In practice, however, such a cascade tends to be quite limited, as the various checks are largely independent of each other.
When the validation mode is UPDATE, the registry's database is consulted to get the existing vocabulary metadata. A first attempt is made to fetch a currently-valid vocabulary instance (i.e., with status either "published" or "deprecated"). If this fails, an attempt is made to fetch a draft vocabulary instance. If this is also unsuccessful, a violation is created.
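The two behaviours just described can be sketched in Python: collecting every violation instead of stopping early, and the UPDATE-mode lookup that falls back from a current instance to a draft. ValidationResult, fetch_existing, fetch_current and fetch_draft are all hypothetical names. The two fetch callables stand in for the Registry's database queries, whose real form is not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    """Collects every violation found, rather than stopping at the first."""
    violations: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.violations.append(message)

    @property
    def ok(self) -> bool:
        return not self.violations


def fetch_existing(vocab_id: int,
                   fetch_current: Callable[[int], Optional[dict]],
                   fetch_draft: Callable[[int], Optional[dict]],
                   result: ValidationResult) -> Optional[dict]:
    """UPDATE-mode lookup: prefer a current instance, fall back to a draft."""
    existing = fetch_current(vocab_id)  # status "published" or "deprecated"
    if existing is None:
        existing = fetch_draft(vocab_id)
    if existing is None:
        # Record the problem and carry on, so later checks still run.
        result.add(f"vocabulary {vocab_id}: no existing instance found")
    return existing
```

For example, with two dicts standing in for the database, `fetch_existing(2, current.get, drafts.get, result)` returns the draft instance when no current instance with id 2 exists.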
A general note about maximum field lengths: some length checks are enforced, in order to prevent attempts to write "too much" data to a column of the database (i.e., that would exceed its maximum length). For now, such checks are done on the fields that contain HTML data; each is limited to 10000 characters.
Where reference is made to an enumerated type, please refer to the Registry Schema to see exactly which enumerated type is used, and what the type's allowed values are. (See the section "XML data" of Vocabulary Registry data for more information.)
The following sections detail the checks applied during validation.
Checked at all levels of metadata
These are checked at all levels of metadata:
The id attributes are checked. In general, an id attribute must not be specified when creating something; it must be specified when updating something. (When updating, the absence of an id is interpreted as a request to create a new entity.)
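One reading of this rule is sketched in Python below. The top-level element follows the mode directly. A nested element inside an update is classified by whether it has an id. Metadata is represented as plain dicts, and both function names are hypothetical. This is an interpretation of the prose above, not the Registry's own code.

```python
from typing import List, Optional


def check_top_level_id(entity: dict, mode: str, violations: List[str]) -> None:
    """Top-level entity: id forbidden in CREATE mode, required in UPDATE mode."""
    has_id = entity.get("id") is not None
    if mode == "CREATE" and has_id:
        violations.append("id must not be specified in CREATE mode")
    elif mode == "UPDATE" and not has_id:
        violations.append("id must be specified in UPDATE mode")


def nested_intent(element: dict, mode: str, violations: List[str]) -> Optional[str]:
    """Nested element (e.g. a version): decide whether it is created or updated.

    In CREATE mode nothing exists yet, so an id is a violation. In UPDATE
    mode an id selects an existing entity, and a missing id is read as a
    request to create a new one.
    """
    has_id = element.get("id") is not None
    if mode == "CREATE":
        if has_id:
            violations.append("nested id must not be specified in CREATE mode")
            return None
        return "create"
    return "update" if has_id else "create"
```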
Checked at the vocabulary level
At the top, vocabulary level:
The status attribute must be specified. Its value must be one of the values of the enumerated type.
The owner attribute must be specified, and be non-empty. The API method will check that the caller of the method is authorized to specify the particular owner value.
The slug attribute is checked.
If mode is CREATE, the slug attribute is optional. If it is specified, the specified slug must be a valid slug. This means:
It must be non-empty.
It must have the format of a slug: i.e., when passed through the slug generator, the output exactly equals the input. (See the section "Slugs" at Vocabulary Registry data.)
In addition, the slug must not be the same as the slug of any other vocabulary in the system.
If mode is UPDATE, the slug must be specified, and it must be the same as the slug of the existing database entry. That is, for now, changing the slug is not supported through the API.
The title attribute must be specified, and be non-empty.
If the slug attribute was not specified, but the title attribute was, then the value of the title attribute is passed through the slug generator. The result must be neither an empty string nor a slug that is already in use by another vocabulary.
The acronym attribute is optional. It is not examined.
The description attribute must be specified, be non-empty, and be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not be greater than 10000 characters.
The note attribute is optional. If specified, it must be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not be greater than 10000 characters.
The revision-cycle attribute is optional. It is not examined.
The creation-date attribute is required, and its value must be in one of the permitted formats (YYYY, YYYY-MM, YYYY-MM-DD).
The primary-language attribute must be specified, and its value must be a valid language code. Here, "valid" means an IETF BCP 47 language tag, such as en, de-AT, etc.
The licence attribute is optional. It is not examined.
There must be at least one subject element that draws on the ANZSRC-FOR vocabulary. Each subject element is validated as specified below.
The other-language element is optional; if specified, it is validated as specified below.
The poolparty-project element is optional; if specified, it is validated as specified below.
The top-concept element is optional; if specified, it is validated as specified below.
The related-entity-ref element is required; it is validated as specified below.
There must be at least one valid related-entity-ref element that includes publishedBy as the value of one of its relation elements.
The related-vocabulary-ref element is optional; if specified, it is validated as specified below.
The version element is optional; if specified, it is validated as specified below.
However, the slug attributes of any version elements are collected and validated here: no two versions are allowed to have the same slug. For the purposes of this duplicate check, if a version does not have a slug attribute, a slug is generated from the value of its title attribute.
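The slug rules above, for both the vocabulary and its versions, can be sketched in Python as follows. The real slug generator is described in the section "Slugs" at Vocabulary Registry data. The make_slug function here is a hypothetical stand-in with simpler rules, so results for unusual titles may differ from the Registry's. The function names and the dict representation of metadata are also assumptions.

```python
import re
from typing import List, Optional, Set


def make_slug(text: str) -> str:
    """HYPOTHETICAL stand-in for the Registry's slug generator: lowercase,
    runs of other characters become "-", outer hyphens are trimmed."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def check_vocabulary_slug(vocab: dict, mode: str, existing_slug: Optional[str],
                          slugs_in_use: Set[str], violations: List[str]) -> None:
    """slugs_in_use: slugs of all *other* vocabularies in the system."""
    slug = vocab.get("slug")
    if mode == "UPDATE":
        # Slug is mandatory and may not change.
        if slug is None or slug != existing_slug:
            violations.append("slug must be given and match the existing slug")
        return
    if slug is not None:
        if slug == "":
            violations.append("slug must be non-empty")
        elif make_slug(slug) != slug:
            violations.append(f"{slug!r} does not have the format of a slug")
        if slug in slugs_in_use:
            violations.append(f"slug {slug!r} is already in use")
    elif vocab.get("title"):
        # No slug supplied: the one generated from the title must be usable.
        generated = make_slug(vocab["title"])
        if generated == "" or generated in slugs_in_use:
            violations.append("slug generated from title is empty or in use")


def check_version_slugs(versions: List[dict], violations: List[str]) -> None:
    """No two versions may share a slug; missing slugs come from titles."""
    seen: Set[str] = set()
    for version in versions:
        slug = version.get("slug")
        if slug is None:
            slug = make_slug(version.get("title", ""))
        if slug in seen:
            violations.append(f"duplicate version slug {slug!r}")
        seen.add(slug)
```

The "format of a slug" test is the round-trip described above: a value is accepted only if the generator leaves it unchanged.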
For each subject element (there may be more than one):
The source attribute must be specified, and its value must be one of the permitted values. The permitted values are: anzsrc-for, anzsrc-seo, gcmd, local.
The label attribute must be specified, and be non-empty.
If the value of the source attribute is such that an IRI must be specified (anzsrc-for, anzsrc-seo, gcmd), then the iri attribute must be specified, and be one of the IRIs for that source.
If the source attribute has the value local, the value of the label attribute must not be a duplicate of the value of the label attribute of any other subject element whose source attribute has the value local.
If the source attribute does not have the value local, the value of the iri attribute must not be a duplicate of the value of the iri attribute of any other subject element whose source attribute has a value not equal to local.
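A Python sketch of the subject checks is given below, including the vocabulary-level requirement for at least one ANZSRC-FOR subject. The known_iris argument is hypothetical. It stands in for whatever the Registry consults to decide whether an IRI belongs to a source, and the example IRIs in any test data are placeholders.

```python
from typing import Dict, List, Set

IRI_SOURCES = {"anzsrc-for", "anzsrc-seo", "gcmd"}
PERMITTED_SOURCES = IRI_SOURCES | {"local"}


def check_subjects(subjects: List[dict], known_iris: Dict[str, Set[str]],
                   violations: List[str]) -> None:
    """known_iris: HYPOTHETICAL map from each IRI-bearing source to the set
    of IRIs it contains (the Registry consults the real vocabularies)."""
    if not any(s.get("source") == "anzsrc-for" for s in subjects):
        violations.append("at least one anzsrc-for subject is required")
    local_labels: Set[str] = set()
    other_iris: Set[str] = set()
    for i, s in enumerate(subjects):
        source, label, iri = s.get("source"), s.get("label"), s.get("iri")
        if source not in PERMITTED_SOURCES:
            violations.append(f"subject {i}: source {source!r} not permitted")
        if not label:
            violations.append(f"subject {i}: label must be non-empty")
        if source in IRI_SOURCES and iri not in known_iris.get(source, set()):
            violations.append(f"subject {i}: iri missing or not in {source}")
        if source == "local":
            # Duplicate labels are only compared among local subjects.
            if label in local_labels:
                violations.append(f"subject {i}: duplicate local label {label!r}")
            if label:
                local_labels.add(label)
        elif iri is not None:
            # Duplicate IRIs are compared among all non-local subjects.
            if iri in other_iris:
                violations.append(f"subject {i}: duplicate iri {iri!r}")
            other_iris.add(iri)
```

Note the asymmetry this reflects: local subjects are deduplicated by label, all other subjects by IRI.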
For each other-language element (there may be more than one):
The value of the element must be a valid language code. Here, "valid" means an IETF BCP 47 language tag, such as en, de-AT, etc.
The value of the element must not be the same as the value of the primary-language attribute.
The value of the element must not be the same as the value of any other other-language element.
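A Python sketch of these three checks is given below. Python's standard library has no BCP 47 validator, so looks_like_language_tag uses a simplified regular expression. It accepts a language subtag with optional script, region and variant subtags. It does not handle extlang, extension, private-use or grandfathered tags, and it does not consult the IANA subtag registry. The Registry's own check is likely stricter or different. Values are compared exactly as given.

```python
import re
from typing import List

# Simplified approximation of a BCP 47 tag (an assumption, not the full
# RFC 5646 grammar): language, optional script, region, variants.
_BCP47_APPROX = re.compile(
    r"^[A-Za-z]{2,3}"
    r"(-[A-Za-z]{4})?"
    r"(-(?:[A-Za-z]{2}|[0-9]{3}))?"
    r"(-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*$"
)


def looks_like_language_tag(tag: str) -> bool:
    return bool(_BCP47_APPROX.match(tag))


def check_other_languages(primary: str, others: List[str],
                          violations: List[str]) -> None:
    seen = set()
    for tag in others:
        if not looks_like_language_tag(tag):
            violations.append(f"{tag!r} is not a valid language tag")
        if tag == primary:
            violations.append(f"{tag!r} repeats the primary language")
        if tag in seen:
            violations.append(f"{tag!r} is listed more than once")
        seen.add(tag)
```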
If a poolparty-project element is specified:
The server-id attribute must be specified as the value 1.
The project-id attribute must be specified, and be non-empty.
For each top-concept element (there may be more than one):
The value of the element must be non-empty.
The value of the element must not be the same as the value of any other top-concept element.
For each related-entity-ref element (there may be more than one):
The id attribute must be specified, and its value must correspond to that of a current related entity in the database.
The value of the id attribute must not match the value of the id attribute of any other related-entity-ref element.
There must be at least one relation element.
For each relation element:
The value of the element must be one of the values of the enumerated type. Validity is further constrained based on the type of the related entity:
If the related entity is a party, the value of the relation element may be any of: consumerOf, hasAuthor, hasContributor, implementedBy, pointOfContact, publishedBy.
If the related entity is a service, the value of the relation element may be any of: hasAssociationWith, isUsedBy, isPresentedBy.
If the related entity is a vocabulary, the value of the relation element may be any of: enriches, hasAssociationWith, isDerivedFrom, isPartOf.
The value of the element must not be the same as the value of any other relation element contained within this related-entity-ref element.
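Because the allowed relations depend on the type of the related entity, they fit naturally in a lookup table. The following Python sketch applies these rules together with the vocabulary-level requirement for a publishedBy relation. The relation names come from the lists above. The dictionary keys "party", "service" and "vocabulary", the entity_types argument (standing in for the database lookup), and the dict shape of each reference are assumptions.

```python
from typing import Dict, List

ALLOWED_RELATIONS = {
    "party": {"consumerOf", "hasAuthor", "hasContributor",
              "implementedBy", "pointOfContact", "publishedBy"},
    "service": {"hasAssociationWith", "isUsedBy", "isPresentedBy"},
    "vocabulary": {"enriches", "hasAssociationWith", "isDerivedFrom", "isPartOf"},
}


def check_related_entity_refs(refs: List[dict], entity_types: Dict[int, str],
                              violations: List[str]) -> None:
    """entity_types: HYPOTHETICAL map from the id of each current related
    entity to its type, standing in for the Registry's database lookup."""
    seen_ids = set()
    has_publisher = False
    for ref in refs:
        ref_id = ref.get("id")
        entity_type = entity_types.get(ref_id)
        if entity_type is None:
            violations.append(f"related entity {ref_id!r} not found")
        if ref_id in seen_ids:
            violations.append(f"related entity {ref_id!r} referenced twice")
        seen_ids.add(ref_id)
        relations = ref.get("relations", [])
        if not relations:
            violations.append(f"related entity {ref_id!r}: no relation given")
        seen_relations = set()
        for rel in relations:
            allowed = ALLOWED_RELATIONS.get(entity_type, set())
            if entity_type is not None and rel not in allowed:
                violations.append(f"{rel!r} not allowed for a {entity_type}")
            if rel in seen_relations:
                violations.append(f"relation {rel!r} repeated for {ref_id!r}")
            seen_relations.add(rel)
        # Approximation of "valid": the entity exists and is a party,
        # since only parties may be related by publishedBy.
        if entity_type == "party" and "publishedBy" in relations:
            has_publisher = True
    if not has_publisher:
        violations.append("no related entity has the relation publishedBy")
```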
For each related-vocabulary-ref element (there may be more than one):
The id attribute must be specified, and its value must correspond to that of a current vocabulary in the database.
If mode is UPDATE (i.e., this vocabulary already exists, and the top-level vocabulary element has an id attribute), then the value of this element's id attribute must not equal the value of the top-level vocabulary element's id attribute. In other words, it is not permitted to create a self-reference. (Cycles in the chain of related vocabularies are permitted.)
The value of the id attribute must not match the value of the id attribute of any other related-vocabulary-ref element.
There must be at least one relation element.
For each relation element:
The value of the element must be one of the values of the enumerated type.
The value of the element must not be the same as the value of any other relation element contained within this related-vocabulary-ref element.
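A Python sketch of these checks is given below. Only a direct self-reference is rejected. Longer cycles such as A → B → A pass, as the text permits. The current_vocab_ids argument is a hypothetical stand-in for the database lookup. The check that each relation belongs to the enumerated type is left out because its values are defined in the Registry Schema, not here.

```python
from typing import List, Optional, Set


def check_related_vocabulary_refs(refs: List[dict], mode: str,
                                  own_id: Optional[int],
                                  current_vocab_ids: Set[int],
                                  violations: List[str]) -> None:
    """own_id: id of the top-level vocabulary (present only in UPDATE mode).
    current_vocab_ids: HYPOTHETICAL set of ids of current vocabularies."""
    seen: Set[int] = set()
    for ref in refs:
        ref_id = ref.get("id")
        if ref_id not in current_vocab_ids:
            violations.append(f"vocabulary {ref_id!r} not found")
        if mode == "UPDATE" and ref_id == own_id:
            violations.append("a vocabulary may not refer to itself")
        if ref_id in seen:
            violations.append(f"vocabulary {ref_id!r} referenced twice")
        seen.add(ref_id)
        relations = ref.get("relations", [])
        if not relations:
            violations.append(f"vocabulary {ref_id!r}: no relation given")
        if len(set(relations)) != len(relations):
            violations.append(f"vocabulary {ref_id!r}: repeated relation")
        # Membership of each relation in the enumerated type is not checked
        # here; see the Registry Schema for the allowed values.
```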
Checked at the version level
For each version element (there may be more than one):
The status attribute must be specified. Its value must be one of the values of the enumerated type.
The title attribute must be specified, and be non-empty.
The slug attribute is optional. If it is specified, the specified slug must be a valid slug. This means:
It must be non-empty.
It must have the format of a slug: i.e., when passed through the slug generator, the output exactly equals the input. (See the section "Slugs" at Vocabulary Registry data.)
The note attribute is optional. If specified, it must be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not be greater than 10000 characters.
The release-date attribute is required, and its value must be in one of the permitted formats (YYYY, YYYY-MM, YYYY-MM-DD).
The do-poolparty-harvest attribute is optional. If specified, its value must be either true or false, and the value may only be true if the top-level vocabulary metadata contains a poolparty-project element.
The do-import attribute is optional. If specified, its value must be either true or false, and the value may only be true if the version element specifies something to import. For now, that means that at least one of these must be satisfied:
the value of the do-poolparty-harvest attribute is true, or
there is at least one access-point element whose discriminator attribute has the value file.
The do-publish attribute is optional. If specified, its value must be either true or false, and the value may only be true if the value of the do-import attribute is also true.
The version element must specify the creation of at least one access point. For now, that means that at least one of these must be satisfied:
the value of the do-poolparty-harvest attribute is true, and the value of the do-import attribute is true, or
there is at least one access-point element whose source attribute has the value user.
The version element may specify one or more browse-flag elements. If any are provided, they are validated individually, and together as a set of flags, as follows:
Each value must be one of the values of the enumerated type.
If the defaultSortByNotation flag is specified, then the maySortByNotation flag must also be specified.
If any of the notation... flags (i.e., notationAlpha, notationFloat, or notationDotted) is specified, then the maySortByNotation flag must also be specified.
If the maySortByNotation flag is specified, then exactly one of the notation... flags (i.e., notationAlpha, notationFloat, or notationDotted) must be specified.
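The interlocking version flags and the browse-flag set rules can be sketched in Python as shown below. The attribute names and flag names come from the text. The dict representation, including the hypothetical "access-points" key for a version's list of access points, and the function names are assumptions. Attribute values are treated as the strings "true" and "false", as they would appear in XML. Checking flags against the full enumerated type is omitted.

```python
from typing import List

NOTATION_FLAGS = {"notationAlpha", "notationFloat", "notationDotted"}


def _bool_attr(version: dict, name: str, violations: List[str]) -> bool:
    """Read an optional "true"/"false" attribute; absent means false."""
    value = version.get(name)
    if value is None:
        return False
    if value not in ("true", "false"):
        violations.append(f"{name} must be true or false, not {value!r}")
        return False
    return value == "true"


def check_version_flags(version: dict, has_poolparty_project: bool,
                        violations: List[str]) -> None:
    harvest = _bool_attr(version, "do-poolparty-harvest", violations)
    do_import = _bool_attr(version, "do-import", violations)
    do_publish = _bool_attr(version, "do-publish", violations)
    # "access-points" is a hypothetical key for the version's access-point list.
    aps = version.get("access-points", [])
    if harvest and not has_poolparty_project:
        violations.append("do-poolparty-harvest needs a poolparty-project")
    has_file = any(ap.get("discriminator") == "file" for ap in aps)
    if do_import and not (harvest or has_file):
        violations.append("do-import is true but there is nothing to import")
    if do_publish and not do_import:
        violations.append("do-publish requires do-import")
    has_user_ap = any(ap.get("source") == "user" for ap in aps)
    if not ((harvest and do_import) or has_user_ap):
        violations.append("version would create no access point")


def check_browse_flags(flags: List[str], violations: List[str]) -> None:
    """Set-level rules only; membership of the enumerated type is not checked."""
    present = set(flags)
    may_sort = "maySortByNotation" in present
    notation = present & NOTATION_FLAGS
    if "defaultSortByNotation" in present and not may_sort:
        violations.append("defaultSortByNotation requires maySortByNotation")
    if notation and not may_sort:
        violations.append("notation... flags require maySortByNotation")
    if may_sort and len(notation) != 1:
        violations.append("maySortByNotation requires exactly one notation... flag")
```

For example, a version with do-poolparty-harvest, do-import and do-publish all set to "true" passes check_version_flags when the vocabulary has a poolparty-project. A version with only do-publish="true" produces two violations: there is no import, and no access point would be created.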
Checked at the access point level
For each access-point element (there may be more than one):
The source attribute must be specified. Its value must be one of the values of the enumerated type.
The discriminator attribute must be specified. Its value must be one of the values of the enumerated type.
There must be an element corresponding to the value of the discriminator attribute. For example, if the value of the discriminator attribute is apiSparql, there must be an ap-api-sparql element.
There is additional type-specific validation:
If the value of the discriminator attribute is apiSparql:
If the mode is CREATE, the value of the source attribute must be user.
The url attribute must be specified, and its value must be a valid URL.
If the value of the discriminator attribute is file:
The value of the source attribute must be user.
The upload-id attribute must be specified, and its value must be an integer greater than zero. Additional authorization checks are applied by the API methods, including a check that the value of the upload-id attribute corresponds to an existing upload that is available to the owner specified at the top-level vocabulary metadata.
If the value of the discriminator attribute is sesameDownload:
The mode must be UPDATE, and the value of the source attribute must be system.
The url-prefix attribute must be specified, and its value must be a valid URL.
If the value of the discriminator attribute is sissvoc:
If the mode is CREATE, the value of the source attribute must be user.
The url-prefix attribute must be specified, and its value must be a valid URL.
If the value of the discriminator attribute is webPage:
The url attribute must be specified, and its value must be a valid URL.
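The type-specific rules amount to a dispatch on the discriminator, as in the Python sketch below. Several parts are assumptions. The Registry's exact notion of a "valid URL" is not stated, so looks_like_url accepts only http(s) URLs that have a host. The attributes of the type-specific element are flattened into one dict. The authorization check on upload-id and the check for the matching ap-... element are not reproduced.

```python
from typing import List
from urllib.parse import urlparse


def looks_like_url(value) -> bool:
    """Rough URL test (an assumption): http(s) scheme plus a host."""
    if not isinstance(value, str) or not value:
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_access_point(ap: dict, mode: str, violations: List[str]) -> None:
    """ap: one access point, with the attributes of its type-specific
    element (url, url-prefix, upload-id) flattened into the same dict."""
    disc, source = ap.get("discriminator"), ap.get("source")
    if disc == "apiSparql":
        if mode == "CREATE" and source != "user":
            violations.append("apiSparql: source must be user in CREATE mode")
        if not looks_like_url(ap.get("url")):
            violations.append("apiSparql: url must be a valid URL")
    elif disc == "file":
        if source != "user":
            violations.append("file: source must be user")
        try:
            upload_ok = int(str(ap.get("upload-id"))) > 0
        except ValueError:
            upload_ok = False
        if not upload_ok:
            violations.append("file: upload-id must be an integer > 0")
    elif disc == "sesameDownload":
        if mode != "UPDATE" or source != "system":
            violations.append("sesameDownload: only in UPDATE mode, from system")
        if not looks_like_url(ap.get("url-prefix")):
            violations.append("sesameDownload: url-prefix must be a valid URL")
    elif disc == "sissvoc":
        if mode == "CREATE" and source != "user":
            violations.append("sissvoc: source must be user in CREATE mode")
        if not looks_like_url(ap.get("url-prefix")):
            violations.append("sissvoc: url-prefix must be a valid URL")
    elif disc == "webPage":
        if not looks_like_url(ap.get("url")):
            violations.append("webPage: url must be a valid URL")
    else:
        violations.append(f"discriminator {disc!r} not handled by this sketch")
```

The sesameDownload branch reflects the rule that such access points can only exist in UPDATE mode with source system. In other words, a client should never submit one in a CREATE request.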
Validation of related entity metadata
The validation process attempts to check as many things as possible, and to collect a full list of constraint violations, rather than bailing out at the first violation found. The various checks are largely independent of each other, so there is little chance of a "cascade" of violations.
The following sections detail what is checked.
Checked at the related entity level
These are checked:
The id attribute is checked.
If mode is CREATE, the id attribute must not be provided.
If mode is UPDATE, the id attribute must be provided.
The type attribute must be specified. Its value must be one of the values of the enumerated type.
The owner attribute must be specified, and be non-empty. The API method will check that the caller of the method is authorized to specify the particular owner value.
The title attribute must be specified, and be non-empty.
The email attribute is optional. If it is specified, the specified value must be a valid email address; that is, it must have the format of an email address.
The phone attribute is optional. If it is specified, it must be non-empty.
Checked at the url level
The url child element is optional. For each one that is provided, this is checked:
The body of the element must be non-empty, and its value must have the form of a valid URL.
Checked at the related entity identifier level
The related-entity-identifier child element is optional. For each one that is provided, the following are checked:
The id attribute is checked.
If mode is CREATE, the id attribute must not be provided.
The identifier-type attribute must be specified. Its value must be one of the values of the enumerated type.
The identifier-value attribute must be specified, and be non-empty. Its value must be consistent with the value of identifier-type.
There is an additional check that the same identifier has not been included more than once. For the purposes of this duplicate check, two identifiers are considered to be identical if they have both the same identifier-type and the same identifier-value.
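Because two identifiers are equal only when both fields match, the duplicate check can key on the (identifier-type, identifier-value) pair. The Python sketch below does this and also covers the id and non-empty checks. The function name and dict shape are hypothetical, and so are the identifier-type values used in any test data ("doi", "local"). The check that a value is consistent with its identifier-type is type-specific and is not reproduced here.

```python
from collections import Counter
from typing import List


def check_identifiers(identifiers: List[dict], mode: str,
                      violations: List[str]) -> None:
    for ident in identifiers:
        if mode == "CREATE" and ident.get("id") is not None:
            violations.append("identifier id must not be given in CREATE mode")
        if not ident.get("identifier-value"):
            violations.append("identifier-value must be non-empty")
    # Two identifiers are the same only if both type and value match.
    counts = Counter((i.get("identifier-type"), i.get("identifier-value"))
                     for i in identifiers)
    for (id_type, id_value), n in counts.items():
        if n > 1:
            violations.append(f"identifier {id_type}:{id_value} appears {n} times")
```

The same value under two different identifier types is therefore not treated as a duplicate.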