Validation of Registry model entity data
A number of Registry API methods accept input containing metadata that describes model entities. Such methods always apply validation to that metadata: there is a certain amount of "sanity checking", and a number of "business rules" are checked.
If you create or update vocabulary metadata using the Vocabulary Portal, the Portal does the work of ensuring that you enter metadata that will pass the validation checks. When using the Registry API directly, you (and/or your program) must take care to provide valid metadata. This page gives the precise details of the validation process to enable you to do that.
Please note: the basic philosophy of the validation process can be summed up as: "what is not explicitly forbidden, is permitted; whether it is a good idea is another matter".
Validation of vocabulary metadata
Validation mode enumerated type
The Registry API methods that apply validation to vocabulary metadata operate in one of two modes. Within the Registry, there is an enumerated type ValidationMode, which has two values:
CREATE: used by API methods (such as createVocabulary and createRelatedEntity) that create a new top-level model entity;
UPDATE: used by API methods (such as updateVocabulary and updateRelatedEntity) that update an existing model entity.
The particular set of validation checks applied to incoming metadata is determined by the type of the metadata and the validation mode of the API method. If an API method that applies validation is listed at Vocabulary Registry API methods, the method description indicates which validation mode is applied.
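As an illustration only, the two modes can be modelled as a small Python enumeration. This page does not show the Registry's own code. The mapping table below is a hypothetical convenience that covers just the four example methods named above.

```python
from enum import Enum


class ValidationMode(Enum):
    """Illustrative model of the Registry's ValidationMode enumerated type."""

    CREATE = "CREATE"  # API method creates a new top-level model entity
    UPDATE = "UPDATE"  # API method updates an existing model entity


# Hypothetical lookup table covering only the example methods named above.
METHOD_MODES = {
    "createVocabulary": ValidationMode.CREATE,
    "createRelatedEntity": ValidationMode.CREATE,
    "updateVocabulary": ValidationMode.UPDATE,
    "updateRelatedEntity": ValidationMode.UPDATE,
}


def mode_for(method_name: str) -> ValidationMode:
    """Return the validation mode applied by the named API method."""
    return METHOD_MODES[method_name]
```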
Vocabulary metadata validation in detail
The validation process attempts to check as many things as possible, and to collect a full list of constraint violations, rather than bailing out at the first violation found. The result of this is that there may, in some cases, be a "cascade" of violations. However, in practice, such a cascade tends to be quite limited, as the various checks have a high level of independence.
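The "collect everything" approach can be sketched as a collector that records each failed check instead of raising an exception. This is a minimal Python sketch, not the Registry's implementation. The Violation and ViolationCollector names, and the dictionary shape of the metadata, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Violation:
    path: str     # which part of the metadata the violation concerns
    message: str  # human-readable description


@dataclass
class ViolationCollector:
    violations: List[Violation] = field(default_factory=list)

    def check(self, ok: bool, path: str, message: str) -> bool:
        """Record a violation when ok is False; never raise, so checking continues."""
        if not ok:
            self.violations.append(Violation(path, message))
        return ok


def validate_owner_and_title(metadata: dict) -> List[Violation]:
    """Run two independent checks and report every failure, not just the first."""
    collector = ViolationCollector()
    collector.check(bool(metadata.get("owner")), "vocabulary.owner",
                    "owner must be specified and non-empty")
    collector.check(bool(metadata.get("title")), "vocabulary.title",
                    "title must be specified and non-empty")
    return collector.violations
```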
When the validation mode is UPDATE, the registry's database is consulted to get the existing vocabulary metadata. A first attempt is made to fetch a currently-valid vocabulary instance (i.e., with status either "published" or "deprecated"). If this fails, an attempt is made to fetch a draft vocabulary instance. If this is unsuccessful, a violation is created.
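The lookup order used in UPDATE mode might look like the following sketch. The in-memory list of instance dictionaries is a hypothetical stand-in for the Registry database, and the function name is illustrative.

```python
from typing import Dict, List, Optional

CURRENT_STATUSES = ("published", "deprecated")
DRAFT_STATUSES = ("draft",)


def fetch_existing_vocabulary(vocab_id: int, instances: List[Dict],
                              violations: List[str]) -> Optional[Dict]:
    """Prefer a currently-valid instance, fall back to a draft, else record a violation.

    `instances` is a hypothetical stand-in for the Registry database:
    each entry is a dict with at least "id" and "status" keys.
    """
    matching = [inst for inst in instances if inst["id"] == vocab_id]
    for wanted in (CURRENT_STATUSES, DRAFT_STATUSES):
        for inst in matching:
            if inst["status"] in wanted:
                return inst
    violations.append(f"no existing vocabulary with id {vocab_id}")
    return None
```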
A general note about maximum field lengths: some length checks are enforced, in order to prevent attempts to write "too much" data to a column of the database (i.e., that would exceed its maximum length). For now, such checks are done on the fields that contain HTML data; each is limited to 10000 characters.
Where reference is made to an enumerated type, please refer to the Registry Schema to see exactly which enumerated type is used, and what the type's allowed values are. (See the section "XML data" of Vocabulary Registry data for more information.)
The following sections detail the checks applied during validation.
Checked at all levels of metadata
These are checked at all levels of metadata:
id attributes are checked. In general, an id attribute must not be specified when creating something; it must be specified when updating something. (When updating, the absence of an id is interpreted as a request to create a new entity.)
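Here is a minimal sketch of the general id rule. It assumes a hypothetical classify_by_id helper that reports whether a piece of metadata will be created or updated. Only the general rule is covered; some elements have stricter rules, given in the sections that follow.

```python
from enum import Enum
from typing import List, Optional


class ValidationMode(Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"


def classify_by_id(mode: ValidationMode, entity_id: Optional[int],
                   path: str, violations: List[str]) -> str:
    """Return "create" or "update" for one piece of metadata, per the general id rule."""
    if mode is ValidationMode.CREATE:
        if entity_id is not None:
            violations.append(f"{path}: id must not be specified when creating")
        return "create"
    # UPDATE: an id identifies an existing entity; a missing id is read as
    # a request to create a new entity.
    return "update" if entity_id is not None else "create"
```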
Checked at the vocabulary level
At the top, vocabulary level:
The status attribute must be specified. Its value must be one of the values of the enumerated type.
The owner attribute must be specified, and be non-empty. The API method will check that the caller of the method is authorized to specify the particular owner value.
The slug attribute is checked.
If mode is CREATE, the slug attribute is optional. If it is specified:
The specified slug must be a valid slug. This means:
It must be non-empty.
It must have the format of a slug: i.e., when passed through the slug generator, the output exactly equals the input. (See the section "Slugs" at Vocabulary Registry data.)
The slug must not be the same as the slug of any other vocabulary in the system.
If mode is UPDATE, the slug must be specified, and it must be the same as the slug of the existing database entry. That is, for now, changing the slug is not supported through the API.
The title attribute must be specified, and be non-empty.
If the slug attribute was not specified, but the title attribute was, then the value of the title attribute is passed through the slug generator. The result must not be an empty string, nor a slug that is already in use by another vocabulary.
The acronym attribute is optional. It is not examined.
The description attribute must be specified, be non-empty, and be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not exceed 10000 characters.
The note attribute is optional. If specified, it must be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not exceed 10000 characters.
The revision-cycle attribute is optional. It is not examined.
The creation-date attribute is required, and its value must be in one of the permitted formats (YYYY, YYYY-MM, YYYY-MM-DD).
The primary-language attribute must be specified, and its value must be a valid language code. Here, "valid" means an IETF BCP 47 language tag, such as en, de-AT, etc.
The licence attribute is optional. It is not examined.
There must be at least one subject element that draws on the ANZSRC-FOR vocabulary. Each subject element is validated as specified below.
The other-language element is optional; if specified, it is validated as specified below.
The poolparty-project element is optional; if specified, it is validated as specified below.
The top-concept element is optional; if specified, it is validated as specified below.
The related-entity-ref element is required; it is validated as specified below.
There must be at least one valid related-entity-ref element that includes publishedBy as the value of one of its relation elements.
The related-vocabulary-ref element is optional; if specified, it is validated as specified below.
The version element is optional; if specified, it is validated as specified below.
However, the slug attributes of any version elements are collected and validated here: no two versions may have the same slug. For the purposes of this duplicate check, if a version does not have a slug attribute, a slug is generated from the value of its title attribute.
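Several of the vocabulary-level checks can be sketched in Python as follows. These are slug validity, slug uniqueness, the slug generated from the title, date formats, the HTML length limit, and duplicate version slugs. The real slug generator is defined in the section "Slugs" of Vocabulary Registry data. Here, make_slug is a hypothetical stand-in that lowercases the text and collapses runs of other characters into hyphens. Only CREATE mode is modelled for the vocabulary slug, and HTML validity itself is not checked.

```python
import re
from typing import List, Optional, Set

MAX_HTML_FIELD_LENGTH = 10000  # limit for the description and note fields
DATE_RE = re.compile(r"[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?")  # YYYY, YYYY-MM, YYYY-MM-DD


def make_slug(text: str) -> str:
    """HYPOTHETICAL stand-in for the Registry's slug generator."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def is_valid_slug(slug: str) -> bool:
    """A valid slug is non-empty and unchanged by the slug generator."""
    return slug != "" and make_slug(slug) == slug


def is_permitted_date(value: str) -> bool:
    """Checks the shape only; does not check that month/day are in range."""
    return DATE_RE.fullmatch(value) is not None


def html_length_ok(value: str) -> bool:
    return len(value) <= MAX_HTML_FIELD_LENGTH


def check_vocabulary_slug_on_create(slug: Optional[str], title: Optional[str],
                                    slugs_in_use: Set[str]) -> List[str]:
    violations: List[str] = []
    if slug is not None:
        if not is_valid_slug(slug):
            violations.append("slug: not a valid slug")
        elif slug in slugs_in_use:
            violations.append("slug: already used by another vocabulary")
    elif title:
        generated = make_slug(title)
        if generated == "" or generated in slugs_in_use:
            violations.append("title: generated slug is empty or already in use")
    return violations


def check_version_slugs(versions: List[dict]) -> List[str]:
    """No two versions may share a slug; a missing slug is generated from the title."""
    violations: List[str] = []
    seen: Set[str] = set()
    for v in versions:
        slug = v["slug"] if "slug" in v else make_slug(v.get("title", ""))
        if slug in seen:
            violations.append(f"version slug {slug!r} is used more than once")
        seen.add(slug)
    return violations
```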
For each subject element (there may be more than one):
The source attribute must be specified, and its value must be one of the permitted values. The permitted values are: anzsrc-for, anzsrc-seo, gcmd, local.
The label attribute must be specified, and be non-empty.
If the value of the source attribute is one that requires an IRI (anzsrc-for, anzsrc-seo, gcmd), then the iri attribute must be specified, and be one of the IRIs for that source.
If the source attribute has the value local, the value of the label attribute must not duplicate the value of the label attribute of any other subject element whose source attribute has the value local.
If the source attribute does not have the value local, the value of the iri attribute must not duplicate the value of the iri attribute of any other subject element whose source attribute has a value other than local.
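The subject rules might be sketched like this, together with the vocabulary-level requirement for at least one ANZSRC-FOR subject. The known_iris mapping is a hypothetical stand-in for however the Registry knows the IRIs of each source.

```python
from typing import Dict, List, Set

PERMITTED_SOURCES = {"anzsrc-for", "anzsrc-seo", "gcmd", "local"}
IRI_SOURCES = {"anzsrc-for", "anzsrc-seo", "gcmd"}


def check_subjects(subjects: List[Dict[str, str]],
                   known_iris: Dict[str, Set[str]]) -> List[str]:
    """`known_iris` is a hypothetical stand-in: source name -> IRIs of that source."""
    violations: List[str] = []
    local_labels: Set[str] = set()
    non_local_iris: Set[str] = set()
    for i, s in enumerate(subjects):
        source, label, iri = s.get("source"), s.get("label"), s.get("iri")
        if source not in PERMITTED_SOURCES:
            violations.append(f"subject[{i}]: source missing or not permitted")
        if not label:
            violations.append(f"subject[{i}]: label must be non-empty")
        if source in IRI_SOURCES and iri not in known_iris.get(source, set()):
            violations.append(f"subject[{i}]: iri missing or not an IRI of {source}")
        # Duplicate checks: local subjects by label, all others by IRI.
        if source == "local" and label:
            if label in local_labels:
                violations.append(f"subject[{i}]: duplicate local label")
            local_labels.add(label)
        elif source != "local" and iri:
            if iri in non_local_iris:
                violations.append(f"subject[{i}]: duplicate iri")
            non_local_iris.add(iri)
    # Vocabulary-level rule: at least one subject must come from ANZSRC-FOR.
    if not any(s.get("source") == "anzsrc-for" for s in subjects):
        violations.append("at least one anzsrc-for subject is required")
    return violations
```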
For each other-language element (there may be more than one):
The value of the element must be a valid language code. Here, "valid" means an IETF BCP 47 language tag, such as en, de-AT, etc.
The value of the element must not be the same as the value of the primary-language attribute.
The value of the element must not be the same as the value of any other other-language element.
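A sketch of the other-language checks follows. Full BCP 47 validation is considerably more involved than a regular expression. LANG_TAG_RE is an assumption: only a rough shape check. The sketch also compares tags exactly as written, without case normalisation, which may differ from the Registry's behaviour.

```python
import re
from typing import List, Set

# Rough shape check only (primary subtag plus optional subtags); NOT full BCP 47.
LANG_TAG_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def check_other_languages(primary: str, others: List[str]) -> List[str]:
    violations: List[str] = []
    seen: Set[str] = set()
    for tag in others:
        if not LANG_TAG_RE.fullmatch(tag):
            violations.append(f"{tag!r}: not a language tag")
        if tag == primary:
            violations.append(f"{tag!r}: same as primary-language")
        if tag in seen:
            violations.append(f"{tag!r}: duplicate other-language")
        seen.add(tag)
    return violations
```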
If a poolparty-project element is specified:
The server-id attribute must be specified, with the value 1.
The project-id attribute must be specified, and be non-empty.
For each top-concept element (there may be more than one):
The value of the element must be non-empty.
The value of the element must not be the same as the value of any other top-concept element.
For each related-entity-ref element (there may be more than one):
The id attribute must be specified, and its value must correspond to that of a current related entity in the database.
The value of the id attribute must not match the value of the id attribute of any other related-entity-ref element.
There must be at least one relation element.
For each relation element:
The value of the element must be one of the values of the enumerated type. Validity is further constrained by the type of the related entity.
If the related entity is a party, the value of the relation element may be any of: consumerOf, hasAuthor, hasContributor, implementedBy, pointOfContact, publishedBy.
If the related entity is a service, the value of the relation element may be any of: hasAssociationWith, isUsedBy, isPresentedBy.
If the related entity is a vocabulary, the value of the relation element may be any of: enriches, hasAssociationWith, isDerivedFrom, isPartOf.
The value of the element must not be the same as the value of any other relation element contained within this related-entity-ref element.
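The per-type relation rules lend themselves to a lookup table. In this sketch, entity_types is a hypothetical stand-in for the database lookup of each related entity's type. Each reference is assumed to be a dict with id and relations keys. Only the per-type lists given above are checked; the full enumerated type is defined in the Registry Schema.

```python
from typing import Dict, List, Set

ALLOWED_RELATIONS: Dict[str, Set[str]] = {
    "party": {"consumerOf", "hasAuthor", "hasContributor",
              "implementedBy", "pointOfContact", "publishedBy"},
    "service": {"hasAssociationWith", "isUsedBy", "isPresentedBy"},
    "vocabulary": {"enriches", "hasAssociationWith", "isDerivedFrom", "isPartOf"},
}


def check_related_entity_refs(refs: List[dict], entity_types: Dict[int, str]) -> List[str]:
    """`entity_types` (id -> type) is a hypothetical stand-in for the database."""
    violations: List[str] = []
    seen_ids: Set[int] = set()
    has_publisher = False
    for ref in refs:
        rid = ref.get("id")
        etype = entity_types.get(rid)
        if etype is None:
            violations.append(f"related entity {rid}: not a current related entity")
        if rid in seen_ids:
            violations.append(f"related entity {rid}: referenced more than once")
        seen_ids.add(rid)
        relations = ref.get("relations", [])
        if not relations:
            violations.append(f"related entity {rid}: at least one relation required")
        if len(set(relations)) != len(relations):
            violations.append(f"related entity {rid}: duplicate relation")
        for rel in relations:
            if etype is not None and rel not in ALLOWED_RELATIONS.get(etype, set()):
                violations.append(f"related entity {rid}: {rel} not allowed for a {etype}")
        if etype is not None and "publishedBy" in relations:
            has_publisher = True
    # Vocabulary-level rule: some valid reference must carry publishedBy.
    if not has_publisher:
        violations.append("at least one related entity must have relation publishedBy")
    return violations
```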
For each related-vocabulary-ref element (there may be more than one):
The id attribute must be specified, and its value must correspond to that of a current vocabulary in the database.
If mode is UPDATE (i.e., this vocabulary already exists, and the top-level vocabulary element has an id attribute), then the value of this element's id attribute must not equal the value of the top-level vocabulary element's id attribute. In other words, a vocabulary may not refer to itself. (Cycles in the chain of related vocabularies are permitted.)
The value of the id attribute must not match the value of the id attribute of any other related-vocabulary-ref element.
There must be at least one relation element.
For each relation element:
The value of the element must be one of the values of the enumerated type.
The value of the element must not be the same as the value of any other relation element contained within this related-vocabulary-ref element.
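The self-reference rule is the main difference from the handling of related-entity-ref. Below is a minimal sketch. It assumes hypothetical dict-shaped references and a set of current vocabulary ids standing in for the database. It does not check that relation values belong to the enumerated type.

```python
from enum import Enum
from typing import List, Optional, Set


class ValidationMode(Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"


def check_related_vocabulary_refs(mode: ValidationMode, own_id: Optional[int],
                                  refs: List[dict],
                                  existing_ids: Set[int]) -> List[str]:
    """`existing_ids` is a hypothetical stand-in for the ids of current vocabularies."""
    violations: List[str] = []
    seen: Set[int] = set()
    for ref in refs:
        rid = ref.get("id")
        if rid not in existing_ids:
            violations.append(f"related vocabulary {rid}: not a current vocabulary")
        # Only a direct self-reference is rejected; longer cycles are allowed.
        if mode is ValidationMode.UPDATE and rid == own_id:
            violations.append(f"related vocabulary {rid}: self-reference not permitted")
        if rid in seen:
            violations.append(f"related vocabulary {rid}: referenced more than once")
        seen.add(rid)
        relations = ref.get("relations", [])
        if not relations:
            violations.append(f"related vocabulary {rid}: at least one relation required")
        if len(set(relations)) != len(relations):
            violations.append(f"related vocabulary {rid}: duplicate relation")
    return violations
```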
Checked at the version level
For each version element (there may be more than one):
The status attribute must be specified. Its value must be one of the values of the enumerated type.
The title attribute must be specified, and be non-empty.
The slug attribute is optional. If it is specified:
The specified slug must be a valid slug. This means:
It must be non-empty.
It must have the format of a slug: i.e., when passed through the slug generator, the output exactly equals the input. (See the section "Slugs" at Vocabulary Registry data.)
The note attribute is optional. If specified, it must be a valid HTML fragment according to the validity rules specified in the section "Fields that contain HTML data" of Vocabulary Registry data. Its length must not exceed 10000 characters.
The release-date attribute is required, and its value must be in one of the permitted formats (YYYY, YYYY-MM, YYYY-MM-DD).
The do-poolparty-harvest attribute is optional. If specified, its value must be either true or false, and the value may only be true if the top-level vocabulary metadata contains a poolparty-project element.
The do-import attribute is optional. If specified, its value must be either true or false, and the value may only be true if the version element specifies something to import. For now, that means that at least one of these must be satisfied:
the value of the do-poolparty-harvest attribute is true, or
there is at least one access-point element whose discriminator attribute has the value file.
The do-publish attribute is optional. If specified, its value must be either true or false, and the value may only be true if the value of the do-import attribute is also true.
The version element must specify the creation of at least one access point. For now, that means that at least one of these must be satisfied:
the value of the do-poolparty-harvest attribute is true, and the value of the do-import attribute is true, or
there is at least one access-point element whose source attribute has the value user.
The version element may specify one or more browse-flag elements. If any are provided, they are validated individually, and together as a set of flags, as follows:
Each value must be one of the values of the enumerated type.
If the defaultSortByNotation flag is specified, then the maySortByNotation flag must also be specified.
If any of the notation... flags (i.e., notationAlpha, notationFloat, or notationDotted) is specified, then the maySortByNotation flag must also be specified.
If the maySortByNotation flag is specified, then exactly one of the notation... flags (i.e., notationAlpha, notationFloat, or notationDotted) must be specified.
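The interdependent version-level settings can be expressed as a handful of boolean conditions. These are do-poolparty-harvest, do-import, do-publish, the access points, and the browse flags. This sketch assumes the attributes have already been parsed into Python booleans and lists under hypothetical key names. It does not check that browse-flag values belong to the enumerated type.

```python
from typing import List

NOTATION_FLAGS = {"notationAlpha", "notationFloat", "notationDotted"}


def check_version_flags(version: dict, has_poolparty_project: bool) -> List[str]:
    """`version` uses hypothetical keys: do_poolparty_harvest, do_import, do_publish
    (booleans), access_points (list of dicts with discriminator and source),
    and browse_flags (list of strings)."""
    v: List[str] = []
    harvest = version.get("do_poolparty_harvest", False)
    do_import = version.get("do_import", False)
    do_publish = version.get("do_publish", False)
    access_points = version.get("access_points", [])

    if harvest and not has_poolparty_project:
        v.append("do-poolparty-harvest requires a poolparty-project element")
    has_file = any(ap.get("discriminator") == "file" for ap in access_points)
    if do_import and not (harvest or has_file):
        v.append("do-import requires something to import")
    if do_publish and not do_import:
        v.append("do-publish requires do-import")
    has_user_ap = any(ap.get("source") == "user" for ap in access_points)
    if not ((harvest and do_import) or has_user_ap):
        v.append("version must specify at least one access point")

    flags = set(version.get("browse_flags", []))
    may_sort = "maySortByNotation" in flags
    notation = flags & NOTATION_FLAGS
    if "defaultSortByNotation" in flags and not may_sort:
        v.append("defaultSortByNotation requires maySortByNotation")
    if notation and not may_sort:
        v.append("notation... flags require maySortByNotation")
    if may_sort and len(notation) != 1:
        v.append("maySortByNotation requires exactly one notation... flag")
    return v
```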
Checked at the access point level
For each access-point element (there may be more than one):
The source attribute must be specified. Its value must be one of the values of the enumerated type.
The discriminator attribute must be specified. Its value must be one of the values of the enumerated type.
There must be an element corresponding to the value of the discriminator attribute. For example, if the value of the discriminator attribute is apiSparql, there must be an ap-api-sparql element.
There is additional type-specific validation:
If the value of the discriminator attribute is apiSparql:
If the mode is CREATE, the value of the source attribute must be user.
The url attribute must be specified, and its value must be a valid URL.
If the value of the discriminator attribute is file:
The value of the source attribute must be user.
The upload-id attribute must be specified, and its value must be an integer greater than zero. Additional authorization checks are applied by the API methods, including a check that the value of the upload-id attribute corresponds to an existing upload that is available to the owner specified in the top-level vocabulary metadata.
If the value of the discriminator attribute is sesameDownload:
The mode must be UPDATE, and the value of the source attribute must be system.
The url-prefix attribute must be specified, and its value must be a valid URL.
If the value of the discriminator attribute is sissvoc:
If the mode is CREATE, the value of the source attribute must be user.
The url-prefix attribute must be specified, and its value must be a valid URL.
If the value of the discriminator attribute is webPage:
The url attribute must be specified, and its value must be a valid URL.
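The type-specific access point rules can be sketched as a dispatch on the discriminator. The flattened dict shape is a hypothetical simplification of the access-point element and its child element, with keys such as url, url_prefix and upload_id. looks_like_url is only an approximation of "a valid URL". The authorization checks on upload-id are not modelled.

```python
from enum import Enum
from typing import List, Optional
from urllib.parse import urlparse


class ValidationMode(Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"


def looks_like_url(value: Optional[str]) -> bool:
    """Approximation: treat a value as a URL if it has a scheme and a host part."""
    if not value:
        return False
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


def check_access_point(mode: ValidationMode, ap: dict) -> List[str]:
    """`ap` is a hypothetical flattened dict: discriminator, source, type-specific keys."""
    v: List[str] = []
    kind, source = ap.get("discriminator"), ap.get("source")
    creating = mode is ValidationMode.CREATE
    if kind == "apiSparql":
        if creating and source != "user":
            v.append("apiSparql: source must be user when creating")
        if not looks_like_url(ap.get("url")):
            v.append("apiSparql: url must be a valid URL")
    elif kind == "file":
        if source != "user":
            v.append("file: source must be user")
        upload_id = ap.get("upload_id")
        if type(upload_id) is not int or upload_id <= 0:
            v.append("file: upload-id must be an integer greater than zero")
    elif kind == "sesameDownload":
        if creating or source != "system":
            v.append("sesameDownload: only allowed in UPDATE mode with source system")
        if not looks_like_url(ap.get("url_prefix")):
            v.append("sesameDownload: url-prefix must be a valid URL")
    elif kind == "sissvoc":
        if creating and source != "user":
            v.append("sissvoc: source must be user when creating")
        if not looks_like_url(ap.get("url_prefix")):
            v.append("sissvoc: url-prefix must be a valid URL")
    elif kind == "webPage":
        if not looks_like_url(ap.get("url")):
            v.append("webPage: url must be a valid URL")
    else:
        v.append(f"unrecognised discriminator {kind!r}")
    return v
```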
Validation of related entity metadata
The validation process attempts to check as many things as possible, and to collect a full list of constraint violations, rather than bailing out at the first violation found. The various checks have a high level of independence, and there is minimal chance of a "cascade" of violations.
The following sections detail what is checked.
Checked at the related entity level
These are checked:
The id attribute is checked.
If mode is CREATE, the id attribute must not be provided.
If mode is UPDATE, the id attribute must be provided.
The type attribute must be specified. Its value must be one of the values of the enumerated type.
The owner attribute must be specified, and be non-empty. The API method will check that the caller of the method is authorized to specify the particular owner value.
The title attribute must be specified, and be non-empty.
The email attribute is optional. If it is specified:
The specified value must be a valid email address. That is, it must have the format of an email address.
The phone attribute is optional. If it is specified, it must be non-empty.
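The following sketches the checks at the related entity level. The set of type values is assumed to be the three related-entity types mentioned earlier on this page (party, service, vocabulary); consult the Registry Schema for the authoritative list. EMAIL_RE is a rough approximation of "the format of an email address", not a full RFC 5322 parser. The owner authorization check is not modelled.

```python
import re
from enum import Enum
from typing import List


class ValidationMode(Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"


# Assumed type values (the three related-entity types named on this page).
RELATED_ENTITY_TYPES = {"party", "service", "vocabulary"}
# Rough shape check: something@something.something, no spaces; not RFC 5322.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_related_entity(mode: ValidationMode, entity: dict) -> List[str]:
    v: List[str] = []
    has_id = entity.get("id") is not None
    if mode is ValidationMode.CREATE and has_id:
        v.append("id must not be provided when creating")
    if mode is ValidationMode.UPDATE and not has_id:
        v.append("id must be provided when updating")
    if entity.get("type") not in RELATED_ENTITY_TYPES:
        v.append("type missing or not recognised")
    for attr in ("owner", "title"):
        if not entity.get(attr):
            v.append(f"{attr} must be specified and non-empty")
    if "email" in entity and not EMAIL_RE.fullmatch(entity["email"] or ""):
        v.append("email does not have the format of an email address")
    if "phone" in entity and not entity["phone"]:
        v.append("phone, if specified, must be non-empty")
    return v
```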
Checked at the url level
The url child element is optional. For each one that is provided, this is checked:
The body of the element must be non-empty, and its value must have the form of a valid URL.
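The url check might be approximated with Python's standard urllib.parse module. This sketch assumes that a value counts as a URL when it has both a scheme and a network location, which is not necessarily the Registry's exact rule.

```python
from typing import List
from urllib.parse import urlparse


def check_url_elements(bodies: List[str]) -> List[str]:
    """Check each url element body: non-empty, and shaped like an absolute URL."""
    violations: List[str] = []
    for i, body in enumerate(bodies):
        if not body:
            violations.append(f"url[{i}]: body must be non-empty")
            continue
        parts = urlparse(body)
        if not (parts.scheme and parts.netloc):
            violations.append(f"url[{i}]: {body!r} does not look like a URL")
    return violations
```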
Checked at the related entity identifier level
The related-entity-identifier child element is optional. For each one that is provided, this is checked:
The id attribute is checked.
If mode is CREATE, the id attribute must not be provided.
The identifier-type attribute must be specified. Its value must be one of the values of the enumerated type.
The identifier-value attribute must be specified, and be non-empty. Its value must be consistent with the value of identifier-type.
There is an additional check that the same identifier has not been included more than once. For the purposes of this duplicate check, two identifiers are considered to be identical if they have both the same identifier-type and identifier-value.
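The duplicate check treats an identifier as the pair (identifier-type, identifier-value). As a result, the same value under two different types is not a duplicate. Below is a minimal sketch with hypothetical dict keys. The consistency check between identifier-type and identifier-value is not modelled, since its rules are not given here.

```python
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[Optional[str], Optional[str]]


def find_duplicate_identifiers(identifiers: List[Dict[str, str]]) -> List[Key]:
    """Return each (identifier-type, identifier-value) pair that occurs more than once."""
    seen: Set[Key] = set()
    duplicates: List[Key] = []
    for ident in identifiers:
        key = (ident.get("identifier_type"), ident.get("identifier_value"))
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```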