Beyond RIF-CS: Metadata for Services Good Practice Guide
Introduction
Data services in the research domain support the use of research collections and datasets by providing automated functions for the creation, access, processing and analysis of data. More and more data providers are publishing their data through services. In Australia, for example, research organisations, science agencies, government departments and a number of national research infrastructure facilities are all moving to more formal publishing of data through services. Also, data consumers are increasingly accessing data services and connecting them with other services or tools (e.g. virtual laboratories) for data analysis, processing and visualisation.
Context
In 2017-18, the Australian Research Data Commons (ARDC) convened a Data Services Interest Group around data service provision and consumption across the NCRIS facilities, science agencies, universities, and broader public sector. To improve discovery and use of data and related services across these organisations, the Interest Group agreed on some “end user” scenarios that the group aspired to support:
Individual researchers looking to “plug data into” their own tool or model using standard services
Virtual laboratories providing tools over data from various common data services
Third party innovators leveraging data across a pool of services for development work.
The Interest Group set out to identify “shared practice” for exposing information about data and related services across all organisations by asking:
What information set would a data steward need to possess to satisfy the requests in those scenarios?
In current information technology practice where might such information typically be stored and exposed in data management systems?
Core metadata for services and related collections
The Interest Group reviewed metadata terms for data and related services from a number of metadata schemas and grouped them by concept in Data and Data-Service Metadata Concepts and Schemas (Google Sheet). Each row of the sheet is a group of terms from the various schemas; the groups are named in column B, ‘Concepts - for data and related services’.
Based on this, the Interest Group agreed upon a core information set for “data and related services” which might then be encoded in a given standard and exchanged using the corresponding protocol. The set has been tested against common OGC/W3C/OpenAPI/Web-index standards to make sure it works within a given metadata-protocol combination (e.g. ISO 19115 and CSW). However, it does not prescribe a particular metadata schema, exchange protocol, or information management approach. Further detail on this approach is available in the document Data and related Services: discovery and use (Google Document).
Future work for the Interest Group includes developing community standards for encoding the values of these concepts. The agreed core set of information for data and related services follows:
(Essential = information required to respond to the three end-user scenarios listed above; more details here. Recommended = desirable information for discovery, appraisal, citation, re-use, etc.)
Concepts, for data-services and related data
Requirement
service URL
service identifier (if different from the URL)
Essential *
service type: protocol and version - e.g. ‘wms 1.3’
service-use documentation (if protocol is non-standard - e.g. URL to service description)
service type: function (if protocol is non-standard - e.g. ‘download’)
 | Collection/relatedInfo[@type='service']/{title and/or notes} | service type: protocol and version - e.g. ‘wms 1.3’; service-use documentation (if protocol is non-standard - e.g. URL to service description); service type: function (if protocol is non-standard - e.g. ‘download’)
Service | Collection/relatedInfo[@type='service'] | service type: resource type (e.g. ‘service’)
Service/subject OR Service/relatedInfo[@type='collection']/{title OR notes} | Collection/subject | data subject
Service/name | Collection/relatedInfo[@type='service']/{title and/or notes} | service title
Service/coverage/spatial | Collection/location/spatial | data spatial coverage
Service/coverage/spatial | Collection/location/spatial | data geographic/projected CRS
Service/coverage/temporal | Collection/coverage/temporal | data temporal coverage
Service/description | Collection/relatedInfo[@type='service']/notes | service description/abstract
Service/relatedInfo[@type='collection']/{title OR notes} | Collection/relatedInfo[@type='service']/format | data format
Service/@dateModified | Collection/relatedInfo[@type='service']/notes | service date (modified)
Service/rights | Collection/relatedInfo[@type='service']/notes | service rights
Service/relatedInfo[@type='collection']/{title OR notes} | Collection/rights | data rights
Service/relatedInfo[@type='collection']/{title OR notes} | Collection/name/@lang | data language
Service/name/@lang | Collection/relatedInfo[@type='service']/notes | service language
Service/relatedInfo | Collection/identifier OR Collection/name | data identifying information - its text name, or an identifier such as a UUID or DOI that resolves to a landing page
Service/relatedObject/key | Collection/relatedInfo[@type='service']/notes | service contributor/owner/publisher
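As an illustration of the Collection-side mappings above, the following is a minimal sketch of how a service reference might be recorded inside a collection record as a relatedInfo element of type ‘service’. It assumes Python's standard xml.etree.ElementTree module. The function name, the URL, the title and the notes text are hypothetical, and the RIF-CS namespace, element ordering and schema validation are deliberately left out.

```python
import xml.etree.ElementTree as ET


def service_related_info(url, title, notes=None):
    """Build a relatedInfo[@type='service'] element for a collection record.

    Sketch only: no namespace is applied and the result is not validated
    against the RIF-CS schema.
    """
    info = ET.Element("relatedInfo", {"type": "service"})
    # service URL, recorded here as a URI identifier (an assumption)
    identifier = ET.SubElement(info, "identifier", {"type": "uri"})
    identifier.text = url
    # service title
    ET.SubElement(info, "title").text = title
    # free-text notes can carry concepts that have no dedicated element,
    # such as protocol and version, or service rights
    if notes:
        ET.SubElement(info, "notes").text = notes
    return info


if __name__ == "__main__":
    collection = ET.Element("collection", {"type": "dataset"})
    collection.append(
        service_related_info(
            "https://example.org/geoserver/wms",  # hypothetical endpoint
            "Example map service",
            "service type: wms 1.3",
        )
    )
    print(ET.tostring(collection, encoding="unicode"))
```

Packing several concepts into notes, as the mapping suggests, keeps the record simple but leaves those values as free text that a consumer would have to parse.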
DCAT
Following analysis and discussion in the W3C Dataset Exchange Working Group (DXWG), a proposed solution for cataloguing services in the context of a DCAT catalog has been developed – see the DCAT-2 Working Draft and Editor's Draft. Classes for DataService and DataDistributionService have been added to the DCAT vocabulary. Examples of their use are shown here and here. Note that the second example is modelled on an instance from Research Data Australia. A summary of the solution is below:
DCAT | Concepts, for data-services and related data | Comment
dct:title | service title |
dct:description | service description/abstract |
dcat:endpointDescription | service description/abstract | Link to a machine-readable endpoint description, such as a Swagger or GetCapabilities document
rdf:type | service type: resource type |
dct:conformsTo | service type: protocol and version | In DCAT a single service might support multiple interfaces or protocols, so this property may be repeated
dct:type | service type: function | In DCAT a service might have multiple classifiers, either at different levels of refinement or with values taken from different controlled vocabularies, so this property may be repeated
dcat:endpointURL | service URL | Service API
dcat:landingPage | service URL | Human-usable landing page for the service (which might be provided at a different URL to the service API)
dcat:servesDataset | data subject, data spatial coverage, data geographic/projected CRS, data temporal coverage, data format, data rights, data contributor/owner/publisher, data language, data identifying information | In DCAT this is a link to a description of the dataset(s) served, which is packaged as a separate record. All of the dataset descriptors are provided as properties of that resource, and not as properties of the service itself.
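To make the mapping above more concrete, the following is a minimal sketch that assembles a service description as a JSON-LD document using the DCAT properties listed in the table. It uses only Python's standard library. The function name and all URLs are hypothetical placeholders, and the output has not been checked against any DCAT validator.

```python
import json


def dcat_data_service(title, description, endpoint_url, endpoint_description,
                      conforms_to, served_datasets, landing_page=None):
    """Return a JSON-LD dictionary describing a service with DCAT terms."""
    record = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        # rdf:type -> service type: resource type
        "@type": "dcat:DataService",
        "dct:title": title,
        "dct:description": description,
        # service URL (the API itself)
        "dcat:endpointURL": {"@id": endpoint_url},
        # machine-readable description, e.g. a GetCapabilities document
        "dcat:endpointDescription": {"@id": endpoint_description},
        # repeatable: one entry per supported protocol or interface
        "dct:conformsTo": [{"@id": uri} for uri in conforms_to],
        # links to separate dataset records; dataset descriptors live there
        "dcat:servesDataset": [{"@id": uri} for uri in served_datasets],
    }
    if landing_page:
        # human-usable page, possibly at a different URL to the API
        record["dcat:landingPage"] = {"@id": landing_page}
    return record


if __name__ == "__main__":
    service = dcat_data_service(
        title="Example map service",
        description="Hypothetical WMS over an example dataset.",
        endpoint_url="https://example.org/wms",
        endpoint_description="https://example.org/wms?request=GetCapabilities",
        conforms_to=["https://example.org/def/protocol/wms-1.3"],
        served_datasets=["https://example.org/dataset/1"],
    )
    print(json.dumps(service, indent=2))
```

Note that the dataset-level concepts (subject, coverage, rights and so on) do not appear in the service record; following the table, they would be read from the record each dcat:servesDataset link points to.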
Examples of service description using the agreed core concepts
There is no assumption that data provision organisations necessarily maintain independent metadata descriptions of services; there is however a shared expectation that a core set of information about data and related services be available from somewhere in the data management system. Sources of this information might include references to services within dataset records, “self-describing” interfaces such as GetCapabilities, or combinations of both (see Data and related Services - Metadata Views for more information). The image below demonstrates how information about the three core concepts "service title", "service creator" and "data subject" could be obtained from various locations in the data management system.
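As one illustration of a “self-describing” interface, the sketch below reads three core concepts from a WMS 1.3.0 GetCapabilities document using Python's standard xml.etree.ElementTree module. The sample document is invented for this example. Treating the contact organisation as the "service creator" and the service-level keywords as the "data subject" are assumptions made here, not mappings agreed by the Interest Group.

```python
import xml.etree.ElementTree as ET

# Namespace used by WMS 1.3.0 capabilities documents
WMS_NS = {"wms": "http://www.opengis.net/wms"}

# Hypothetical, heavily trimmed capabilities response
SAMPLE_CAPABILITIES = """
<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Service>
    <Name>WMS</Name>
    <Title>Example Surface Geology Service</Title>
    <KeywordList>
      <Keyword>geology</Keyword>
      <Keyword>lithology</Keyword>
    </KeywordList>
    <ContactInformation>
      <ContactPersonPrimary>
        <ContactPerson>Data Steward</ContactPerson>
        <ContactOrganization>Example Agency</ContactOrganization>
      </ContactPersonPrimary>
    </ContactInformation>
  </Service>
</WMS_Capabilities>
"""


def core_concepts(capabilities_xml):
    """Extract a few core concepts from a WMS capabilities document."""
    root = ET.fromstring(capabilities_xml)
    service = root.find("wms:Service", WMS_NS)
    if service is None:
        raise ValueError("no Service element found in capabilities document")
    return {
        "service title": service.findtext("wms:Title", namespaces=WMS_NS),
        # assumption: the contact organisation stands in for the creator
        "service creator": service.findtext(
            "wms:ContactInformation/wms:ContactPersonPrimary/"
            "wms:ContactOrganization",
            namespaces=WMS_NS,
        ),
        # assumption: service-level keywords approximate the data subject
        "data subject": [
            kw.text
            for kw in service.findall("wms:KeywordList/wms:Keyword", WMS_NS)
        ],
    }


if __name__ == "__main__":
    print(core_concepts(SAMPLE_CAPABILITIES))
```

In practice a steward might combine values read this way with values held in dataset records, since not every concept is exposed by the service interface.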
Following are three real-life examples from some of the data provision organisations we have been working with:
Example 1 - Service metadata provided in a dedicated service record (Geoscience Australia)