Beyond RIF-CS: Metadata for Services Good Practice Guide

Introduction

Data services in the research domain support the use of research collections and datasets by providing automated functions for the creation, access, processing and analysis of data. More and more data providers are publishing their data through services. In Australia, for example, research organisations, science agencies, government departments and a number of national research infrastructure facilities are all moving to more formal publishing of data through services. Also, data consumers are increasingly accessing data services and connecting them with other services or tools (e.g. virtual laboratories) for data analysis, processing and visualisation.

Context

In 2017-18, the Australian Research Data Commons (ARDC) convened a Data Services Interest Group around data service provision and consumption across the NCRIS facilities, science agencies, universities, and broader public sector. To improve discovery and use of data and related services across these organisations, the Interest Group agreed on some “end user” scenarios that the group aspired to support:

Individual researchers looking to “plug data into” their own tool or model using standard services
Virtual laboratories providing tools over data from various common data services
Third party innovators leveraging data across a pool of services for development work.

The Interest Group set out to identify “shared practice” for exposing information about data and related services across all organisations by asking:

What information set would a data steward need to possess to satisfy the requests in those scenarios?
In current information technology practice where might such information typically be stored and exposed in data management systems?

Core metadata for services and related collections

The Interest Group reviewed data and related service metadata terms from a number of metadata schemas and attempted to group these terms by concept in Data and Data-Service Metadata Concepts and Schemas (Google Sheet). Each row is a group of terms from various schemas; the groups are named in column B, ‘Concepts - for data and related services’.

Based on this, the Interest Group agreed upon a core information set for “data and related services” which might then be encoded in a given standard and exchanged using the corresponding protocol. The set has been tested against common OGC/W3C/OpenAPI/Web-index standards to make sure it works within a given metadata-protocol combination (e.g. ISO 19115 and CSW). However, it does not prescribe a particular metadata schema, exchange protocol, or information management approach. Further detail on this approach is available in the document Data and related Services: discovery and use (Google Document).

The agreed core set of information for data and related services follows. Future work for the Interest Group includes developing community standards for the encoding of values for these concepts:

(Essential = information required to respond to the three end-user scenarios listed above; more details here;
Recommended = desirable information for discovery, appraisal, citation, re-use, etc)

Concepts, for data-services and related data	Requirement
service URL; service identifier (if different from the URL)	Essential*
service type: protocol and version - e.g. ‘wms 1.3’; service-use documentation (if protocol is non-standard - e.g. URL to service description); service type: function (if protocol is non-standard - e.g. ‘download’)	Essential*
service type: resource type (e.g. ‘service’)	Essential
data subject (e.g. 'observedProperty', 'variableMeasured')	Essential
service title	Essential
data spatial coverage	Essential if available
data geographic/projected CRS	Essential if available
data temporal coverage	Essential if available
service description/ abstract	Recommended
data format	Recommended
service date (modified)	Recommended
service rights	Recommended
data rights	Recommended
data contributor/owner/publisher	Recommended
data language	Recommended
service language	Recommended
data identifying information - its text name, or an identifier such as a uuid or doi to a landing page	Recommended
service contributor/owner/publisher	Recommended

* = essential for a minimum response
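
As a rough illustration of how the requirement levels in the table above might be applied, the following Python sketch checks a service description for missing concepts. The concept names are copied from the table, but the grouping into Python lists, the function name `missing_concepts` and the dictionary encoding are assumptions made for this example; the Interest Group does not prescribe any particular encoding. The sample values are taken from Example 1 later in this guide, with "data subject" deliberately left out.

```python
# Concept names come from the core-set table; the list grouping and the
# function below are hypothetical, not part of any agreed standard.
ESSENTIAL_MINIMUM = [
    "service URL",
    "service type: protocol and version",
]
ESSENTIAL = ESSENTIAL_MINIMUM + [
    "service type: resource type",
    "data subject",
    "service title",
]


def missing_concepts(record, concepts):
    """Return the concepts that are absent or empty in the record."""
    return [concept for concept in concepts if not record.get(concept)]


# Sample description (values from Example 1; "data subject" omitted on purpose).
record = {
    "service URL": "http://services.ga.gov.au/gis/services/"
                   "DEM_SRTM_1Second_Slope/MapServer/WMSServer",
    "service type: protocol and version": "WMS 1.3.0",
    "service type: resource type": "service",
    "service title": "Digital Elevation Model (DEM) of Australia derived from "
                     "SRTM with 1 Second Grid - Smoothed Percentage Slope WMS",
}

print("Missing for a minimum response:", missing_concepts(record, ESSENTIAL_MINIMUM))
print("Missing essential concepts:", missing_concepts(record, ESSENTIAL))
```

A record that passes the first check can support a minimum response; the second check shows what is still needed to satisfy all three end-user scenarios.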

Community standards for the mapping of these concepts (in development):

RIF-CS

Service Record RIF-CS Xpath	Collection Record RIF-CS Xpath	Concepts, for data-services and related data
Service/identifier	Collection/relatedInfo/identifier AND/OR Collection/relatedInfo/relation/url (service URL to this dataset)	service URL; service identifier (if different from the URL)
Service/@type; Service/relatedInfo/@type=’reuseInformation’/identifier[@type=’url’]	Collection/relatedInfo[@type=’service’]/{title and/or notes}	service type: protocol and version - e.g. ‘wms 1.3’; service-use documentation (if protocol is non-standard - e.g. URL to service description); service type: function (if protocol is non-standard - e.g. ‘download’)
Service	Collection/relatedInfo[@type=’service’]	service type: resource type (e.g. ‘service’)
Service/subject OR Service/relatedInfo[@type=’collection’]/{title OR notes}	Collection/subject	data subject
Service/name	Collection/relatedInfo[@type=’service’]/{title and/or notes}	service title
Service/coverage/spatial	Collection/location/spatial	data spatial coverage
Service/coverage/spatial	Collection/location/spatial	data geographic/projected CRS
Service/coverage/temporal	Collection/coverage/temporal	data temporal coverage
Service/description	Collection/relatedInfo[@type=’service’]/notes	service description/ abstract
Service/relatedInfo/@type=’collection’/{title OR notes}	Collection/relatedInfo[@type=’service’]/format	data format
Service/@dateModified	Collection/relatedInfo[@type=’service’]/notes	service date (modified)
Service/rights	Collection/relatedInfo[@type=’service’]/notes	service rights
Service/relatedInfo/@type=’collection’/{title OR notes}	Collection/rights	data rights
Service/relatedInfo/@type=’collection’/{title OR notes}	Collection/relatedObject/key OR Collection/relatedInformation[@type=’party’]	data contributor/owner/publisher
Service/relatedInfo/@type=’collection’/{title OR notes}	Collection/name/@lang	data language
Service/name/@lang	Collection/relatedInfo[@type=’service’]/notes	service language
Service/relatedInfo	Collection/identifier; Collection/name	data identifying information - its text name, or an identifier such as a uuid or doi to a landing page
Service/relatedObject/key	Collection/relatedInfo[@type=’service’]/notes	service contributor/owner/publisher
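
To show how the Service Record column of the mapping above could be used in practice, the following Python sketch reads a handful of core concepts out of a service element. The XML fragment is a simplified, un-namespaced illustration with placeholder values, not a schema-valid RIF-CS record, and the `CONCEPT_PATHS` dictionary and `extract_concepts` function are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Simplified fragment for illustration only. A real RIF-CS record is
# namespaced and wrapped in further elements; all values are placeholders.
SAMPLE = """
<service type="OGC:WMS" dateModified="2018-06-18">
  <identifier type="uri">http://example.org/wms</identifier>
  <name><namePart>Example slope WMS</namePart></name>
  <description type="brief">Example service description</description>
  <subject type="local">elevation</subject>
</service>
"""

# Core concept -> path relative to the service element, following the
# Service Record column of the mapping table ("@" marks an attribute).
CONCEPT_PATHS = {
    "service URL": "identifier",
    "service type: protocol and version": "@type",
    "service title": "name",
    "data subject": "subject",
    "service description/abstract": "description",
    "service date (modified)": "@dateModified",
}


def extract_concepts(service):
    """Collect the core concepts that are present on a service element."""
    values = {}
    for concept, path in CONCEPT_PATHS.items():
        if path.startswith("@"):
            value = service.get(path[1:])
        else:
            element = service.find(path)
            # Join nested text so that child elements such as namePart are included.
            value = "".join(element.itertext()).strip() if element is not None else None
        if value:
            values[concept] = value
    return values


concepts = extract_concepts(ET.fromstring(SAMPLE.strip()))
for concept, value in concepts.items():
    print(f"{concept}: {value}")
```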

DCAT

Following analysis and discussion in the W3C Dataset Exchange Working Group (DXWG), a proposed solution for cataloguing services in the context of a DCAT catalog has been developed; see the DCAT-2 Working Draft and Editor's Draft. Classes for DataService and DataDistributionService have been added to the DCAT vocabulary. Examples of their use are shown here and here. Note that the second example is modelled on an instance from Research Data Australia. A summary of the solution is below:

DCAT	Concepts, for data-services and related data	Comment
dct:title	service title
dct:description	service description/ abstract
dcat:endpointDescription	service description/ abstract	Link to machine-readable endpointDescription, such as a Swagger or GetCapabilities document
rdf:type	service type: resource type
dct:conformsTo	service type: protocol and version	In DCAT a single service might support multiple interfaces or protocols, so this property may be repeated
dct:type	service type: function	In DCAT a service might have multiple classifiers, either at different levels of refinement or with values taken from different controlled vocabularies, so this property may be repeated
dcat:endpointURL	service URL	service API
dcat:landingPage	service URL	Human useable landing-page for the service (which might be provided at a different URL to the service API)
dcat:servesDataset	data subject, data spatial coverage, data geographic/projected CRS, data temporal coverage, data format, data rights, data contributor/owner/publisher, data language, data identifying information	In DCAT this is a link to a description of the dataset(s) served, which is packaged as a separate record. All of the dataset descriptors are provided as properties of that resource, and not as properties of the service itself.
dct:language	service language
dct:accessRights	service rights
dct:modified	service date (modified)
dct:creator|publisher|contributor prov:wasAttributedTo	service contributor/owner/publisher
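
The mapping above could be expressed, for instance, as a JSON-LD description of a dcat:DataService. The Python sketch below builds one using only the standard library. The service title and endpoint URL are taken from Example 1 later in this guide; the `@id` values under example.org, the dct:conformsTo target and the function name `build_data_service` are hypothetical placeholders, and the result has not been validated against the DCAT-2 draft.

```python
import json


def build_data_service(service_id, title, endpoint_url, protocol_uri, dataset_id):
    """Return a JSON-LD style dictionary describing a dcat:DataService."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": service_id,
        "@type": "dcat:DataService",                 # service type: resource type
        "dct:title": title,                          # service title
        "dcat:endpointURL": {"@id": endpoint_url},   # service URL
        "dct:conformsTo": [{"@id": protocol_uri}],   # protocol and version (repeatable)
        # Dataset descriptors live on the dataset record, not on the service.
        "dcat:servesDataset": [{"@id": dataset_id}],
    }


service = build_data_service(
    service_id="http://example.org/service/slope-wms",          # hypothetical
    title="Digital Elevation Model (DEM) of Australia derived from SRTM "
          "with 1 Second Grid - Smoothed Percentage Slope WMS",
    endpoint_url="http://services.ga.gov.au/gis/services/"
                 "DEM_SRTM_1Second_Slope/MapServer/WMSServer",
    protocol_uri="http://example.org/def/protocol/wms-1.3.0",   # hypothetical
    dataset_id="http://example.org/dataset/srtm-slope",         # hypothetical
)

print(json.dumps(service, indent=2))
```

Keeping dcat:servesDataset as a link reflects the comment in the table: data subject, coverage, format and rights are properties of the dataset resource rather than of the service.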

Examples of service description using the agreed core concepts

There is no assumption that data provision organisations necessarily maintain independent metadata descriptions of services; there is, however, a shared expectation that a core set of information about data and related services be available from somewhere in the data management system. Sources of this information might include references to services within dataset records, “self-describing” interfaces such as GetCapabilities, or combinations of both (see Data and related Services - Metadata Views for more information). The image below shows how information about three core concepts ("service title", "service creator" and "data subject") could be obtained from various locations in the data management system.

Following are three real-life examples from some of the data provision organisations we have been working with:

Example 1 - Service metadata provided in a dedicated service record (Geoscience Australia)

Service record from the GA metadata catalogue:

Service metadata extracted from the above and mapped to the core service concepts:

Service concept	Value
service URL	http://services.ga.gov.au/gis/services/DEM_SRTM_1Second_Slope/MapServer/WMSServer
service type: protocol and version	Protocol: WMS 1.3.0, 1.1.1
service type: resource type	service
data subject	Land topography models, Ecology landscape, elevation, slope
service title	Digital Elevation Model (DEM) of Australia derived from SRTM with 1 Second Grid - Smoothed Percentage Slope WMS
data spatial coverage	["112.000000 -44.000000,154.000000 -44.000000,154.000000 -9.000000,112.000000 -9.000000,112.000000 -44.000000"]
data geographic/projected CRS
data temporal coverage
service description/ abstract	Digital Elevation Model (DEM) of Australia derived from SRTM with 1 Second Grid - Smoothed Percentage Slope WMS
data format
service date (modified)	2018-06-18
service rights
data rights
data contributor/owner/publisher	Geoscience Australia
data language
service language
data identifying information	UUID: aac46307-fce8-449d-e044-00144fdd4fa6
service contributor/owner/publisher	Geoscience Australia
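
Because the service in this example is an OGC WMS, a consumer could also ask the endpoint to describe itself. The Python sketch below only constructs a GetCapabilities request URL from the service URL and protocol version recorded above; it does not send the request, and whether the endpoint still responds has not been checked. The function name `get_capabilities_url` is an assumption made for this example.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint, version="1.3.0"):
    """Build a WMS GetCapabilities request URL for the given endpoint."""
    query = urlencode({
        "service": "WMS",
        "version": version,
        "request": "GetCapabilities",
    })
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


endpoint = ("http://services.ga.gov.au/gis/services/"
            "DEM_SRTM_1Second_Slope/MapServer/WMSServer")
print(get_capabilities_url(endpoint))
```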

Example 2 - Service metadata from within a dataset record combined with the response from a self-describing service (AODN)

IMAS UTAS dataset record in the AODN portal:

Hyperlink to Service URL within dataset record:

Service description at Service URL (OGC WFS) extracted from dataset metadata XML:

Summary of service metadata extracted from the above and mapped to the core service concepts:

Service concept	Value
service URL	http://geoserver.imas.utas.edu.au/geoserver/seamap/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=seamap:SeamapAus_NSW_marine_habitats_2002&outputFormat=SHAPE-ZIP
service type: protocol and version; service type: function	Protocol: WFS; Function: Access
service type: resource type	service
data subject	bathymetry/seafloor topography...
service title	seamap
data spatial coverage	-27.64400, 149.28540, 154.29516, -37.60255
data geographic/projected CRS
data temporal coverage	2002-05-30
service description/ abstract
data format	SHAPE-ZIP
service date (modified)
service rights
data rights	CC-BY 4.0
data contributor/owner/publisher	IMAS UTAS
data language	English
service language	English
data identifying information	ID: 9a94d1ba-8509-4d78-8b55-d25fd222cdff; Name: MAP - NSW marine habitats
service contributor/owner/publisher	IMAS UTAS
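
Some of the values in this example can be read directly from the service URL held in the dataset record. The Python sketch below parses the query string of that URL and picks out the protocol, version and output format. The function name `concepts_from_url` and the choice to report the raw `request` and `typeName` parameters under their own keys are assumptions made for this example; the sketch does not attempt to derive "service type: function" from the request name.

```python
from urllib.parse import parse_qs, urlsplit

SERVICE_URL = (
    "http://geoserver.imas.utas.edu.au/geoserver/seamap/ows"
    "?service=WFS&version=1.0.0&request=GetFeature"
    "&typeName=seamap:SeamapAus_NSW_marine_habitats_2002"
    "&outputFormat=SHAPE-ZIP"
)


def concepts_from_url(url):
    """Derive core concepts from the query parameters of an OGC request URL."""
    # Parameter names are lower-cased so lookups do not depend on their casing.
    params = {key.lower(): values[0]
              for key, values in parse_qs(urlsplit(url).query).items()}
    protocol = " ".join(
        part for part in (params.get("service"), params.get("version")) if part
    )
    return {
        "service type: protocol and version": protocol,
        "data format": params.get("outputformat"),
        # Raw parameters, kept under hypothetical keys for later mapping.
        "request": params.get("request"),
        "typeName": params.get("typename"),
    }


derived = concepts_from_url(SERVICE_URL)
for concept, value in derived.items():
    print(f"{concept}: {value}")
```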

Example 3 - Service metadata from self-describing service OpenAPI (Atlas of Living Australia)

Service OpenAPI URL - description for multiple service endpoints provided in json:

Summary of service metadata extracted from the above and mapped to the core service concepts:

Service concept	Value
service URL	http://biocache-ws.ala.org.au/ws/occurrences/search*
service type: protocol and version	webservice search Parameters: fq - array(False) formattedFq - array(False) facets - array(False) formattedQuery - string(False) q - string(False) ...
service type: resource type	service
data subject	occurrence
service title	occurrenceSearchUsingGET
data spatial coverage
data geographic/projected CRS
data temporal coverage
service description/ abstract	occurrenceSearchUsingGET operation available at biocache-service API
data format	json
service date (modified)
service rights
data rights
data contributor/owner/publisher
data language
service language
data identifying information
service contributor/owner/publisher	Atlas of Living Australia
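
A mapping like the one above could be produced by walking the service's OpenAPI description. The Python sketch below does this for a small, hand-written fragment in Swagger 2.0 style. The fragment is a hypothetical stand-in modelled on the values in the table, not the actual Atlas of Living Australia document, and the function name `describe_operations` is an assumption made for this example.

```python
# Hypothetical Swagger 2.0 style fragment modelled on the table above;
# it is not the actual ALA service description.
SPEC = {
    "info": {"title": "biocache-service API"},
    "paths": {
        "/occurrences/search": {
            "get": {
                "operationId": "occurrenceSearchUsingGET",
                "produces": ["application/json"],
                "parameters": [
                    {"name": "q", "type": "string", "required": False},
                    {"name": "fq", "type": "array", "required": False},
                ],
            }
        }
    },
}


def describe_operations(spec):
    """Map each operation in the description to a few core service concepts."""
    rows = []
    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            if not isinstance(operation, dict):
                continue  # skip path-level entries that are not operations
            parameters = ", ".join(
                f"{p['name']} - {p.get('type')}({p.get('required', False)})"
                for p in operation.get("parameters", [])
            )
            rows.append({
                "service URL": path,
                "service title": operation.get("operationId"),
                "service type: protocol and version": f"webservice Parameters: {parameters}",
                "data format": ", ".join(operation.get("produces", [])),
                "service type: resource type": "service",
            })
    return rows


rows = describe_operations(SPEC)
for concept, value in rows[0].items():
    print(f"{concept}: {value}")
```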