Information for potential Health Data Australia participants

Currently, Health Data Australia contains information about clinical trial datasets for discovery and request. Future expansion of Health Data Australia’s content is planned and will include data collections from cohort studies, government and health services. The following information is provided to assist organisations that may wish to participate in this future expansion.

About Health Data Australia

Health data can be siloed across health services, research institutes, facilities, and jurisdictions, making it difficult for researchers to find the data relevant to their research. Health Data Australia seeks to be part of the solution to this challenge by assembling a national catalogue of Australian health data for researchers. Health Data Australia operates under the National Collaborative Research Infrastructure Strategy, which is funded by the Australian Government to support leading-edge research.

Currently, Health Data Australia contains information about clinical trial datasets collected from over 70 Australian organisations (universities, medical research institutes, clinical trials networks, and health services). It also contains contextual information that helps researchers understand these datasets, such as study protocols and data dictionaries. The data itself is held and managed by the organisations responsible for the clinical trials, and users can submit a data request to the data owner.

Health Data Australia is expanding to describe a wider range of data collections, such as data from cohort studies, government and health services.

How Health Data Australia works

Health data holders share descriptions of their data, which are then made discoverable in the Health Data Australia portal. Health Data Australia contains only metadata records describing the data; it does not store the data itself. Contributors retain control over access to the data; however, there is an expectation that the data can be made available to researchers when appropriate conditions are met.

If you hold and/or manage health data assets that can be used for health research, we invite you to consider participating in this program. Participation will increase the visibility of your data to researchers and connect it to other health datasets across the health system. Health Data Australia is well indexed by global search services and is developing syndication arrangements with international partners such as Health Data Research UK.

How to participate

To participate, organisations need to maintain descriptions of their data collections and keep this information current as new data becomes available. At a minimum, we expect a core set of descriptive elements (defined below) that are relevant to researchers seeking data for health research. Health Data Australia can automatically transfer, or harvest, that information from your organisation and regularly check for updates. Automated processes keep our records in sync with yours, so that researchers always see current information.

Figure 1. Information flow from Participating Organisations to Health Data Australia
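The harvest-and-update cycle described above can be sketched as a fingerprint comparison between successive harvests. This is a minimal illustration under stated assumptions, not Health Data Australia's actual harvester: it assumes each harvested record is a dict with an `id` key (for example the dataset DOI), and the `fetch_records` callable and `sync` function are hypothetical stand-ins for whatever retrieval method your catalogue exposes.

```python
import hashlib
import json
from typing import Callable, Dict, List

# Hypothetical record shape: a dict with an "id" key (e.g. the dataset DOI)
# plus arbitrary descriptive fields.
Record = Dict[str, object]


def fingerprint(record: Record) -> str:
    """Stable SHA-256 hash of a record's content, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync(
    fetch_records: Callable[[], List[Record]],
    known: Dict[str, str],
) -> Dict[str, List[str]]:
    """Compare a fresh harvest against fingerprints from the previous harvest.

    `known` maps record id -> fingerprint and is updated in place.
    Returns the ids that were added, updated or removed since the last run.
    """
    changes: Dict[str, List[str]] = {"added": [], "updated": [], "removed": []}
    seen = set()
    for record in fetch_records():
        rid = str(record["id"])
        seen.add(rid)
        fp = fingerprint(record)
        if rid not in known:
            changes["added"].append(rid)
        elif known[rid] != fp:
            changes["updated"].append(rid)
        known[rid] = fp
    # Records that no longer appear at the source are reported as removed.
    for rid in list(known):
        if rid not in seen:
            changes["removed"].append(rid)
            del known[rid]
    return changes
```

Hashing the whole record catches changes even when a source does not expose reliable modification dates; where a catalogue does report last-modified timestamps, comparing those is cheaper.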

For sensitive data, we provide a data request service that collects request details for your data custodians to consider. If you already have a request service, we can investigate integrating with it. Your organisation is expected to have governance processes in place that allow it to respond to such requests.

Figure 2. Data sharing request

Because we collect information on datasets from all over the research and health system, we require a globally unique, persistent identifier for each dataset, i.e. a Digital Object Identifier (DOI). The Australian Research Data Commons (ARDC) provides free services and support for allocating DOIs to your data, which has the added benefit of helping to track subsequent references to your data in scholarly publications.
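Because the DOI is the key that ties a dataset's records together across sources, it helps to normalise DOIs before comparing them. The following is a minimal sketch, assuming DOIs may arrive as bare strings, with a `doi:` prefix, or as doi.org URLs; the function names are hypothetical, and the pattern is a common practical check rather than the full DOI syntax.

```python
import re
from typing import Optional

# Practical check for the "10.<registrant>/<suffix>" form; not the complete
# DOI syntax, which permits rarer variants.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in metadata records.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(value: str) -> Optional[str]:
    """Return the bare DOI in lower case, or None if it does not look like one.

    DOIs are matched case-insensitively, so lower-casing gives a stable key
    for de-duplicating records harvested from different sources.
    """
    doi = value.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.lower()
    return doi if _DOI_PATTERN.match(doi) else None


def doi_url(doi: str) -> str:
    """Resolver URL for a bare DOI via the doi.org proxy."""
    return "https://doi.org/" + doi
```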

Core Information Set

Health Data Australia displays information about your dataset that helps researchers decide if the data is relevant to their research. This information set has been refined by a decade of surveys, interviews, user testing, and log analysis. Entries in Health Data Australia are expected to contain the following information:

The title or name of the dataset
A summary description of the dataset
The name and/or identifier of the organisation publishing the metadata record
Subject terms or keywords that describe the subject matter of the dataset
A unique identifier for the dataset, i.e. a DOI
Dataset rights, including information on how to access the dataset (via URL, download, service, data request, etc) and any conditions for use
A point of contact for any enquiries in relation to the dataset
For more information, see the HeSANDA metadata profile/schema.
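Before exposing records for harvesting, a contributor might check each one against the core information set listed above. This is a minimal sketch: the field names (`title`, `description`, `publisher`, `keywords`, `identifier`, `rights`, `contact`) are hypothetical stand-ins, and the HeSANDA metadata profile defines the authoritative element names and structure.

```python
from typing import Dict, List

# Hypothetical field names for the core information set; the HeSANDA
# metadata profile defines the authoritative names and structure.
CORE_FIELDS: Dict[str, str] = {
    "title": "Title or name of the dataset",
    "description": "Summary description",
    "publisher": "Organisation publishing the metadata record",
    "keywords": "Subject terms or keywords",
    "identifier": "Unique identifier (DOI)",
    "rights": "Access information and conditions for use",
    "contact": "Point of contact for enquiries",
}


def _is_empty(value: object) -> bool:
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def missing_core_fields(record: Dict[str, object]) -> List[str]:
    """Return the core fields that are absent or empty, in CORE_FIELDS order."""
    return [name for name in CORE_FIELDS if _is_empty(record.get(name))]
```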

Additional information about a dataset is also useful and may include:

The format that the data is made available in
The time period in which the data was collected or observations were made
The dataset scope/coverage, which may include demographic characteristics of the population or sample; inclusion or exclusion criteria; known gaps in the collection
A description of the activity that generated the dataset, e.g. a project or study
Related resources, which may include publications about the dataset; data dictionaries; data quality statements; etc
The name and/or identifier of people and organisations that have a role in collecting, owning, producing, publishing or otherwise making the dataset available

Our expectations for metadata align with national and international standards, including the ONDC Metadata Attributes, the DataCite Metadata Schema, the DDI Codebook, and Schema.org (Dataset). Together they form a natural common core that enables discovery of health data across the various sectors involved in generating it (government administration, health service delivery, research, industry, etc). Any well-accepted, community-endorsed metadata schema can potentially be supported through a crosswalk to our metadata schema.
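A crosswalk is essentially a field-by-field mapping from one schema to another. The sketch below maps DataCite JSON attributes (the shape returned under `data.attributes` by the DataCite REST API, assuming `publisher` is a plain string) onto a set of target keys. The target keys (`title`, `description`, `publisher`, `keywords`, `identifier`, `rights`, `contact`) are hypothetical stand-ins for the HeSANDA profile's elements, and the choice to prefer the `Abstract` description and the first `ContactPerson` contributor is an illustrative assumption.

```python
from typing import Any, Dict, List, Optional


def _first(items: List[Dict[str, Any]], key: str) -> Optional[str]:
    """First non-empty value of `key` in a list of DataCite sub-objects."""
    for item in items:
        if item.get(key):
            return item[key]
    return None


def crosswalk_datacite(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Map DataCite JSON attributes to hypothetical core field names."""
    descriptions = attrs.get("descriptions", [])
    # Prefer the abstract; fall back to any description if none is typed so.
    abstracts = [d for d in descriptions if d.get("descriptionType") == "Abstract"]
    contacts = [
        c["name"]
        for c in attrs.get("contributors", [])
        if c.get("contributorType") == "ContactPerson" and c.get("name")
    ]
    rights = [
        r.get("rights") or r.get("rightsUri")
        for r in attrs.get("rightsList", [])
        if r.get("rights") or r.get("rightsUri")
    ]
    return {
        "title": _first(attrs.get("titles", []), "title"),
        "description": _first(abstracts or descriptions, "description"),
        "publisher": attrs.get("publisher"),
        "keywords": [s["subject"] for s in attrs.get("subjects", []) if s.get("subject")],
        "identifier": attrs.get("doi"),
        "rights": rights,
        "contact": contacts[0] if contacts else None,
    }
```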

Transfer Protocol

Researchers and custodians expect up-to-date information, so we require an automatable process for harvesting your metadata and any updates over time. To enable this, your data descriptions need to be programmatically retrievable from your own system, for example:

A data catalogue with an API that allows metadata retrieval, e.g. CKAN, Figshare, Elsevier PURE, Dataverse (DDI); or
A website or web page that embeds Schema.org metadata as JSON-LD.
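For the Schema.org/JSON-LD pathway, a harvester needs to find the JSON-LD blocks in a page and keep those typed as `Dataset`. The sketch below uses only the Python standard library; it is illustrative rather than the harvester Health Data Australia actually runs, and it assumes compact `@type` values such as `"Dataset"` (expanded IRIs like `https://schema.org/Dataset` are not handled).

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List


class _JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _types(node: Dict[str, Any]) -> List[str]:
    """Normalise @type, which may be a single string or a list."""
    t = node.get("@type", [])
    return [t] if isinstance(t, str) else list(t)


def extract_datasets(html: str) -> List[Dict[str, Any]]:
    """Return the Schema.org Dataset objects embedded in a page as JSON-LD."""
    parser = _JsonLdCollector()
    parser.feed(html)
    parser.close()
    datasets: List[Dict[str, Any]] = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the harvest
        if isinstance(doc, list):
            nodes = doc
        elif isinstance(doc, dict):
            nodes = doc.get("@graph", [doc])
        else:
            continue
        datasets.extend(n for n in nodes if isinstance(n, dict) and "Dataset" in _types(n))
    return datasets
```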

For large research communities, we also support custom pathways. For example, for clinical trials data, ARDC harvests information nightly from the ANZ Clinical Trials Registry and DataCite to compile descriptions of clinical trial datasets. ARDC provides support for establishing such dynamic update processes.

Data discovery and re-use depend on many other factors, including consent, ethics, policy, culture, scientific methods, standards, and skills. ARDC provides “community of practice” support and materials in all these areas as part of a holistic infrastructure approach.

Contact

If you are interested in exploring your participation in Health Data Australia further, please email contact@ardc.edu.au and we will be in touch.