Using Health Data Australia for Secondary Data Research

Secondary use of health and clinical trial data involves the use of existing datasets to answer questions beyond the scope of the original study. This approach maximises the value of research by enabling new discoveries, reducing duplication of effort, and supporting evidence-based decision-making without the time and cost of collecting new data.

Through Health Data Australia, researchers can discover and request access to a wide range of datasets from Australian health studies and clinical trials. To support the use of secondary data, we have developed a theoretical framework for using clinical trial and other health data in secondary research. The framework was derived from research papers and from consultation with stakeholders and the research community.

ARDC built the Health Data Australia platform to support the sharing of health research data for these secondary research scenarios. These datasets can be used in various ways: to combine findings from multiple studies (evidence synthesis), explore new research questions (secondary analyses), verify or replicate existing findings (reproducibility, replication, validation), or to teach data analysis skills and develop new methods (education and methods development).

The Secondary Use of Clinical Trials Data in Health Research: A Practical Guide outlines these four scenarios and provides step-by-step guidance, case studies, and resources for each.  

[Figure: Science Scenarios]

Scenario 1: Evidence synthesis

Definition

Brings together evidence to answer a specific research question

Key steps

Develop protocol, systematic search, study selection, data collection, appraisal, analysis, dissemination, update

Data types

Aggregate data (i.e. data that have been summarised across participants), or raw line-by-line data, known as individual participant data (IPD)

Data sources

Journal publications, the Health Data Australia (HDA) platform, study investigators, clinical trials registers, data repositories

Advantages

Comprehensive, systematic, rigorous, minimises bias

Challenges

Obtaining data, time-consuming

Types

Aggregate data meta-analysis, individual participant data meta-analysis

Time

Depends on research topic, number of included studies, methods used (type of meta-analysis), researcher experience

Expertise

Information specialist, statistical support, administrative support, data management
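
As an illustration of the aggregate data meta-analysis listed under Types, the following is a minimal sketch of fixed-effect, inverse-variance pooling. It is one common pooling approach, not necessarily the method used in the guide or the case study. The function name `fixed_effect_pool` and the example numbers are hypothetical, and the 95% interval assumes approximately normal effect estimates.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Pool study-level effect estimates using inverse-variance weights.

    estimates:  effect estimates on a common scale (e.g. log odds ratios)
    std_errors: the standard error reported for each estimate
    Returns (pooled estimate, pooled standard error, approximate 95% CI).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Normal-approximation 95% confidence interval.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical log odds ratios and standard errors from three trials.
log_odds_ratios = [-0.30, -0.10, -0.25]
standard_errors = [0.15, 0.20, 0.10]
pooled, pooled_se, ci = fixed_effect_pool(log_odds_ratios, standard_errors)
print(f"pooled log OR = {pooled:.3f} (SE {pooled_se:.3f})")
print(f"approximate 95% CI: {ci[0]:.3f} to {ci[1]:.3f}")
```

A fixed-effect model assumes all studies estimate the same underlying effect. Where that is doubtful, a random-effects model is usually considered instead, and statistical support (listed under Expertise) is advisable.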

Evidence Synthesis Case Study

Sharing Secondary Data to Give Babies Born Too Early a Better Chance of Survival

Scenario 2: Secondary analyses

Definition

Using existing datasets to answer new research questions

Key steps

Develop protocol, obtain data, process and check data, conduct analysis, dissemination

Data types

Aggregate data (i.e. data that have been summarised across participants), or raw line-by-line data, known as individual participant data (IPD)

Data sources

The Health Data Australia (HDA) platform, trial investigators, clinical trials registers, journal websites, data repositories

Advantages

Maximise use of pre-existing data at little additional cost

Inform sample size calculations to assist planning of new trials

Challenges

Obtaining data, generalisability/external validity

Types

Descriptive analyses, identification of important prognostic or predictive factors of disease, better understanding of disease history, informing sample size or power calculations for a new study, hypothesis-generating research questions about associations, biomarkers, mediation, effectiveness

Time

Depends on type and number of datasets

Expertise

Statistical expertise/support, data management
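
One of the advantages listed above is informing sample size calculations for new trials. The sketch below shows how that might look, assuming a two-arm trial with a continuous outcome, equal allocation, and the standard normal-approximation formula n = 2(z_{1-α/2} + z_{power})² σ² / δ² per group. The function name `n_per_group`, the outcome values, and the target difference are all hypothetical.

```python
import math
from statistics import NormalDist, stdev


def n_per_group(sd, min_difference, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a mean difference.

    sd:             outcome standard deviation, e.g. estimated from existing data
    min_difference: smallest difference between arms worth detecting
    """
    if sd <= 0 or min_difference <= 0:
        raise ValueError("sd and min_difference must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / min_difference) ** 2
    return math.ceil(n)


# Hypothetical outcome values taken from the control arm of an existing trial.
existing_outcomes = [52.1, 47.8, 60.3, 55.0, 49.6, 58.2, 44.9, 51.7]
sd_estimate = stdev(existing_outcomes)
print(f"estimated SD: {sd_estimate:.2f}")
print(f"participants per group: {n_per_group(sd_estimate, min_difference=5.0)}")
```

The normal approximation slightly underestimates the sample size relative to a t-based calculation, and a standard deviation estimated from a small dataset is itself uncertain, so this is a planning aid rather than a final figure.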

Secondary Analysis Case Study

Using existing trial datasets to determine the clinical accuracy of tumour marker blood test CA-125 versus CT imaging criteria to detect cancer progression in patients with ovarian cancer

Scenario 3: Reproducibility, replication and validation

Definition

Reproducibility: re-analysing data from an original study to verify its findings

Replication: recreating an existing study to assess reliability of results

Validation: attempting to recreate an effect estimate or model in a new dataset

Key steps

Define study objectives, develop protocol, obtain data and research materials, re-create study (primary replication), re-analyse data according to original methods, assess consistency across studies, report and disseminate findings

Data types

Ideally individual participant data

Data sources

The Health Data Australia (HDA) platform, study investigators, clinical trials registers, journal websites, data repositories

Advantages

Fosters rigorous and transparent research practices, enhances confidence and credibility where research findings are found to be reproducible/replicable, enhances generalisability and external validity of findings.

Challenges

Obtaining data and detailed methodology, including analysis plan and coding

Having sufficient resources and expertise to conduct study

Types

Reproducing statistical analyses, comparison of two or more studies (secondary replication), comparison of original study to a new replicated study (primary replication)

Time

Depends on the complexity of the original study, and whether replication is primary (more time-consuming) or secondary

Expertise

Statistical expertise, topic experts, and laboratory or field staff for primary replication

Real-world examples of reproducibility, replication and validation with secondary data

Research question

If an existing dataset is re-analysed using the same methods and code as the original study, are the results obtained consistent?

Required data 

IPD and detailed information about the methods of analysis used in the original study.
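
A reproducibility check of this kind can be sketched as follows: recompute a reported statistic from the IPD and compare it with the published value. This is a minimal sketch, assuming the original analysis was a simple difference in mean outcome between arms. The field names (`arm`, `outcome`), the records, the reported value, and the tolerance are hypothetical; a real check would follow the original study's analysis plan and code.

```python
from statistics import mean


def mean_difference(records, outcome="outcome", arm="arm"):
    """Difference in mean outcome, treatment minus control, from IPD records."""
    treated = [r[outcome] for r in records if r[arm] == "treatment"]
    control = [r[outcome] for r in records if r[arm] == "control"]
    if not treated or not control:
        raise ValueError("both arms must contain at least one participant")
    return mean(treated) - mean(control)


def is_reproduced(recomputed, reported, tolerance=0.005):
    """True when the recomputed value matches the reported one within tolerance.

    The tolerance allows for rounding in the published figure.
    """
    return abs(recomputed - reported) <= tolerance


# Hypothetical IPD and a hypothetical published estimate.
ipd = [
    {"arm": "treatment", "outcome": 5.0},
    {"arm": "treatment", "outcome": 7.0},
    {"arm": "control", "outcome": 3.0},
    {"arm": "control", "outcome": 4.0},
]
reported_difference = 2.5
recomputed = mean_difference(ipd)
print("reproduced" if is_reproduced(recomputed, reported_difference) else "discrepancy")
```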

Research question

Do two or more studies addressing the same research question obtain consistent results?

Required data

Ideally IPD from an existing study that the researcher wishes to replicate, though aggregate data may be sufficient in some cases. The data for the new replication study can either be accessed through secondary data sources if a suitable dataset is available (e.g. via a data catalogue), or be collected de novo by the researcher.
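
Where only aggregate data are available, one simple way to ask whether two studies obtained consistent results is a z-test on the difference between their effect estimates. The sketch below assumes the two studies are independent and report estimates on the same scale with standard errors. The function name `consistency_test` and the numbers are hypothetical, and this is only one of several possible consistency criteria.

```python
import math
from statistics import NormalDist


def consistency_test(est_a, se_a, est_b, se_b):
    """z-test for the difference between two independent effect estimates.

    Returns (z statistic, two-sided p-value). A small p-value suggests the
    two studies' estimates differ by more than sampling error would explain.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (est_a - est_b) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical estimates from an original study and a replication.
z, p_value = consistency_test(est_a=0.42, se_a=0.10, est_b=0.30, se_b=0.12)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

A non-significant difference does not by itself demonstrate successful replication, particularly when either study is small, so the result is best read alongside the estimates and their confidence intervals.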

Scenario 4: Education and Methods Development

Definition

Using existing datasets to: teach/learn about data cleaning and analysis methods, or develop and demonstrate new statistical methods

Key steps

Determine primary educational or methodological objectives, obtain an appropriate dataset in adherence with regulations, use the dataset for its intended purpose

Data types

Dependent on learning outcomes, but usually individual participant data are required, particularly to allow greater depth of learning

Data sources

The Health Data Australia (HDA) platform, study investigators, clinical trials registers, journal websites, data repositories

Advantages

Improves data cleaning and analysis capacity among students and researchers, which supports confidence in findings, and facilitates data re-use for other purposes. Enables methodological developments

Challenges

Obtaining suitable data and navigating regulatory requirements to enable re-use for education and methods development purposes

Types

Education about data processing, cleaning, coding, analysis, including learning new software and tools for these purposes. Develop and demonstrate new statistical methods

Time

Can be adjusted to cater for student/teacher/research capacity, expertise, specific requirements, and key objectives

Expertise

The teacher/trainer/methodologist should be skilled in the methods they are disseminating and/or developing, and possess good communication skills. The learner may have any level of expertise, from novice to expert

Education and Methods Development Case Study

Sharing secondary data to develop, test, and demonstrate new statistical methods

Resources
