Using Health Data Australia for Secondary Data Research

Secondary use of health and clinical trial data involves the use of existing datasets to answer questions beyond the scope of the original study. This approach maximises the value of research by enabling new discoveries, reducing duplication of effort, and supporting evidence-based decision-making without the time and cost of collecting new data.

Through Health Data Australia, researchers can discover and request access to a wide range of datasets from Australian health studies and clinical trials. To support secondary use of these data, we have developed a theoretical framework for using clinical trial and other health data in secondary research. The framework was derived from research papers and from consultation with stakeholders and the research community.

ARDC built the Health Data Australia platform to support the sharing of health research data for these secondary research scenarios. These datasets can be used in various ways: to combine findings from multiple studies (evidence synthesis), explore new research questions (secondary analyses), verify or replicate existing findings (reproducibility, replication, validation), or to teach data analysis skills and develop new methods (education and methods development).

The Secondary Use of Clinical Trials Data in Health Research: A Practical Guide outlines these four scenarios and provides step-by-step guidance, case studies, and resources for each.

Scenario 1: Evidence synthesis

Definition	Brings together evidence to answer a specific research question
Key steps	Develop protocol, systematic search, study selection, data collection, appraisal, analysis, dissemination, update
Data types	Aggregate data (i.e. data that have been summarised across participants), or raw line-by-line data, known as individual participant data
Data sources	Journal publications, the Health Data Australia (HDA) platform, study investigators, clinical trials registers, data repositories
Advantages	Comprehensive, systematic, rigorous, minimises bias
Challenges	Obtaining data, time-consuming
Types	Aggregate data meta-analysis, Individual participant data meta-analysis
Time	Depends on research topic, number of included studies, methods used (type of meta-analysis), researcher experience
Expertise	Information specialist, statistical support, administrative support, data management
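
As a rough illustration of the aggregate data meta-analysis listed above, the sketch below pools study-level effect estimates using inverse-variance (fixed-effect) weights. This is one common pooling approach rather than a method prescribed by the guide. The function name `pool_fixed_effect` and the example numbers are hypothetical, and the estimates are assumed to be on a scale where a normal approximation is reasonable (for example, log risk ratios).

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Pool study-level estimates with inverse-variance (fixed-effect) weights.

    Returns the pooled estimate, its standard error and an approximate
    95% confidence interval based on the normal distribution.
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("Provide one standard error per estimate")

    # Each study is weighted by the inverse of its variance, so more
    # precise studies contribute more to the pooled result.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical log risk ratios and standard errors from three studies
# (illustrative values only, not taken from any real trial).
log_rr = [-0.20, -0.35, -0.10]
se = [0.15, 0.20, 0.10]

pooled, pooled_se, ci = pool_fixed_effect(log_rr, se)
print(f"Pooled log RR: {pooled:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

A fixed-effect model assumes all studies estimate the same underlying effect. Where studies differ in populations or methods, a random-effects model may be more appropriate, which is one reason statistical support appears under the required expertise.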

Evidence Synthesis Case Study

Sharing Secondary Data to Give Babies Born Too Early a Better Chance of Survival.

Scenario 2: Secondary analyses

Definition	Using existing datasets to answer new research questions
Key steps	Develop protocol, obtain data, process and check data, conduct analysis, dissemination
Data types	Aggregate data (i.e. data that have been summarised across participants), or raw line-by-line data, known as individual participant data (IPD)
Data sources	The Health Data Australia (HDA) platform, trial investigators, clinical trials registers, journal websites, data repositories
Advantages	Maximise use of pre-existing data at little additional cost; inform sample size calculations to assist planning of new trials
Challenges	Obtaining data, generalisability/external validity
Types	Descriptive analyses, identification of important prognostic or predictive factors of disease, better understanding of disease history, informing sample size or power calculations for a new study, hypothesis-generating research questions about associations, biomarkers, mediation, effectiveness
Time	Depends on type and number of datasets
Expertise	Statistical expertise/support, data management
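
One of the uses listed above is informing sample size calculations for a new trial. The sketch below shows how a standard deviation estimated from an existing dataset might feed into the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The function name, the outcome values and the target difference are all hypothetical, and a real calculation should be made with statistical support.

```python
import math
from statistics import NormalDist, stdev


def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two means.

    Uses the normal-approximation formula for a two-sided test with
    equal group sizes and a common standard deviation.
    """
    if sd <= 0 or difference == 0:
        raise ValueError("sd must be positive and difference non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)


# Hypothetical outcome values from an existing dataset (illustrative only).
existing_outcomes = [52.1, 47.8, 60.3, 55.0, 49.6, 58.2, 51.4, 46.9]
sd_estimate = stdev(existing_outcomes)

# Assumed smallest difference between groups worth detecting.
target_difference = 5.0

print(sample_size_per_group(sd_estimate, target_difference))
```

The result is sensitive to the standard deviation estimate, so the generalisability of the existing dataset to the planned trial population matters here as much as it does for any other secondary analysis.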

Secondary Analysis Case Study

Using existing trial datasets to determine the clinical accuracy of tumour marker blood test CA-125 versus CT imaging criteria to detect cancer progression in patients with ovarian cancer

Scenario 3: Reproducibility, replication and validation

Definition	Reproducibility: re-analysing data from an original study to verify its findings. Replication: recreating an existing study to assess the reliability of its results. Validation: attempting to recreate an effect estimate or model in a new dataset.
Key steps	Define study objectives, develop protocol, obtain data and research materials, re-create study (primary replication), re-analyse data according to original methods, assess consistency across studies, report and disseminate findings
Data types	Ideally individual participant data
Data sources	The Health Data Australia (HDA) platform, study investigators, clinical trials registers, journal websites, data repositories
Advantages	Fosters rigorous and transparent research practices, enhances confidence and credibility where research findings are found to be reproducible/replicable, enhances generalisability and external validity of findings.
Challenges	Obtaining data and detailed methodology, including analysis plan and coding; having sufficient resources and expertise to conduct the study
Types	Reproducing statistical analyses, comparison of two or more studies (secondary replication), comparison of original study to a new replicated study (primary replication)
Time	Depends on complexity of original study, and whether replication is primary (more time consuming) or secondary
Expertise	Statistical expertise, topic experts, and laboratory or field staff for primary replication

Real-world examples of reproducibility, replication and validation with secondary data

Research question	If an existing dataset is re-analysed using the same methods and code as the original study, are the results consistent with those originally reported?
Required data	IPD and detailed information about the methods of analysis used in the original study.
Research question	Do two or more studies addressing the same research question obtain consistent results?
Required data	Ideally IPD from an existing study that the researcher wishes to replicate, though aggregate data may be sufficient in some cases. The data for the new replication study can either be accessed through secondary data sources if a suitable dataset is available (e.g. through a data catalogue), or be collected de novo by the researcher.
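
For the first research question above, a reproducibility check often comes down to comparing re-analysed values against those reported in the original study. The sketch below shows one minimal way to do that. The function name `compare_results`, the 1% relative tolerance and the example values are assumptions made for illustration; the guide does not specify a threshold for what counts as consistent.

```python
import math


def compare_results(reported, reproduced, rel_tol=0.01):
    """Label each reported result as consistent, discrepant or missing.

    `reported` and `reproduced` map result names to numeric values.
    A result is 'consistent' when the two values agree within `rel_tol`
    (relative tolerance; an assumed threshold, not a standard).
    """
    outcome = {}
    for name, original_value in reported.items():
        if name not in reproduced:
            outcome[name] = "missing"
        elif math.isclose(original_value, reproduced[name], rel_tol=rel_tol):
            outcome[name] = "consistent"
        else:
            outcome[name] = "discrepant"
    return outcome


# Hypothetical values: a hazard ratio and its confidence limits as
# published, and as obtained by re-running the analysis on the IPD.
reported = {"hazard_ratio": 0.80, "ci_lower": 0.65, "ci_upper": 0.98}
reproduced = {"hazard_ratio": 0.801, "ci_lower": 0.70}

for name, status in compare_results(reported, reproduced).items():
    print(f"{name}: {status}")
```

A discrepancy flagged this way is a prompt for investigation rather than a conclusion, since differences can arise from software versions, rounding or undocumented data processing steps.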

Scenario 4: Education and methods development

Definition	Using existing datasets to: teach/learn about data cleaning and analysis methods, or develop and demonstrate new statistical methods
Key steps	Determine primary educational or methodological objectives, obtain appropriate dataset in adherence with regulations, use dataset for intended purpose
Data types	Dependent on learning outcomes, but usually individual participant data are required, particularly to allow greater depth of learning
Data sources	The Health Data Australia (HDA) platform, study investigators, clinical trials registers, journal websites, data repositories
Advantages	Improves data cleaning and analysis capacity among students and researchers, which supports confidence in findings, and facilitates data re-use for other purposes. Enables methodological developments
Challenges	Obtaining suitable data and navigating regulatory requirements to enable re-use for education and methods development purposes
Types	Education about data processing, cleaning, coding, analysis, including learning new software and tools for these purposes. Develop and demonstrate new statistical methods
Time	Can be adjusted to cater for student/teacher/research capacity, expertise, specific requirements, and key objectives
Expertise	The teacher/trainer/methodologist should be skilled in the methods they are disseminating and/or developing, and possess good communication skills. The learner may have any level of expertise, from novice to expert