Using Health Data Australia for Secondary Data Research
Secondary use of health and clinical trial data involves the use of existing datasets to answer questions beyond the scope of the original study. This approach maximises the value of research by enabling new discoveries, reducing duplication of effort, and supporting evidence-based decision-making without the time and cost of collecting new data.
Through Health Data Australia, researchers can discover and request access to a wide range of datasets from Australian health studies and clinical trials. To support this, we have developed a theoretical framework for the secondary use of clinical trial and other health data, derived from research papers and from consultation with stakeholders and the research community.
ARDC built the Health Data Australia platform to support the sharing of health research data for these secondary research scenarios. These datasets can be used in various ways: to combine findings from multiple studies (evidence synthesis), explore new research questions (secondary analyses), verify or replicate existing findings (reproducibility, replication, validation), or to teach data analysis skills and develop new methods (education and methods development).
The Secondary Use of Clinical Trials Data in Health Research: A Practical Guide outlines these four scenarios and provides step-by-step guidance, case studies, and resources for each.
Scenario 1: Evidence synthesis
Definition | Brings together evidence to answer a specific research question |
Key steps | Develop protocol, systematic search, study selection, data collection, appraisal, analysis, dissemination, update |
Data types | Aggregate data (i.e. data that have been summarised across participants), or raw line-by-line data, known as individual participant data |
Data sources | Journal publications, the Health Data Australia (HDA) platform, study investigators, clinical trials registers, data repositories |
Advantages | Comprehensive, systematic, minimise bias, rigorous |
Challenges | Obtaining data, time-consuming |
Types | Aggregate data meta-analysis, Individual participant data meta-analysis |
Time | Depends on research topic, number of included studies, methods used (type of meta-analysis), researcher experience |
Expertise | Information specialist, statistical support, administrative support, data management |
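As an illustration of the aggregate data meta-analysis listed above, the sketch below pools study-level effect estimates with inverse-variance weights under a fixed-effect model. This is only one common pooling approach, not necessarily the one recommended in the Practical Guide. The function name `pool_fixed_effect` and the three log odds ratios with their standard errors are hypothetical values invented for the example.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of study-level effect estimates.

    Returns the pooled estimate, its standard error and an approximate
    95% confidence interval (normal approximation).
    """
    # Each study is weighted by the inverse of its variance (1 / SE^2).
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical aggregate data: log odds ratios and standard errors
# extracted from three published trials (values invented for illustration).
log_odds_ratios = [-0.30, -0.10, -0.25]
standard_errors = [0.15, 0.20, 0.10]

pooled, pooled_se, ci = pool_fixed_effect(log_odds_ratios, standard_errors)
print(f"Pooled log odds ratio: {pooled:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

An individual participant data meta-analysis would instead start from the raw participant-level records, which allows more flexible modelling but depends on obtaining data from each study.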
Evidence Synthesis Case Study
Sharing Secondary Data to Give Babies Born Too Early a Better Chance of Survival.
Scenario 2: Secondary analyses
Definition | Using existing datasets to answer new research questions |
Key steps | Develop protocol, obtain data, process and check data, conduct analysis, dissemination |
Data types | Aggregate data (i.e. data that have been summarised across participants), or raw line-by-line data, known as individual participant data (IPD) |
Data sources | The Health Data Australia (HDA) platform, trial investigators, clinical trials registers, journal websites, data repositories |
Advantages | Maximise use of pre-existing data at little additional cost; inform sample size calculations to assist planning of new trials |
Challenges | Obtaining data, generalisability/external validity |
Types | Descriptive analyses, identification of important prognostic or predictive factors of disease, better understanding of disease history, informing sample size or power calculations for new study, hypothesis generating research questions about associations, biomarkers, mediations, effectiveness |
Time | Depends on type and number of datasets |
Expertise | Statistical expertise/support, data management |
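One of the uses listed above is informing sample size calculations for a new trial. A minimal sketch, assuming a two-arm parallel trial with a continuous outcome, equal allocation and the standard normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group, is shown below. The outcome values, the minimum difference of 5 units and the function name `sample_size_per_group` are hypothetical.

```python
import math
from statistics import NormalDist, stdev


def sample_size_per_group(sd, min_difference, alpha=0.05, power=0.80):
    """Participants needed per arm to detect `min_difference` between two means.

    Uses the normal-approximation formula for a two-sided test with
    equal group sizes and a common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / min_difference) ** 2
    return math.ceil(n)


# Hypothetical outcome values taken from an existing trial dataset
# (invented for illustration); their spread gives the SD estimate.
existing_outcomes = [52.1, 47.3, 60.8, 55.0, 41.9, 58.4, 49.7, 63.2]
sd_estimate = stdev(existing_outcomes)

n_per_group = sample_size_per_group(sd_estimate, min_difference=5.0)
print(f"Estimated SD: {sd_estimate:.1f}; participants per group: {n_per_group}")
```

In practice the standard deviation would be estimated from a full existing dataset rather than a handful of values, and the calculation would be checked by a statistician.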
Secondary Analysis Case Study
Using existing trial datasets to determine the clinical accuracy of tumour marker blood test CA-125 versus CT imaging criteria to detect cancer progression in patients with ovarian cancer
Scenario 3: Reproducibility, replication and validation
Definition | Reproducibility: re-analysing data from an original study to verify its findings; Replication: recreating an existing study to assess reliability of results; Validation: attempting to recreate an effect estimate/model in a new dataset |
Key steps | Define study objectives, develop protocol, obtain data and research materials, re-create study (primary replication), re-analyse data according to original methods, assess consistency across studies, report and disseminate findings |
Data types | Ideally individual participant data |
Data sources | The Health Data Australia (HDA) platform, study investigators, clinical trials registers, journal websites, data repositories |
Advantages | Fosters rigorous and transparent research practices, enhances confidence and credibility where research findings are found to be reproducible/replicable, enhances generalisability and external validity of findings. |
Challenges | Obtaining data and detailed methodology, including analysis plan and coding; having sufficient resources and expertise to conduct study |
Types | Reproducing statistical analyses, comparison of two or more studies (secondary replication), comparison of original study to a new replicated study (primary replication) |
Time | Depends on complexity of original study, and whether replication is primary (more time consuming) or secondary |
Expertise | Statistical expertise, topic experts, laboratory or field staff for primary replication |
Real-world examples of reproducibility, replication and validation with secondary data
Research question | If an existing dataset is re-analysed using the same methods and code as the original study, are the results obtained consistent? |
Required data | IPD and detailed information about the methods of analysis used in the original study. |
Research question | Do two or more studies addressing the same research question obtain consistent results? |
Required data | Ideally IPD from an existing study that the researcher wishes to replicate, though aggregate data may be sufficient in some cases. The data for the new replication study can either be accessed through secondary data sources if a suitable dataset is available (e.g. via a data catalogue), or collected de novo by the researcher. |
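The first research question above, whether re-analysing an existing dataset gives consistent results, can be sketched as a comparison between a re-computed estimate and the published one. The example below is a simplified illustration. The participant records, the field names `arm` and `outcome`, the reported value and the rule of matching at the reported number of decimal places are all assumptions made for the example, not a prescribed standard.

```python
from statistics import mean


def mean_difference(records, outcome="outcome", arm="arm"):
    """Difference in mean outcome between treatment and control participants."""
    treated = [r[outcome] for r in records if r[arm] == "treatment"]
    control = [r[outcome] for r in records if r[arm] == "control"]
    return mean(treated) - mean(control)


def is_reproduced(reanalysed, reported, decimals=1):
    """True when the re-analysed value matches the reported value
    at the precision the original publication used."""
    return abs(reanalysed - reported) <= 0.5 * 10 ** (-decimals)


# Hypothetical individual participant data (invented for illustration).
ipd = [
    {"arm": "treatment", "outcome": 12.0},
    {"arm": "treatment", "outcome": 14.0},
    {"arm": "treatment", "outcome": 13.0},
    {"arm": "control", "outcome": 10.0},
    {"arm": "control", "outcome": 11.0},
    {"arm": "control", "outcome": 12.0},
]
reported_difference = 2.0  # hypothetical value from the original publication

reanalysed = mean_difference(ipd)
print("Reproduced:", is_reproduced(reanalysed, reported_difference))
```

A real reproducibility study would follow the original analysis plan and code as closely as possible, which is why detailed methods information is listed as required data.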
Scenario 4: Education and Methods Development
Definition | Using existing datasets to: teach/learn about data cleaning and analysis methods, or develop and demonstrate new statistical methods |
Key steps | Determine primary educational or methodological objectives, obtain appropriate dataset in adherence with regulations, use dataset for intended purpose |
Data types | Dependent on learning outcomes, but usually individual participant data are required, particularly to allow greater depth of learning |
Data sources | The Health Data Australia (HDA) platform, study investigators, clinical trials registers, journal websites, data repositories |
Advantages | Improves data cleaning and analysis capacity among students and researchers, which supports confidence in findings, and facilitates data re-use for other purposes. Enables methodological developments |
Challenges | Obtaining suitable data and navigating regulatory requirements to enable re-use for education and methods development purposes |
Types | Education about data processing, cleaning, coding, analysis, including learning new software and tools for these purposes. Develop and demonstrate new statistical methods |
Time | Can be adjusted to cater for student/teacher/research capacity, expertise, specific requirements, and key objectives |
Expertise | The teacher/trainer/methodologist should be skilled in the methods they are disseminating and/or developing, and possess good communication skills. The learner may have any level of expertise, from novice to expert |
Education and Methods Development Case Study
Sharing secondary data to develop, test, and demonstrate new statistical methods
Resources
Secondary Use of Clinical Trials Data in Health Research: A Practical Guide (PDF)
Unlock the potential of secondary data from clinical trials with Health Data Australia (Webinar)