Research Link Australia Data Curation Principles

Preamble

The Research Link Australia (RLA) Minimum Viable Product (MVP), released on 21 May 2024, uses information from openly available data sources such as ORCID, CrossRef, PubMed and the ARC to demonstrate its features. We acknowledge that there are information quality issues that affect RLA's goal of supporting the identification of potential research-industry-government collaboration. While RLA was not intended to be a reporting tool or a comprehensive catalog of researchers, grants, or publications, the information quality in the initial release impedes user experience, as evidenced by the first RLA usability test.

As the product evolves, more data and data sources will be added to RLA. The RLA data curation principles in this document will guide the future incorporation of data into RLA. The principles will be reviewed at least yearly to accommodate emerging opportunities and changes to research information domain standards.

Information model

Currently, RLA focuses on providing information on research and research collaboration capabilities. The RLA information model below shows how the two capabilities are linked. 

In particular: 

  • Individuals’ and organisations’ collaboration capabilities are demonstrated through their collaborative work on previously funded activities that have both research and industry participation (e.g. ARC Linkage Projects). Currently, all recorded funded activities are either completed or active; future opportunities are not included.

  • Individuals’ and organisations’ research capabilities are demonstrated through their research input and output, e.g. publications, datasets, instruments, and funded projects. Note that RLA doesn’t intend to show such capabilities through complete coverage of a researcher’s work or of all research activities of an organisation, but through enough recent work to indicate research capabilities.

Each entity, such as a researcher/expert, organisation, research input/output, or funded activity, is represented by a data type or data object. The RLA data model, also known as the RLA graph, links each entity or data type to another entity in the model. RLA will include an entity if it is linked to at least one other entity; the principles below detail which entities are included and which are not.

NB: Unless otherwise specified, "organisation" covers academic organisations, research organisations, and businesses.
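The linking rule in the information model can be sketched in a few lines. This is only an illustration under stated assumptions: the `(type, id)` tuples, the sample data, and the `linked_entities` helper are hypothetical and do not reflect how the RLA graph is actually stored.

```python
from collections import defaultdict

# Hypothetical entities, written as (entity_type, entity_id) pairs.
candidates = {
    ("researcher", "r1"),
    ("researcher", "r2"),  # has no links to any other entity
    ("organisation", "o1"),
    ("grant", "g1"),
}

# Hypothetical undirected links between entities in the graph.
edges = [
    (("researcher", "r1"), ("grant", "g1")),
    (("organisation", "o1"), ("grant", "g1")),
]


def linked_entities(candidates, edges):
    """Return the candidates linked to at least one other candidate."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # Ignore self-links and links to entities outside the candidate set.
        if a != b and a in candidates and b in candidates:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {entity for entity in candidates if neighbours[entity]}


included = linked_entities(candidates, edges)
```

In this sketch, researcher `r2` is left out because nothing links to it, while the other three entities are kept.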

Principles

Principle 1: Trust and Quality

  • In the interest of transparency, we acknowledge that there are information quality issues that affect RLA's goal. We will aim to inform people accurately of these issues, manage expectations, and advise on appropriate use.

  • To support and progressively enhance the identification of potential research-industry-government collaboration, we will prioritise quality information and information quality improvement above creating a comprehensive (broader) catalog of researchers, grants, or publications. A lack of quality may itself impede user experience, e.g. through many false positives.

  • Information from trusted sources is prioritised above self-reported information. 

Principle 2: Metadata and Sources

  • In general, we obtain information/metadata from aggregate sources of information. In principle, RLA will avoid holding corrections (for example, in a separate researcher profile); instead, we will recommend that people correct information at the primary sources (universities, ORCID, funders, etc.). However, RLA may decide to present corrections programmatically (i.e. automatically), or to use one data source to correct another (e.g. fixing common spelling errors, normalising organisation names, or preferring grant information from the funder where it is more likely to be correct than the ORCID record).

  • In principle, RLA displays metadata or selected information from its source (or sources). However, there will be cases where extra metadata needs to be annotated to a researcher or organisation, for example Fields of Research (FoR), Industry Sector Classifications, or identifiers (e.g. ORCID, ABN, ACN, RoR). This RLA-annotated metadata would help to link entities to their works, and would enable pulling in or pointing to richer information about an organisation (e.g. ABS may have a description of a business).

    • Annotated metadata should only be added when its accuracy is above a certain level

    • When annotated metadata is displayed, state where it was derived from (e.g. an LLM for FoR codes), and ideally offer a feedback mechanism so that users can flag an incorrect annotation (this information would help to improve models)
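One way to picture the two annotation rules is a record that carries its own provenance and is filtered against an accuracy threshold. The sketch below is hypothetical throughout: the `Annotation` fields, the `MIN_CONFIDENCE` value, and the sample values are assumptions for illustration, since the principles do not specify a threshold or a schema.

```python
from dataclasses import dataclass

# Assumed value: the principles ask for "a certain level" of accuracy
# but do not state a number.
MIN_CONFIDENCE = 0.8


@dataclass(frozen=True)
class Annotation:
    entity_id: str      # researcher or organisation being annotated
    field: str          # e.g. "FoR"
    value: str          # placeholder value, not a real classification code
    derived_from: str   # provenance shown to the user, e.g. "LLM"
    confidence: float   # estimated accuracy of this annotation


def accepted(annotations, threshold=MIN_CONFIDENCE):
    """Keep only annotations that meet the accuracy threshold."""
    return [a for a in annotations if a.confidence >= threshold]


def display_text(annotation):
    """Render an annotation together with where it was derived from."""
    return (
        f"{annotation.field}: {annotation.value} "
        f"(derived from: {annotation.derived_from})"
    )


candidates = [
    Annotation("org-1", "FoR", "example-code-A", "LLM", 0.93),
    Annotation("org-1", "FoR", "example-code-B", "LLM", 0.41),
]
shown = [display_text(a) for a in accepted(candidates)]
```

A user feedback mechanism could record corrections against `entity_id` and `field`, but that part is left out of this sketch.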

Principle 3: Duplicated Representation

  • Duplicated records due to name variation: Ideally, there should be one record per researcher, organisation, or grant/project. Where the same entity has multiple names, these name variations should be mapped to a single name; for an organisation, this is the name registered with RoR or ABN/ACN, if such an identifier exists.

  • Duplicated records from different sources but about the same entity: Over time, RLA will try to identify and group such duplicate records. However, RLA will not try to amend one record using another.
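The two duplicate-handling rules can be illustrated together: name variants resolve to one registered name, and records from different sources are grouped under that name without being edited. The variant table, the organisation names, and the helper functions below are all hypothetical; in practice the registered name would come from an identifier registry rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical lookup from a normalised name variant to the registered name.
REGISTERED_NAMES = {
    "univ of example": "University of Example",
    "university of example": "University of Example",
    "example university": "University of Example",
}


def normalise(name):
    """Lower-case, drop full stops, and collapse runs of whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def canonical_name(name):
    """Map a known variant to its registered name; otherwise keep the name."""
    return REGISTERED_NAMES.get(normalise(name), name)


def group_records(records):
    """Group source records by canonical name without modifying them."""
    groups = defaultdict(list)
    for record in records:
        groups[canonical_name(record["name"])].append(record)
    return dict(groups)


records = [
    {"source": "source-a", "name": "Univ. of Example"},
    {"source": "source-b", "name": "Example University"},
    {"source": "source-c", "name": "Acme Pty Ltd"},
]
groups = group_records(records)
```

Each source record keeps its original `name`; only the grouping key is normalised, which matches the intent of grouping duplicates rather than amending them.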

Principle 4: Grants/Projects

RLA records data about Grants/Projects:

  • that demonstrate existing research-industry collaborations. Such a grant/project should have information about either researchers or participating organisations, and at least one participating organisation should be a collaborator that is not a university or research organisation.

  • Prioritising trusted sources: Information from the funding source (e.g. funder) is prioritised above information from other organisations and individuals.

  • A grant view (and likewise a project view, although there is no project use case yet) should point to both the funder’s record (landing page) and, if one exists, the landing page from other sources, e.g. a university or researcher
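As a rough sketch of Principle 4, the inclusion test and the source ranking could look like the following. The dictionary layout, the `kind` labels, and the `SOURCE_PRIORITY` ordering are assumptions made for this example, not RLA's actual schema.

```python
# Hypothetical ranking of source kinds: a lower number is more trusted.
SOURCE_PRIORITY = {"funder": 0, "organisation": 1, "individual": 2}

# Organisation kinds that do not count as an industry-side collaborator.
ACADEMIC_KINDS = {"university", "research organisation"}


def qualifies(grant):
    """Check that a grant has participant information and at least one
    organisation that is not a university or research organisation."""
    organisations = grant.get("organisations", [])
    has_participants = bool(grant.get("researchers") or organisations)
    has_other_partner = any(
        org["kind"] not in ACADEMIC_KINDS for org in organisations
    )
    return has_participants and has_other_partner


def preferred_value(values):
    """Pick the value supplied by the most trusted source kind.

    `values` is a list of (source_kind, value) pairs.
    """
    return min(values, key=lambda pair: SOURCE_PRIORITY[pair[0]])[1]


linkage_grant = {
    "researchers": ["r1"],
    "organisations": [
        {"name": "University of Example", "kind": "university"},
        {"name": "Acme Pty Ltd", "kind": "business"},
    ],
}
academic_only_grant = {
    "researchers": ["r2"],
    "organisations": [{"name": "University of Example", "kind": "university"}],
}
title = preferred_value(
    [("individual", "Title from a profile"), ("funder", "Title from the funder")]
)
```

Here `linkage_grant` qualifies and `academic_only_grant` does not, and the funder's title wins over the self-reported one.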

Principle 5: Researcher

RLA records data about a researcher:

  • Who is affiliated with an Australian organisation and

    • Who is associated with at least one grant/project (as described in Principle 4), publication, or patent

    • Where good quality information (e.g. a profile) about the researcher could be obtained:

      • the researcher’s profile comes from an Australian university research management system, or from an ORCID profile that covers the minimum metadata (Australian affiliation, together with at least one project/grant, publication, or patent)

  • International researchers who collaborate with an Australian researcher through a grant/project as described in Principle 4 or a publication/patent as described in Principle 7, provided minimum metadata about the researcher can be obtained.

  • Where researchers don’t have a profile that meets the above criteria, their names can be displayed under their research grant, but no “researcher view” or “researcher card” should be created or linked for them, i.e. these researchers are not included in the researcher index

  • Prioritising trusted sources: Information from a researcher’s organisation is prioritised above information from other organisations and individuals. This includes preferred contact information, title and other information about the individual.
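The researcher rules amount to a decision between a full researcher view and a name-only mention. The sketch below is one possible reading of those rules; the `Researcher` fields, the `profile_source` labels, and the `display_mode` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Researcher:
    name: str
    australian_affiliation: bool
    work_count: int = 0  # grants/projects, publications, and patents combined
    profile_source: Optional[str] = None  # "university_rms", "orcid", or None
    has_minimum_metadata: bool = False
    collaborates_with_australian: bool = False


def display_mode(r):
    """Return "researcher_view" if a profile page may be created, else
    "name_only" (the name is shown under the grant, with no card or view)."""
    good_profile = r.profile_source == "university_rms" or (
        r.profile_source == "orcid" and r.has_minimum_metadata
    )
    if r.australian_affiliation and r.work_count >= 1 and good_profile:
        return "researcher_view"
    if (
        not r.australian_affiliation
        and r.collaborates_with_australian
        and r.has_minimum_metadata
    ):
        return "researcher_view"
    return "name_only"


local = Researcher("A. Example", True, work_count=2, profile_source="university_rms")
sparse = Researcher("B. Example", True, work_count=1, profile_source="orcid")
overseas = Researcher(
    "C. Example",
    False,
    work_count=1,
    has_minimum_metadata=True,
    collaborates_with_australian=True,
)
modes = [display_mode(r) for r in (local, sparse, overseas)]
```

In this reading, `sparse` is shown by name only because its ORCID profile lacks the minimum metadata, while the other two researchers get a researcher view.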

Principle 6: Organisation

RLA records data about an organisation:

  • That is associated with at least one grant/project/patent

  • Where good quality information about them could be obtained:

    • An organisation that has at least one grant/project as described in Principle 4 or at least one patent as described in Principle 7.

      • If an organisation is a business, then validation by either an identifier (ABN, ACN), website, or location is preferred

    • An organisation that approaches ARDC to create and maintain information about itself on a dedicated organisation page showcasing its research-industry collaboration (this applies mainly to businesses, as recorded in, for example, CSIRO’s business collaboration survey; needs more discussion)

    • An organisation that has at least one facility or instrument with minimum metadata

    • A facility that provides an environment for hosting equipment and personnel (e.g. NCRIS facilities - NCI, ANFF, …) can be described as an organisation, applying the rules above.

Principle 7: Publication & Patent

RLA records data about a publication or a patent:

  • that has at least one known Australian researcher (e.g. identified through ORCID or another identifier) or Australian organisation (usually applies to patents)

  • RLA doesn’t intend to be a bibliography or patent search engine

Principle 8: Instrument

RLA records data about an instrument. Here, an instrument is a tool or device used for carrying out a research activity or a product test, measurement, or evaluation. An instrument may be hosted or supported by an organisation, including a facility.

  • An instrument must be linked to an organisation and have minimum metadata

References:

RLA Information Model and Metadata Guidelines
