
Writing a custom harvest type for the ARDC Harvester

The ARDC Harvester comes with four basic harvest types:

  • GET
  • OAI-PMH
  • CKAN
  • CSW

Each harvest type is implemented as a Python class that extends the Harvester class.

The harvest types are located in the harvest_handlers directory, and each harvest type should be self-contained within its own file.

The simplest harvest type, GETHarvester, serves as an example of how a harvest handler should be structured:

PY
from Harvester import *
class GETHarvester(Harvester):
    """
    {
        "id": "GETHarvester",
        "title": "GET Harvester",
        "description": "simple GET Harvester to fetch a single metadata document",
        "params": [
            {"name": "uri", "required": "true"},
            {"name": "xsl_file", "required": "false"}
        ]
    }
    """
    def harvest(self):
        self.getHarvestData()
        self.storeHarvestData()
        self.runCrossWalk()
        self.postHarvestData()
        self.finishHarvest()

The GETHarvester harvest handler shows the minimum that a harvest handler needs: a class definition that subclasses the Harvester class, a class docstring that describes the harvest type, and a harvest(self) method.

The subclass may override the methods of the Harvester class.
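As a rough illustration of overriding, the sketch below redefines one step in a subclass and then calls the inherited behaviour. The Harvester class here is only a hypothetical stand-in that records which steps ran; the real base class lives in the Harvester module and does actual work. ExampleHarvester and its docstring values are likewise invented for this example.

```python
class Harvester:
    """Hypothetical stand-in for the real Harvester base class."""

    def __init__(self):
        # Records the order in which steps run (for illustration only).
        self.calls = []

    def getHarvestData(self):
        self.calls.append("getHarvestData")

    def storeHarvestData(self):
        self.calls.append("storeHarvestData")

    def runCrossWalk(self):
        self.calls.append("runCrossWalk")

    def postHarvestData(self):
        self.calls.append("postHarvestData")

    def finishHarvest(self):
        self.calls.append("finishHarvest")


class ExampleHarvester(Harvester):
    """
    {
        "id": "ExampleHarvester",
        "title": "Example Harvester",
        "description": "hypothetical harvest type used only in this sketch",
        "params": [
            {"name": "uri", "required": "true"}
        ]
    }
    """

    def getHarvestData(self):
        # Overridden step: do something custom, then fall back to the
        # behaviour inherited from the base class.
        self.calls.append("custom fetch step")
        super().getHarvestData()

    def harvest(self):
        # Same sequence of steps as the GETHarvester example.
        self.getHarvestData()
        self.storeHarvestData()
        self.runCrossWalk()
        self.postHarvestData()
        self.finishHarvest()
```

Calling ExampleHarvester().harvest() in this sketch runs the custom step first and then the inherited steps in order. In a real handler the override would replace or extend whatever the actual base method does.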

The docstring at the beginning of the class is used to provide the Registry with information about the harvest type. Although it was planned to use this information to determine and populate some of the fields in the Registry user interface, this is not yet fully implemented.

The docstring must take the form of a JSON object with the following properties:

  • id: The unique identifier for this harvest type.
  • title: The name of the harvest type as it appears in the “Harvest Method” dropdown in the “Harvester Settings” of the Registry.
  • description: A brief description of the harvest type, displayed to the right of the “Harvest Method” dropdown in the “Harvester Settings” of the Registry.
  • params: An array of the other parameters required to complete the specification of the harvest of a particular data source. For example, this might include a URI, or the type of harvester crosswalk to be used. Each array element is an object with these keys:
    • name: the identifier of this parameter
    • required: one of the strings "true" or "false", indicating whether or not a Registry user must specify a value for this parameter.
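The properties listed above can be checked before a new handler is deployed. The following is a minimal sketch, not the Harvester's own parsing code: the function read_harvest_type_info and the constant REQUIRED_KEYS are hypothetical names, and the GETHarvester class here is a bare stand-in that carries only the docstring.

```python
import json

# Top-level properties the docstring is expected to contain.
REQUIRED_KEYS = ("id", "title", "description", "params")


def read_harvest_type_info(handler_class):
    """Parse a handler's class docstring as JSON and check its properties."""
    info = json.loads(handler_class.__doc__)

    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError("docstring is missing: " + ", ".join(missing))

    for param in info["params"]:
        if "name" not in param:
            raise ValueError("every params entry needs a name")
        # "required" is one of the strings "true" or "false", not a boolean.
        if param.get("required") not in ("true", "false"):
            raise ValueError('required must be the string "true" or "false"')

    return info


class GETHarvester:
    """
    {
        "id": "GETHarvester",
        "title": "GET Harvester",
        "description": "simple GET Harvester to fetch a single metadata document",
        "params": [
            {"name": "uri", "required": "true"},
            {"name": "xsl_file", "required": "false"}
        ]
    }
    """


info = read_harvest_type_info(GETHarvester)
print(info["title"])
```

A docstring that is not valid JSON makes json.loads raise json.JSONDecodeError (a subclass of ValueError), so a check like this catches malformed docstrings as well as missing properties.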

Please see the implementation of the existing harvest types for examples.

The docstring is then parsed, and the ARDC Registry is notified so that it can populate the available harvester type field on the Data Source Settings page.
