OAI manual Set up the harvest


To set up the harvest, just follow the instructions displayed on the screen. How does the tool work? It sends requests to the repository using the standard OAI-PMH syntax, beginning with the first request (the verb Identify). As soon as it receives an answer, it lets you choose between the options offered by the repository.
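To make the request syntax concrete, here is a minimal sketch of how such a request URL can be put together. The helper name `build_request_url` is hypothetical and not part of the tool; the sketch only relies on the fact that an OAI-PMH request is the repository address followed by a `verb` argument and any further arguments.

```python
from urllib.parse import urlencode


def build_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Return an OAI-PMH request URL for the given verb (hypothetical helper)."""
    # The verb always comes first, followed by any optional arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# The first request in the dialogue with the repository: Identify.
print(build_request_url("http://www.gahetna.nl/archievenoverzicht/oai-pmh", "Identify"))
```

The same helper can be reused for the later verbs (for example ListMetadataFormats or ListSets) by changing the `verb` value.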



indicate the address of your repository

The first thing the tool asks for is the URL of the OAI-PMH server. This URL (web address) must include the prefix http:// or https://, for example: http://www.gahetna.nl/archievenoverzicht/oai-pmh

OAI Harvester manual, figure 4
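If you want to check an address before typing it into the tool, the prefix rule above can be sketched as follows. This is only an illustration; the function name `is_valid_repository_url` is an assumption and does not describe how the tool itself validates input.

```python
from urllib.parse import urlparse


def is_valid_repository_url(url: str) -> bool:
    """Check that the address starts with http:// or https:// and names a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# With the prefix the address is accepted; without it, it is rejected.
print(is_valid_repository_url("http://www.gahetna.nl/archievenoverzicht/oai-pmh"))
print(is_valid_repository_url("www.gahetna.nl/archievenoverzicht/oai-pmh"))
```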


The tool then asks you to indicate whether you are using a proxy server. In some network environments, access to the internet is secured via a proxy server. If that is the case, enter the URL (web address) of the proxy server; ask the administrator of your network environment for it. If you don't use a proxy server, for example when you use the tool at home, you can skip this question by pressing the enter key.

OAI Harvester manual, figure 5
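For readers who script their own requests, the proxy question could be handled roughly as sketched below with Python's standard library. The function names and the proxy address `http://proxy.example.org:8080` are assumptions made for illustration; they are not taken from the tool.

```python
import urllib.request


def proxy_settings(answer: str) -> dict[str, str]:
    """Translate the answer to the proxy question into a proxy mapping."""
    proxy_url = answer.strip()
    if not proxy_url:
        # Question skipped with the enter key: no proxy is used.
        return {}
    return {"http": proxy_url, "https": proxy_url}


def make_opener(answer: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends requests through the proxy, if one was given."""
    # An empty mapping tells ProxyHandler not to use any proxy.
    handler = urllib.request.ProxyHandler(proxy_settings(answer))
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; ask your network administrator for the real one.
opener = make_opener("http://proxy.example.org:8080")
```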


The harvester begins its dialogue with the repository by sending the request verbs and presenting the corresponding answers: the list of metadata formats, the list of sets, etc.

select the type of metadata that you want to harvest

The tool lists the types of metadata found in the repository and asks you to select one of them:

OAI Harvester manual, figure 6


In this example, data are provided in three different types of metadata: oai_dc (Dublin Core/XML), oai_ead (a short basic version of an EAD/XML finding aid), and oai_ead_full (the complete full version of an EAD/XML finding aid). Let's choose 3: oai_ead_full.
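The numbered list comes from the repository's answer to the ListMetadataFormats request. As a rough sketch of that step, the code below reads the metadata prefixes from a response and numbers them. The sample response is a trimmed illustration built from the three formats named above (a real response also carries schema and namespace elements), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, illustrative response; not copied from the real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ead</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ead_full</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_metadata_prefixes(response_xml: str) -> list[str]:
    """Return the metadataPrefix values in the order the repository lists them."""
    root = ET.fromstring(response_xml)
    return [(el.text or "").strip() for el in root.iterfind(".//oai:metadataPrefix", OAI_NS)]


# Number the formats so that one can be picked, as the tool does on screen.
for number, prefix in enumerate(list_metadata_prefixes(SAMPLE_RESPONSE), start=1):
    print(f"{number}: {prefix}")
```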

select the set that you want to harvest

Then the tool lists the datasets found in the repository for the chosen metadata format and gives each of them an arbitrary number so that you can choose one. Please note that you can harvest only one dataset at a time, so if you want to harvest everything or more than one dataset, you have to go through this whole process for each dataset.

OAI Harvester manual, figure 7


In this example there are 4 datasets. Let's choose 1: naa1, the dataset containing all of the Nationaal Archief's finding aids in the category 1.x.xx, meaning finding aids of governmental archives from before the year 1795.

select the FROM and TO dates

Then the tool asks whether you want to restrict the harvest to a date range, in this case a beginning and/or end date for the creation or adaptation of the finding aids in the chosen dataset. This is not mandatory and is only useful if you want to make a differential harvest, i.e. a harvest of the data produced during a certain period.

OAI Harvester manual, figure 8


In case you don't want to make use of this functionality, then you can simply skip both options by pressing the enter key.
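In OAI-PMH terms, the choices made so far (metadata format, dataset, optional dates) become arguments of a single request; the protocol calls the date arguments `from` and `until`. The sketch below shows how such a request could be assembled. The function name and the date `2014-01-01` are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_list_identifiers_url(base_url, metadata_prefix, set_spec,
                               from_date="", until_date=""):
    """Build a ListIdentifiers request; empty dates are simply left out."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    # Skipping a date question with the enter key means no date argument is sent.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


BASE_URL = "http://www.gahetna.nl/archievenoverzicht/oai-pmh"

# Full harvest of dataset naa1 (both date questions skipped).
print(build_list_identifiers_url(BASE_URL, "oai_ead_full", "naa1"))

# Differential harvest from a hypothetical start date onwards.
print(build_list_identifiers_url(BASE_URL, "oai_ead_full", "naa1", from_date="2014-01-01"))
```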

select the harvest method

Then the tool asks whether you want to use the standard ListRecords harvesting method or the special ListIdentifiers/GetRecord harvesting method. The latter is preferable when you encounter an unstable repository, because the harvesting process then continues when errors are encountered (fail safe).

OAI Harvester manual, figure 9


Let's choose 1: ListIdentifiers/GetRecord (fail safe).
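The idea behind the fail-safe method can be sketched as follows: the identifiers are listed first, and each record is then fetched on its own, so one failing record does not stop the rest. This is an illustration of the principle, not the tool's actual code; `harvest_fail_safe` and the stand-in `fake_fetch` are hypothetical, and the identifiers are made up.

```python
from typing import Callable, Iterable


def harvest_fail_safe(
    identifiers: Iterable[str],
    fetch_record: Callable[[str], str],
) -> tuple[dict[str, str], list[str]]:
    """Fetch every record separately and keep going when one of them fails."""
    harvested: dict[str, str] = {}
    failed: list[str] = []
    for identifier in identifiers:
        try:
            # One GetRecord request per identifier.
            harvested[identifier] = fetch_record(identifier)
        except Exception:
            # Remember the failure and continue with the next record.
            failed.append(identifier)
    return harvested, failed


def fake_fetch(identifier: str) -> str:
    """Stand-in for a real GetRecord request; fails for one made-up identifier."""
    if identifier == "b":
        raise IOError("repository did not answer")
    return f"<record id='{identifier}'/>"


records, errors = harvest_fail_safe(["a", "b", "c"], fake_fetch)
print(f"harvested {len(records)} of {len(records) + len(errors)}; failed: {errors}")
```

With the standard ListRecords method, by contrast, records arrive in batches, so a single error can interrupt the whole harvest.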

select the type of records that you want to save

Next you can choose how to save the metadata records (i.e. the original files): either as the full OAI response (the original files within an OAI "wrapper") or in their original (metadata) format.

OAI Harvester manual, figure 10


In case you want to upload the files directly to the Archives Portal Europe, harvesting them in their original (metadata) format is to be preferred, so let's choose 1: Save only the metadata record (e.g. EAD, EDM or DC files)
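The difference between the two options is easiest to see on a response. The sketch below takes a GetRecord response and keeps only the record inside the `metadata` element, dropping the OAI wrapper. The sample response is a shortened illustration with a placeholder identifier, and the function name is hypothetical; the tool's own implementation may differ.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened, illustrative GetRecord response with a placeholder identifier.
SAMPLE_GET_RECORD = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>example-id</identifier></header>
      <metadata>
        <ead xmlns="urn:isbn:1-931666-22-9"><eadheader/></ead>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def extract_metadata_record(response_xml: str) -> str:
    """Return only the metadata record (e.g. the EAD file) without the OAI wrapper."""
    root = ET.fromstring(response_xml)
    metadata = root.find(".//oai:metadata", OAI_NS)
    if metadata is None or len(metadata) == 0:
        raise ValueError("response contains no metadata record")
    # The first child of <metadata> is the record in its original format.
    return ET.tostring(metadata[0], encoding="unicode").strip()


print(extract_metadata_record(SAMPLE_GET_RECORD))
```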

start the harvest

Then the tool provides a summary of the choices and asks whether you want to proceed:

OAI Harvester manual, figure 11


To start the harvest choose 1: yes

Then the tool starts the actual harvesting by making a quick inventory of the files (303) in the requested dataset. Each line represents one file, in this case one finding aid, with its identifier (i.e. filename) and creation/adaptation date:

OAI Harvester manual, figure 12


and then immediately proceeds with fetching the files themselves, after which it compares the number of files actually harvested with the earlier inventory (303 of 303) and reports how long it took to harvest the dataset:

OAI Harvester manual, figure 13


retrieve the files

Once the files are harvested, they are stored in the data folder of the tool, ordered hierarchically by repository, type of files and name of the dataset:

OAI Harvester manual, figure 14


When you open one of the files, for example the first one of this dataset: finding aid 1.01.01.00.xml, you can see that it is a valid EAD/XML file, as requested via the options of the tool:

OAI Harvester manual, figure 15


You could now zip this dataset and upload it manually in the Archives Portal Europe's Content Checker, or run it through the Data Preparation Tool first.
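Zipping can of course be done with any archive program. If you prefer to script it, a minimal sketch with Python's standard library could look like this; the function name `zip_dataset` and the folder path in the usage comment are assumptions, since the exact folder names depend on your harvest.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir: Path) -> Path:
    """Create <dataset>.zip next to the dataset folder and return its path."""
    if not dataset_dir.is_dir():
        raise NotADirectoryError(dataset_dir)
    # The archive is written beside the folder, so it is not included in itself.
    archive = shutil.make_archive(str(dataset_dir), "zip", root_dir=dataset_dir)
    return Path(archive)


# Hypothetical usage; replace the path with the dataset folder shown in your data folder:
# zip_dataset(Path("data") / "..." / "naa1")
```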