Institution Manager manual - Manage your EAD and EAC-CPF files

From Archives Portal Europe Wiki

In order to publish information on archival material gathered from different countries and institutions as consistently as possible, a common EAD profile has been defined, named apeEAD, as well as a common EAC-CPF profile, named apeEAC-CPF. All information on these profiles can be found in the Standards section of this Wiki, and the profiles themselves can be found here: http://www.archivesportaleurope.net/Portal/profiles/apeEAD.xsd and here: http://www.archivesportaleurope.net/Portal/profiles/apeEAC-CPF.xsd.


Note: an instruction video showing the basic workflow of uploading content to the Archives Portal Europe is available; you can find it over here.



Unless they already comply with these profiles, the original local files therefore have to be converted to these specific schemas before being published. The portal hosts different types of EAD files: finding aids, holdings guides and source guides. There is a hierarchical relation between the holdings guide and the finding aids, which is reflected in the search tree of the advanced search page of the portal:

Hierarchy between the holdings guide and the finding aids


It is also possible to relate EAC-CPF files to the finding aids published in the portal, so that users can move easily from one to the other via internal links displayed in the "Archival materials" facet of the EAC-CPF file display:

Links from an EAC-CPF file to finding aids



Prepare your data

Only XML files can be uploaded to the Archives Portal Europe. These can be database exports or copies of existing EAD/XML files. During the export, a mapping may be needed to a local XML format, to a target schema such as EAD2002, or directly to the EAD profile defined for the Archives Portal Europe (apeEAD). It is wise to collect all files in one place (e.g. one folder), which helps when submitting the data in one go, for instance when you intend to use either an OAI-PMH repository or an FTP server to upload files, or when you want to upload several files combined in a zip-file via HTTP.

It is highly beneficial for archives to think ahead about the bigger picture: the data ecosystem on the web. Content providers will very likely have to take care of the integrity of their own data on the internet, as they increasingly publish the data through various channels, including their own website(s), third party websites (e.g. international, national and regional portals, thematic portals etc.) and Linked Open Data publication.

The data integrity issue arises because content providers continuously update their source data, which means that the data available through the different channels also needs to be updated and/or needs to feed back to the original data source.

Some tips are given below to better take this into consideration:

  • keep track of data exports and create versioning if possible.
  • keep track of updates to the original (source) data to make sure the latest version of the data is available on the Archives Portal Europe.
  • when a big change of a (source) data system occurs in your institution, pay attention to the hyperlinks and the Persistent Identifiers (PID) in relation to the update of the data on the Archives Portal Europe.
  • Archives Portal Europe may develop Web 2.0 functionality in the near future, in which User Generated Content (UGC) such as feedback and tagging may be included; content providers may also implement such functionality themselves; so always think of the entire workflow and ecosystem of data circulation and distribution.
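As an illustration of the first two tips, the sketch below records a checksum per exported file, so that a later export can be compared with an earlier one. It is a minimal example and not part of the Dashboard; the function names and the idea of a checksum manifest are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def build_manifest(export_dir):
    """Map each XML file name in export_dir to the SHA-256 of its content."""
    manifest = {}
    for path in sorted(Path(export_dir).glob("*.xml")):
        manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(old_manifest, new_manifest):
    """Return the names that are new or whose content differs from the old export."""
    return sorted(
        name
        for name, digest in new_manifest.items()
        if old_manifest.get(name) != digest
    )
```

Storing the manifest next to each export (for instance as a small text file) gives a simple version history, and `changed_files` then shows which files would need to be uploaded again.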

Create profiles to automate the processing of the data

In order to facilitate the work of the Institution Managers, the Dashboard provides the possibility to establish specific profiles. These profiles, mandatory when using the OAI-PMH functionality, allow the data to be processed automatically once uploaded to the portal. This is particularly useful in the case of regular updates and additions of content in the portal. However, it is recommended to first test the portal's data processing functionality manually, in order to better see the different possibilities and check what is best for your data.

The profiles are used in the Dashboard to indicate which actions are to be applied to the uploaded files. You can create as many profiles as needed; for instance, you could apply different rules to files without images and to files containing links to images. When using OAI-PMH harvesting, using a profile is mandatory. Please note that you can create a "manual" profile that will allow you to process the data yourself, step by step, after harvesting/uploading.

When you create a profile, you have to give it a name and specify the type of file (finding aid, holdings guide, source guide or EAC-CPF records) it is associated with; the forms will be adapted accordingly. Then you can indicate your choices in two different tabs: preferences for the Archives Portal Europe (tab Basic preferences, displayed by default) and preferences for Europeana (visible as a second tab called Europeana preferences).

Preferences for EAD and EAC-CPF files to be published in the Archives Portal Europe

basic preferences tab for publishing content in Archives Portal Europe


The basic preferences indicate the default actions to apply to your files:

  • publish, convert or validate the files, or nothing. Of course, if you choose "publish", the files will also be converted and validated,
  • overwrite or keep the existing file if duplicate,
  • discard the file or add the <eadid/> manually if missing,
  • specify the type of the <dao/> elements; the type displays a corresponding icon to indicate to the user whether the digitised document is a text, an image, a sound, etc.; this indication also serves for Europeana.

Specific options can be provided for the files, regarding the rights and the <dao/>-type. Note that these can also be added manually afterwards, file by file, from the content manager, by clicking on options in the column converted.

Overview of the possibilities of the basic preferences tab

  • Default action for uploaded files:
    • Publish to the Archives Portal Europe (default value)
    • Publish to the Archives Portal Europe and Europeana (in this case filling in the next tab is mandatory)
    • Convert to APE format
    • Validate against APE format
    • Nothing (use content manager for actions)
  • Default action for already existing files:
    • Overwrite existing file with new file (default value)
    • Keep existing file, discard uploaded file
    • Keep existing file, ask for identifier in case of duplicates
  • Default action for files without <eadid> element:
    • Remove uploaded file (default value)
    • Specify value for <eadid> manually
  • Default type for <dao> items:
    • Unspecified (default value)
    • TEXT
    • IMAGE
    • SOUND
    • VIDEO
    • 3D

Note: in case you have enabled Take from file (<dao@xlink:role>) if existing, then the choices you make here will only be applied to files in case these don't have values specified yet; in other words: original values from the original files will not be overwritten, but transferred.

  • Default XSL for conversion:
    • DEFAULT

Note: the default choice here is the standard general stylesheet for converting local EAD to apeEAD, which is fine in 95% of the cases. However, your local EAD files may need some extra fine-tuning that is not available in the standard stylesheet; in that case the Archives Portal Europe's technical team can provide a specifically tweaked stylesheet for your institution and make it available here for you.

  • Default rights statement for digital objects:
    • --- (= none, default value)
    • Public Domain Mark
    • Creative Commons CC0 Public Domain Dedication
    • Creative Commons Attribution
    • Creative Commons Attribution, ShareAlike
    • Creative Commons Attribution, No Derivatives
    • Creative Commons Attribution, Non-Commercial
    • Creative Commons Attribution, Non-Commercial, ShareAlike
    • Creative Commons Attribution, Non-Commercial, No Derivatives
    • Copyright Not Evaluated
    • In Copyright
    • In Copyright EU Orphan Work
    • In Copyright Educational Use Permitted
    • No Copyright Non-Commercial Use Only
    • No Copyright Other Known Legal Restrictions

Note: these rights are taken over from the current copyright frameworks Creative Commons and Rights Statements, which are also used by Europeana.

  • Default rights statement for EAD data:
    • idem as above for digital objects
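Before choosing a default type for <dao> items, it can help to know how many <dao> elements in a local file already carry their own role value (see the Take from file (<dao@xlink:role>) note above). The following is a minimal sketch, not part of the Dashboard: it assumes the role is recorded as a namespaced xlink:role attribute, and the function names and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute name of xlink:role in ElementTree's "{namespace}name" notation.
XLINK_ROLE = "{http://www.w3.org/1999/xlink}role"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_dao_roles(xml_text):
    """Return (dao elements with an xlink:role, dao elements without one)."""
    root = ET.fromstring(xml_text)
    with_role = 0
    without_role = 0
    for element in root.iter():
        if local_name(element.tag) != "dao":
            continue
        if element.get(XLINK_ROLE):
            with_role += 1
        else:
            without_role += 1
    return with_role, without_role


# Hypothetical sample record, for illustration only.
SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <archdesc level="fonds"><did>
    <dao xlink:href="http://example.org/1.jpg" xlink:role="IMAGE"/>
    <dao xlink:href="http://example.org/2.jpg"/>
  </did></archdesc>
</ead>"""

with_role, without_role = count_dao_roles(SAMPLE)
```

In this sample one <dao> has a role and one does not; only the latter would be affected by the default type chosen in the profile.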

Specific preferences for forwarding content to Europeana

preferences tab for forwarding content from Archives Portal Europe to Europeana


For Europeana, the EAD files have to be converted to another format (EDM), totally different from the EAD format, and then they will be published in another portal that has different re-use rules than the Archives Portal Europe. The preferences to indicate are therefore numerous and subdivided into general and specific settings.

As converting an EAD file to the EDM standard means "flattening" the description of the document, the main options relate to the information that you want to carry over from the higher levels of description into each EDM record. Note also that you can choose between two types of conversion: minimal and full. The minimal conversion will only take some basic elements such as the unittitle.

Please note that you will be allowed to forward content to Europeana only if you have signed the Europeana Data Exchange Agreement (DEA).


Upload EAD and EAC-CPF files

The Dashboard allows three different protocols to upload the files: HTTP, FTP and OAI-PMH. HTTP and FTP are available by choosing Upload content in the main menu, OAI-PMH by choosing Create automated harvesting function.

A short overview of the pros and cons of each method:

  • HTTP
    • Pros: no local installation needed, data delivery can be done from a local machine
    • Pros: great familiarity with the technology
    • Cons: data delivery is always manual
  • FTP
    • Pros: data delivery can be managed/done remotely via a dedicated FTP server (without having data on a local machine)
    • Cons: server has to be deployed locally
    • Cons: data delivery is always manual
  • OAI-PMH
    • Pros: data delivery can be fully automated via a dedicated OAI-PMH server
    • Pros: data can be synchronised with local database system
    • Pros: data can be offered to other service providers
    • Pros: de-facto standard for data exchange within cultural heritage sector
    • Cons: server has to be deployed locally
    • Cons: set-up is not always simple

Upload via the HTTP protocol

Via the Dashboard option Upload content you access the dialogue screen for HTTP or FTP upload:

using HTTP or FTP to upload files


The default value is the HTTP protocol, which enables you to select one XML file or a zip-file containing more than one XML file. Before actually uploading the file(s), you can choose a profile to let the system apply specific actions after the uploading.

Your file has to be a valid XML file or a zip file containing valid XML files. If the files are not valid, the Dashboard will reject them with a notification. The size limit of the file (one XML file or one zip-file) is 200 MB.
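The requirements above can be checked locally before uploading. The sketch below zips the XML files of one folder, skips files that do not parse, and compares the result with the 200 MB limit. It is only an illustration: it checks well-formedness rather than schema validity, treats 200 MB as 200 × 1024 × 1024 bytes (an assumption), and all names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# 200 MB limit from the text; the exact byte count is an assumption.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def is_well_formed(path):
    """Return True if the file parses as XML (no schema validation)."""
    try:
        ET.parse(str(path))
        return True
    except ET.ParseError:
        return False


def zip_xml_files(source_dir, zip_path):
    """Zip the well-formed XML files of source_dir; return (added, skipped)."""
    added, skipped = [], []
    with zipfile.ZipFile(str(zip_path), "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(source_dir).glob("*.xml")):
            if is_well_formed(path):
                archive.write(str(path), arcname=path.name)
                added.append(path.name)
            else:
                skipped.append(path.name)
    return added, skipped


def fits_upload_limit(zip_path, limit=MAX_UPLOAD_BYTES):
    """Return True if the zip-file is not larger than the upload limit."""
    return Path(zip_path).stat().st_size <= limit
```

If `fits_upload_limit` returns False, the files could be split over several zip-files and uploaded one after the other.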

In case you don't select a profile the Dashboard detects the type and status of your files immediately and asks you to make some choices. In case you have selected a profile, the - predefined - choices will be taken care of immediately and you can proceed to the Content manager screen to check the results of the uploading and processing.

These are the checks that the Dashboard can make and the errors it can detect:

  • detection/notification of valid and non-valid files; click on the You can continue to content manager button to continue to the next step:
detection/notification of valid and non valid files


Note: at this stage 'non-valid' files means files that are not recognised as XML files and are therefore discarded.

  • detection/notification of the type of the valid files (Finding Aid, Holdings Guide, Source Guide, EAC-CPF record) that can be stored and processed; you can change the type of document via the dropdown list if necessary and then click on the Accept button to continue to the next step:
detection/notification of the type of the valid files


  • detection of files that can be processed without a problem (Successful files) and files that have a problem with apeEAD schema validation (files with errors) and therefore have to be discarded (when you click on the link Click for more information you will get a more specific error message pointing to the exact location in the XML file where the problem occurs, which enables you to correct the file and upload it again later on):
detection/notification of successful files and files with errors


Note: the number of files you can check on your screen like this is 500, so if you upload a zip-file containing more than 500 XML files, you will be offered more than one page to accept.

  • detection of files that are already stored in your Archives Portal Europe account, with the possibility - per file - to either overwrite them or discard them (the options for the dropdown list behind each file are: overwrite and cancel):
detection/notification of files repeated and files with empty ID


  • detection of files that have an empty <eadid/> element (files with empty ID), so lack an identifier, with the possibility to provide that after a check whether it is not already in use by another already stored file:
providing missing ID for files with empty ID


Note: in this case it's important to add the identifier to the original (source) file too, otherwise this problem will occur again the next time this file is offered for upload/processing.
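Files with a missing or empty <eadid> can also be found in the source files before uploading. The sketch below is a minimal illustration, not the check the Dashboard itself performs; the function name and the sample identifier are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_eadid(xml_text):
    """Return the text of the first <eadid>, or None if it is absent or empty."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare on the local name so that namespaced EAD files match too.
        if element.tag.rsplit("}", 1)[-1] == "eadid":
            text = (element.text or "").strip()
            return text or None
    return None


# Hypothetical records, for illustration only.
WITH_ID = (
    '<ead xmlns="urn:isbn:1-931666-22-9"><eadheader>'
    "<eadid>EXAMPLE-001</eadid></eadheader></ead>"
)
EMPTY_ID = "<ead><eadheader><eadid/></eadheader></ead>"
```

Here `find_eadid(WITH_ID)` returns the identifier and `find_eadid(EMPTY_ID)` returns None, which marks a file that needs an identifier in the source system.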

Note: all these manual actions can be avoided by using a profile in which you have predefined all these actions; so manual processing of your files is a bit tedious, but when starting to contribute content to the Archives Portal Europe it's very useful to do this once, just to check the quality of your data.

Upload via the FTP protocol

For uploading data via the FTP protocol, the process is similar to that for HTTP. When choosing the FTP protocol in the upload content menu, you have to fill in the address of your FTP server, give the username and the password, and connect to the FTP server. The profile to apply can be selected afterwards, when you select the files to be uploaded.

uploading data via the FTP protocol, connecting to an FTP server


Upload via OAI-PMH harvesting

The use of the OAI-PMH protocol is highly recommended. For more information, please refer to the OAI-PMH website. You can also read the Best practice for OAI PMH Data Provider Implementations and Shareable Metadata (a bit old, but the basics of OAI-PMH have not changed).

Once set up, everything can be automated, from the harvest to the publication of data in the Portal and delivery to Europeana.


Note: there is an instruction video available on how to configure OAI-PMH harvesting in your Archives Portal Europe Dashboard account, you can find that over here.


General recommendations

There are many open source OAI-PMH tools (for more details see: http://www.openarchives.org/pmh/tools/tools.php). When implementing an OAI-PMH repository, it is recommended to test it before submitting data to the Archives Portal Europe. There are several (online) testing tools that can also be used for this purpose (e.g. OAI repository Explorer, see: http://re.cs.uct.ac.za and OAI-PMH Validator & Data extractor Tool, see: http://validator.oaipmh.com).

checking the quality of an OAI-PMH repository via: http://validator.oaipmh.com


Some important points have to be checked to ensure a correct harvesting process and to take advantage of all its possibilities:

  • All verbs and arguments must be implemented in your repository (see the schema below).
general schema of the OAI-PMH protocol syntax


  • The repository must manage the deleted records (value set to "persistent"), in order to allow differential harvest. The differential harvest is indeed the major advantage of using the OAI-PMH protocol. After the first full harvest of your data, you only harvest the information related to the new, updated or deleted files, which makes the harvesting process generally faster. If the value is set to "no" or even "transient", you have to re-harvest everything (and process all the files again afterwards), including the files that did not change. As a consequence, your server and the Archives Portal Europe server have to perform a lot of redundant actions, which negatively affects the bandwidth of both your server and ours.
  • It is highly recommended to organise your data in sets (even if you can only provide one set for all your files) and, if needed, sub-sets, in order to better manage the harvests and to avoid overly large chunks that might be harder for the servers to handle and take too long to finish (up to several days). The sets can be based on your own file plan or whatever file organisation you have in your institution.
  • If possible, identifiers should be unique and persistent URIs, so that they will not change over time and the links to your own website will not be broken.
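The deleted records setting mentioned above is announced by a repository in the <deletedRecord> element of its response to the OAI-PMH Identify verb. The sketch below reads that value from a response. It is a minimal example under stated assumptions: the function names are hypothetical, and the sample response and its base URL (http://example.org/oai) are made up.

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_identify(base_url):
    """Request the Identify response of a repository (needs network access)."""
    with urllib.request.urlopen(base_url + "?verb=Identify") as response:
        return response.read().decode("utf-8")


def deleted_record_policy(identify_xml):
    """Return the <deletedRecord> value (no, transient or persistent), or None."""
    root = ET.fromstring(identify_xml)
    node = root.find(f"{OAI_NS}Identify/{OAI_NS}deletedRecord")
    if node is None or not node.text:
        return None
    return node.text.strip()


def supports_differential_harvest(identify_xml):
    """True only when deleted records are tracked persistently."""
    return deleted_record_policy(identify_xml) == "persistent"


# Made-up Identify response, for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2013-05-22T10:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2000-01-01</earliestDatestamp>
    <deletedRecord>persistent</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""
```

With a real repository, `supports_differential_harvest(fetch_identify(base_url))` would give a quick first check before configuring a harvest in the Dashboard.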

Harvesting your data from the Dashboard

In the Dashboard, you act as harvester from your own repository. The advantage is that you fully control the harvesting process. The first step is to enter the base URL of your OAI repository to allow the Dashboard to check it:

checking the base url of an OAI-PMH repository in the Dashboard


After clicking on the Ok button, the Dashboard either recognises the OAI-PMH repository and continues with the next step, or indicates that it can't recognise the URL given as that of an OAI-PMH repository. In the latter case, you have to check at your end whether you made a typo in the base URL or whether the OAI-PMH repository might be offline. The second step consists of specifying the parameters for the harvest, based on the options the Dashboard has gathered from what your OAI-PMH repository offers:

  • Select a set: the tool lists the sets available for harvesting in your repository. Please note that you have to create one (and only one) harvest per set, so if your repository has more than one set, then you have to configure more than one harvesting job. This allows you to apply different profiles to your different sets according to your needs.
  • Select a metadata prefix: the tool lists the types of metadata found in your repository and asks you to select one of them. Any EAD-based metadata is accepted.
  • Select harvesting method: the usual (default) harvesting method is Harvest by ListRecords, but in some cases you might have to use the Harvest by ListIdentifiers/GetRecord (failsafe) combination of verbs, depending on what your repository supports.
  • Select an interval for your harvesting: this determines the delay between the harvesting jobs; depending on the frequency of the updates of your data, you may want to harvest your data more or less frequently. The options offered are: 2 weeks, 1 month, 3 months and 6 months.
  • Harvest only on weekends?: this is about when the harvesting will take place; choose yes (so only during weekends) if, for instance, your OAI-PMH repository is located on a server which is also used for other activities and you don't want to overload that server during working hours.
  • Set last harvesting date (if existing) - dd/MM/YYY: only necessary in case you have to configure a new harvesting job but want it to start fetching files from a specific date onwards, for instance to avoid (redundantly) overwriting files that already exist in the Archives Portal Europe.
  • Select a user profile: here you select the profile according to which settings you want the Dashboard to process the harvested files after the harvest; you might use different profiles for your different sets, for instance delivering one set to Europeana, and publishing the others only in the Archives Portal Europe.
  • Activation status: the status enabled will launch the harvest (immediately in the Content Checker environment, but with a limitation of only ten files, during the next night in the Production environment), the status disabled will stop the harvesting job cycle, so a possible already scheduled next harvesting job will not be activated.
filling in harvest parameters in the OAI-PMH dialogue screen of the Dashboard
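Several of the parameters above correspond to standard OAI-PMH request arguments (set, metadataPrefix, from). The sketch below shows what such requests look like; it does not describe how the Dashboard builds its requests internally, and the base URL, set name and metadata prefix used here are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None):
    """Build a ListRecords request for one set, optionally from a given date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        # A "from" date restricts the harvest to records changed since then.
        params["from"] = from_date.isoformat()
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, resumption_token):
    """Build the follow-up request for the next chunk of a long list."""
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository, set and prefix, for illustration only.
first_request = list_records_url(
    "http://example.org/oai", "ead", set_spec="fa", from_date=date(2013, 5, 22)
)
```

Opening such a URL in a browser is a simple way to see what a harvester would receive for a given set and date.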


When the harvest has finished successfully, the files are available in the Content Manager and, depending on the profile chosen, ready to be further processed, or already converted, published, delivered to Europeana and so on. In parallel, you receive an automatic email notification from the Dashboard informing you of the result of the harvest: whether the harvest succeeded or not, how many files could be harvested, and a short text describing the problem, if any. You can then check in the Portal and in the Dashboard how your data have been handled and displayed and get details on the errors.

Manage the harvests

The Dashboard option Create automatic harvest function stores the basic information on all harvesting jobs, so it can be used as a harvesting log file. All performed harvests are presented in a table summarising for each of them the parameters chosen, the harvest result, the date of the harvest and the date of the next harvest. Each table entry allows editing of the harvest job and offers a download of a more detailed log file (by clicking on the texts in the errors column).

After a harvest has been concluded you have four possible results: succeed, succeed with warnings, succeed with errors and failed, indicated with a color code: green, orange and red.

status of successful harvesting jobs


status of successful and unsuccessful harvesting jobs


Except for the first possible result (succeed), you can download files to check what happened.

  • SUCCEED WITH WARNINGS: a .txt file explains the issue and what you should do. For instance manually delete a record in the Content Manager, in which case the message will be: "Record 'oai:{reference of the record}' (2013-05-22) is deleted in OAI-PMH repository. Please delete it manually in the dashboard".
  • SUCCEED WITH ERRORS: a .txt file explains the issue, for instance a wrong identifier in the request, meaning that this request could not be fulfilled.
  • FAILED: in this case, no files at all could be uploaded to the Dashboard. The errors can be caused by the files themselves, for instance in case they are not compliant, by the request made, by a setting of the repository, etc. One or two files can be downloaded when a harvest fails: one indicating why the harvest failed and the URL that led to the error (.txt file), the other one containing the result of the harvest to allow you to check the issue (.xml file), if needed.


Some examples of error messages:

  • Url that contains errors: 'http://{address of the repository}?verb=ListRecords&resumptionToken=5'

PARSE ERROR: [com.ctc.wstx.exc.WstxLazyException] Undeclared general entity "ugrave" at [row,col {unknown-source}]: [12296,271] Please download the OAI-PMH response from the Dashboard for analysis

If you then check the XML file at row 12296, you see the entity "&ugrave;" (the HTML entity for "ù"), which is not declared in XML, instead of the numeric character reference "&#249;".

  • The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list.
  • java.io.IOException: HTTP response: Service Unavailable (Time out is 5 minutes)

Note that the error messages only comment on the first error encountered, but that doesn't mean that the same error is not encountered later on in a file or in a harvesting job.
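For the undeclared entity error in the first example above, one possible fix on the repository side is to replace HTML named entities by numeric character references before the XML is served. The sketch below is an illustration only, with hypothetical function names; it leaves the five entities predefined in XML untouched.

```python
import re
from html.entities import name2codepoint

# Entities that every XML parser knows; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(text):
    """Replace HTML named entities such as &ugrave; by references like &#249;."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return f"&#{name2codepoint[name]};"

    return ENTITY.sub(replace, text)


converted = numeric_entities("<unittitle>o&ugrave; &amp; quand</unittitle>")
```

After this call, `converted` contains "&#249;" in place of "&ugrave;", while "&amp;" is unchanged.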

The OAI Harvester tool

The Archives Portal Europe technical team has developed a standalone OAI Harvester tool that allows you to harvest your data as you would in the Dashboard, using the same parameters etc. More information on this tool can be found in the OAI Harvester tool section of this Wiki.

It's useful for these cases:

  • to test your repository
  • to make a full harvest of your entire dataset and upload that in the Content Checker environment for testing purposes, because the harvesting functionality of the Content Checker environment is limited to the first ten records only.

Manage your files

The Content manager screen is the heart of the Dashboard. This screen allows you to check the status of all your content as uploaded/harvested to the Archives Portal Europe and to manage it further if needed. You can enter this screen via the link Content manager in the Dashboard:

link to the Content manager screen in the Dashboard


The Content manager screen is divided in three parts:

overview of all the options of the Content manager screen


1. The upper part of the screen (indicated as "1" in the screenshot above) groups all filtering and searching facilities.

The filtering on Finding aid, Holdings guide, Source guide and EAC-CPF corresponds with the 'view' on this type of content in the lower part of the screen (indicated as "3" in the screenshot above); in case Finding aid is selected in part 1, part 3 will show all finding aids uploaded and processed; in case Holdings guide is selected in part 1, part 3 will show all holdings guides, etc.

The searching - in the content as shown in part 3 - can be done by selecting from the options offered and then clicking on the Search button. For example: if you only check 'Fatal error' in the options of Validated and click on the Search button, you will only see the files with a validation error (Fatal error) in part 3 below.

Note that if you move from one content view to another (for instance from the Finding aid view to the EAC-CPF view), the filters remain active and might mislead you, so it is then necessary to reset them.


2. The middle part of the screen (indicated as "2" in the screenshot above) offers a possibility to perform batch actions on either all files as shown in the lower part of the screen, or on a selection of those files.

You have several possibilities to select files in the lower part of the screen:

  • manually by check marking the checkbox in front of each file in the table of the lower part of the screen,
  • automatically all files on a page by selecting All in the header of the first column of the table in the lower part of the screen (Selection [All] – [None]); this is handy in case you have made a selection of files by using the searching options in the upper part of the screen first; note that if you move to another page, the files on the former page will not be checked anymore,
  • automatically all files on all pages of the table of the lower part of the screen, (in the floating box on top right of the screen [Clear all] [Select all]), also indicating the total number of selected files.

The batch actions that can be performed on either all files or a selection of files are shown in the screenshot below:

overview of all possible batch actions


3. The lower part of the screen (indicated as "3" in the screenshot above) shows all files uploaded/harvested as a row in a table, indicating their identifier, their title, the date on which they entered the Dashboard and a summary of their processing status. The processing steps are to be seen from the left to the right: starting with the conversion step and ending with delivery done to Europeana step. The main steps can be processed independently - first convert, then validate, then publish the file etc. - or in combination (i.e. convert, validate, publish) or as part of a batch process. The actions available for each file are indicated in the last column. The dropdown menu of that column displays only the possible actions depending on the steps already achieved and the status of the file.

The actions that can be performed on each file individually are shown in the screenshot below:

overview of all possible actions per file


Note: for all actions, the files are put into a queue; depending on how much data is being processed, the queue can be longer or shorter; the status of the queue is indicated in the middle of the Content manager screen, above the batch options: number of your files in the queue, queue size and number of files before yours; when a file is in the queue (which is indicated in grey in the action column), the only possible action is to remove it from the queue.


In case a file in the lower part of the screen shows an error (indicated as "4" in the main screenshot above), you can get a more detailed error message by activating the action Show error message in the last column:

detailed error message for a "Fatal error"


Via the blue bar in the middle of the Content manager screen you have access to a few more options:

  • you can limit or expand the number of files shown in the table in the lower part of the screen (if you limit the number of files shown, or scroll down, you will see a summary of the Content manager screen's statistics at the bottom of the screen),
  • you can let the page refresh automatically (after one minute), which is convenient if you are waiting for the results of some actions, for instance some batch actions,
  • you can 'jump' to another page of the table in the lower part of the screen, which is convenient if you have a large number of files available in the portal; in combination with the number of files shown per page you can browse quickly through a huge amount of content,
  • you can sort most of the content of the columns of the table in the lower part of the screen by clicking on the triangles (downwards and upwards, so descending and ascending) just below the titles in the headers of the columns.
extra options/information of the Content manager screen


Download your files

It is possible to download your files from the Archives Portal Europe, but of course those will be in the specific Archives Portal Europe formats (apeEAD and apeEAC-CPF), so not in your original (source) format.

For just one file

For downloading just one file, useful in case you want to check it when an error has occurred, the Dashboard offers the download action in the dropdown list of the last column of the table in the lower part of the Content manager:

download one file via the Dashboard


For several or all files

Downloading more than one file at the same time, or all files at the same time, is not possible via the Dashboard; however, the Archives Portal Europe technical team offers Webdav functionality for that.

   for the files of the Production environment, go to: https://www.archivesportaleurope.net/files/ in a web browser
   for the files of the Content Checker environment, go to: https://contentchecker.archivesportaleurope.net/files/ in a web browser

You will be asked to enter login information; this has to be the same as the login details you use for logging into the Archives Portal Europe's Dashboard.

Once you have access, you will see that the files are hierarchically organised in folders by country, institution and type of file. You can click on the link of your country to get to the files of your institution. Depending on your status (Country Manager or Institution Manager) you have different access rights: to the files of all institutions in a country or - within a country - only to the files of your institution(s):

access to Archives Portal Europe files via Webdav - country level


access to Archives Portal Europe files via Webdav - institution level


access to Archives Portal Europe files via Webdav - file folder level


Within your institution you can click on the folder with the files you want to access. For each institution, the files are organised by type of file: SG (Source Guides), EAG (EAG2012 files), EUROPEANA (EDM files), EAC-CPF (apeEAC-CPF records), HG (Holdings Guides) and FA (Finding Aids). At the lowest level, clicking on the name of a file will display it in your browser.

You can download the content from any level, but to actually retrieve the files as batches, you have to use specific software, such as Cyberduck (an FTP client which supports Webdav, see: https://cyberduck.io/):

download Archives Portal Europe files via Cyberduck
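As an alternative to a graphical client, a folder listing can in principle also be requested with a script, since Webdav servers answer the PROPFIND method with an XML list of entries. The sketch below is a minimal illustration under stated assumptions: it assumes the server accepts HTTP Basic authentication with your Dashboard login, and the function names and the sample paths are hypothetical.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def propfind(url, username, password):
    """Request the listing of one folder level (needs network access)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        method="PROPFIND",
        # Depth 1 asks for the folder itself and its direct children only.
        headers={"Depth": "1", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def hrefs_from_multistatus(xml_text):
    """Return the paths listed in a Webdav multistatus response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("{DAV:}href") if node.text]


# Made-up response with hypothetical paths, for illustration only.
SAMPLE_LISTING = """<D:multistatus xmlns:D="DAV:">
  <D:response><D:href>/files/XX/example-institution/FA/</D:href></D:response>
  <D:response><D:href>/files/XX/example-institution/FA/example.xml</D:href></D:response>
</D:multistatus>"""

paths = hrefs_from_multistatus(SAMPLE_LISTING)
```

Each path ending in .xml could then be downloaded with an ordinary authenticated GET request; for large batches a dedicated client such as Cyberduck remains the simpler option.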