Resources Importer

Overview

The resources bulk importer is designed to allow a site administrator to bulk upload resource data to the hub. It was designed and developed with speed and stability in mind. Because of its ability to mass upload/modify resources on the hub, it is an administrator only feature, meaning it can only be accessed and used via the adminstrator back end.

The importer has the ability to create the following for each record:

  1. The parent resource
  2. Tags on the parent resource
  3. Any child resources
  4. Any contributors (authors, editors, submitters, etc)

Using the Importer

To create an import you need to navigate to the administrator panel. Then go to Components > Resources > Import.

Click on the New (plus icon) button in toolbar in the top right of the page. Enter the name of the import and any notes you might want to associate with that specific import. These are mostly for internal use.

On the right is a file uploader that will allow you to select a data file, with all your raw resource data in it. If you have a very large data file or simply want to upload through SFTP then dont select any file and click save & close in the upper right. When each import is created a file space is also created for that import. If you now select that import in the listing and click edit you can then see all the import details again. Look in the "Import Data" box, under the "Data File" select dropdown. You will see the file space path where you can SFTP your data file. Once its been uploaded refresh this page and you should see you data file in the select drop down. Select it in the dropdown and click save.

On that page you might have also seen a couple other areas, "Import Parameters" and "Import Hooks".

Import Parameters

The "Import Parameters" section is pretty self explanitory. Basically these allow you to set the resources state, access, and group ownership when the resources are imported. It also has a parameter to check by title, which will check the database of already existing resources and update a matched resource with the incoming data instead of creating a new resource. Of course if a Hub resource ID is passed with the raw data then that will do the same as match by title just far more accurate. Its more accurate since match by title only looks to see if the incoming title is ~ 90% match to an existing resource title. The last param just determines whether or not to check the required custom fields of a resource type against the incoming data. If this is set to "yes" and a resource doesnt have all required fields it will fail to import. This will NOT stop the import of other resources though.

Import Hooks

The "Import Hooks" section can be very useful but might be confusing at first. To make the importer applicable to all hubs it expects that the raw incoming resource data is mapped to the EXACT fields in the database. To accomodate all import situations, import hooks were made. There are three types of hooks, "post parse", "post map", and "post convert" which are the 3 events you can listen for in the import process. You can run any number of hooks at each of these events in the order that you set for that import. All hooks are called on a single record at a time, meaning each hook takes the data of one resource record at a time and is reponsible for returning a modified version of that resource data.

Import hooks have a separate section found at: Components > Resources > Import Hooks. This allowed for the hooks (really php scripts) to be made into reusable blocks that you could attach to any or all of you imports.

  1. Post Parse: A good example of a post parse hook would be if you do not have the ability to control your raw data, so you simply write a PHP script which takes your raw data and maps it to the right resource fields. It could also be used for cleanup, removing bad urls, creating child resource & contributor objects, etc.
  2. Post Map: A good example of a post map hook would be if say you were running the same import again, with the match by title parameter on. As each record is being imported it finds a match and loads that existing resource. Say you wanted to make sure you didnt loose any changes to the tags on that resource between the time you first imported and now. Your post map script could fetch the existing tags and merge with the incoming. At the "Post Map" hook the raw data has been mapped to the Resource Objects and could possibly have a Hub resource ID.
  3. Post Convert: A good example of a post convert hook is if you wanted to do anything special to that resource, like generate a DOI or add badges. Since the DOI creation is not something we want to do for every resource its not something that happens by default in the importer. By this time the resource objects have already been saved to the database.

Running an Import

Running an import should be pretty easy. There are two modes, "dry-run" and "normal". Dry run simply allows you to run the import with all the hooks and settings in place, without actaully doing any creating or modifications to the data on in the database. Normal mode is the mode where it will save any incoming resources data.

** We recommend running in dry run mode first and at least spot checking the results in the list to make sure everything looks good before mass importing resources that then can be accessed by anyone.