Performing enrichments #2  #141

@telezoic

Hi Folks,
I'm building on an older issue in another repo: Performing enrichments 14 - https://github.com/DigitalNZ/supplejack_docker.

We've been working with the Supplejack Docker repo for about 12 months as a prototype for a Provincial Digital Library. Many of the parsers in this build contain enrichments.

We've noticed that none of these enrichments port over successfully to the non-Docker dev build (for which you have graciously provided a walkthrough at http://digitalnz.github.io/supplejack/start/installation-walk-through-by-example.html).

These enrichments invariably fail in the new build with the message:
Failed with exception ActiveResource::Collection#each delegated to to_a.each, but to_a is nil: nil - http://eln-sj7.is.sfu.ca:3002/sidekiq/morgue

I'm wondering whether there are key differences in the way enrichments are configured in https://github.com/DigitalNZ/supplejack_api vs https://github.com/DigitalNZ/supplejack_api_app that could account for this. I see an issue about RESTful activity in the API (#116), but I'm not sure whether it's related to any of this.

Any advice or pointers you could send our way would be greatly appreciated.

I've included one of our failing enrichments for reference (it pulls a thumbnail link from an adjacent METS file):

class WaterlooUniversityLibrary < SupplejackCommon::Oai::Base

  base_url "https://uwspace.uwaterloo.ca/oai/request"

  namespaces mets: "http://www.loc.gov/METS/",
               dc: "http://purl.org/dc/elements/1.1/",
            xlink: "http://www.w3.org/TR/xlink/",
              dim: "http://www.dspace.org/xmlns/dspace/dim/"

  
  attribute :jurisdiction, default: "Ontario"
  # The jurisdiction field is identified in record_schema.rb so that we can filter results by region
  # and create API keys with roles assigned to each jurisdiction. This lets us create different
  # front ends for different jurisdictions.
  
  attribute :internal_identifier, xpath: '//record/header/identifier'
  attribute :display_content_partner, default: 'University of Waterloo' 

  attribute :title, xpath: '//dc:title'
  attribute :description, xpath: '//dc:description'
  attribute :creator, xpath: '//dc:creator'
  attribute :subject, xpath: '//dc:subject', mappings: {
    'Harvested from' => ''
  }

  # Changed to match the 4-digit standard.
  attribute :display_date do
    fetch('//dc:date').truncate(4, "").select(:last)
  end
  
  attribute :category, xpath: '//dc:type', mappings: {'Doctoral Thesis' => 'Dissertation/thesis'}

  attribute :language, xpath: '//dc:language'
  attribute :rights, xpath: '//dc:rights'
  attribute :publisher, xpath: '//dc:publisher'
  
  attribute :source_url, xpath: '//dc:identifier' do
    get(:source_url).find_with('hdl.handle.net')
  end
  
  enrichment :get_thumbnail, priority: -4, required_for_active_record: false do
    requires :uri do
      primary[:source_url].mapping(/^.*handle\.net(.*)$/ => 'https://uwspace.uwaterloo.ca/metadata/handle\1/mets.xml').first
    end

    # for testing
    # attribute :subject, default: "#{requirements[:uri]}"

    url requirements[:uri]
    format :xml

    attribute :thumbnail_url, xpath: '//*[@USE="THUMBNAIL"]//@*[local-name()="href"]' do
      compose('https://uwspace.uwaterloo.ca', get(:thumbnail_url))
    end
  end
end
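
As a sanity check outside Supplejack, the URI that the requires :uri block is meant to produce can be approximated in plain Ruby. This is only a sketch: it assumes the mapping helper behaves like an ordinary regex substitution, and the handle value and the mets_uri helper name are made up for illustration.

```ruby
# Plain-Ruby approximation of the `requires :uri` step in the parser above.
# Assumption: Supplejack's `mapping` acts like a regex substitution here.
METS_PATTERN  = /^.*handle\.net(.*)$/
METS_TEMPLATE = 'https://uwspace.uwaterloo.ca/metadata/handle\1/mets.xml'

# Hypothetical helper: turns a handle URL into the adjacent METS file URL.
def mets_uri(source_url)
  source_url.sub(METS_PATTERN, METS_TEMPLATE)
end

# Hypothetical handle, for illustration only.
puts mets_uri('http://hdl.handle.net/10012/1234')
```

If the printed URL opens in a browser and returns a METS document, the substitution itself is probably fine and the failure is more likely elsewhere in the enrichment run.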

Thanks for considering,
Dan
