Open Repositories Day 2 – Using OAI-PMH Resource Harvesting…

2007 January 25
by Karen

Using OAI-PMH Resource Harvesting & MPEG-21 DIDL for Digital Preservation – Joan Smith
The WWW and digital libraries are separate, different worlds. Preserving the websites of faculty and students is difficult.

Two problems with web site preservation

  • The counting problem – Finding everything
    • Crawlers can’t always reach every page:
      • dynamic content
      • orphaned pages
      • protected pages
      • pages that are too deep
  • The representation problem – Knowing what an object is
    • Resource Metadata: rare and unreliable
    • MIME Metadata: too simplistic
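The counting problem can be pictured as a set difference: the web server knows every file it holds, while a crawler only knows the pages it managed to reach by following links. The sketch below is only an illustration of that idea; the paths and the `unreached` helper are hypothetical and do not come from the talk.

```python
def unreached(server_paths, crawled_paths):
    """Return the paths the server holds that the crawler never reached."""
    return sorted(set(server_paths) - set(crawled_paths))


# Hypothetical example data: what the server has vs. what a crawler found.
server_paths = [
    "/index.html",
    "/papers/a.pdf",
    "/old/orphan.html",      # orphaned page: nothing links to it
    "/private/grades.html",  # protected page: the crawler is denied access
]
crawled_paths = ["/index.html", "/papers/a.pdf"]

missing = unreached(server_paths, crawled_paths)
print(missing)
```

Anything left in `missing` is content a crawl-based archive would silently lose, which is why asking the server itself is attractive.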

    Digital Preservation Requirements

    • Refreshing
    • Migration
    • Emulation

    Use OAI-PMH to deal with this

    • You can package information with your object
    • mod_oai
      • An Apache module
      • Needs to be configured
    • You can issue OAI-PMH requests to the web server and harvest resources from it
    • You can get more than metadata
    • You can get the resource itself plus all the metadata that can be gathered about it, in MPEG-21 DIDL format – CRATE
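OAI-PMH requests are ordinary HTTP requests whose `verb` parameter names the operation (`Identify`, `ListRecords`, `GetRecord`, and so on). As a rough sketch of what harvesting from a mod_oai-enabled server might look like, the code below only builds request URLs and does not contact any server. The base URL is a placeholder, `oai_dc` is the Dublin Core format every OAI-PMH repository must support, and the `oai_didl` prefix for the MPEG-21 DIDL output is an assumption that should be checked against the server's `ListMetadataFormats` response.

```python
from urllib.parse import urlencode

# Placeholder base URL; a real deployment would use its own host name.
BASE_URL = "http://site.com/mod_oai"


def oai_request(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Ask the server to describe itself.
identify_url = oai_request(BASE_URL, "Identify")

# Harvest plain Dublin Core metadata records.
metadata_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# Harvest the resources themselves wrapped in DIDL.
# "oai_didl" is an assumed prefix, not confirmed by these notes.
didl_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_didl")

print(identify_url)
print(metadata_url)
print(didl_url)
```

Each URL can then be fetched with any HTTP client; the response is an XML document that the harvester parses.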

    Plugins to handle gathering of metadata from different file formats

    • Jhove – Analysis by type
    • Kea – Key phrase extraction
    • OTS – Open Text Summarizer
    • ExifTool
    • PDFlib-pCOS
    • MP3-Tag
    • Essence
    • GDFR
    • MD5
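The plugin idea can be sketched as a list of small functions that each contribute some metadata about a file, with the results merged into one record. This is only an illustration using the Python standard library; it is not how mod_oai implements its plugins, and the function names, the `describe` helper, and the sample file are all hypothetical. The standard-library calls stand in for the real tools listed above.

```python
import hashlib
import mimetypes


def md5_plugin(name, data):
    """Checksum plugin: fixity information for the file contents."""
    return {"md5": hashlib.md5(data).hexdigest()}


def mime_plugin(name, data):
    """Type plugin: guess the MIME type from the file name alone."""
    mime, _encoding = mimetypes.guess_type(name)
    return {"mime": mime or "application/octet-stream"}


def size_plugin(name, data):
    """Size plugin: number of bytes in the file."""
    return {"size_bytes": len(data)}


# Every registered plugin is run against every file.
PLUGINS = [md5_plugin, mime_plugin, size_plugin]


def describe(name, data):
    """Run all plugins and merge their output into one metadata record."""
    metadata = {}
    for plugin in PLUGINS:
        metadata.update(plugin(name, data))
    return metadata


record = describe("thesis.pdf", b"hello")
print(record)
```

Adding a new analysis tool then means adding one more function to `PLUGINS`, without touching the harvesting logic.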

    Google will accept information from mod_oai instead of a site map.

    Use the convention http://site.com/mod_oai to get to mod_oai information.

    For more information visit – http://www.modoai.org
