Open Repositories Day 2 - Using OAI-PMH Resource Harvesting…
Jan 25th, 2007 by Karen
Using OAI-PMH Resource Harvesting & MPEG-21 DIDL for Digital Preservation - Joan Smith
The WWW and digital libraries are separate and different worlds. It is difficult to preserve websites for faculty and students.
Two problems with web site preservation
- The counting problem - Finding everything
  - Crawlers can’t always reach every page
    - dynamic content
    - orphaned pages
    - protected pages
    - pages are too deep
- Resource Metadata: rare and unreliable
- MIME Metadata: too simplistic
Digital Preservation Requirements
- Refreshing
- Migration
- Emulation
Use OAI-PMH to deal with this
- You can package information with your object
- mod_oai
- an Apache module
- Configure
- You can issue OAI-PMH commands to the webserver and harvest things from the webserver
- Can get more than metadata
- Can get the record itself plus all the metadata available about it, in MPEG-21 DIDL format - CRATE
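
As a rough illustration of issuing OAI-PMH commands to a webserver, the sketch below only builds request URLs; it does not contact any server. `ListRecords`, `Identify`, and the `metadataPrefix` argument are standard OAI-PMH. The base URL is a placeholder that follows the http://site.com/mod_oai convention mentioned in these notes, the `oai_didl` prefix is an assumption about how the MPEG-21 DIDL format is named on a given server, and `build_oai_request` is a hypothetical helper, not part of mod_oai.

```python
from urllib.parse import urlencode

# Placeholder base URL (assumption), following the http://site.com/mod_oai convention.
BASE_URL = "http://site.com/mod_oai"


def build_oai_request(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


# Ask for every record, packaged as MPEG-21 DIDL.
# "oai_didl" is an assumed metadataPrefix; check the server's ListMetadataFormats.
list_records_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_didl")

# Ask the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

print(list_records_url)
print(identify_url)
```

A harvester would fetch these URLs and parse the XML responses, following resumption tokens when the server splits a long list across several responses.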
Plugins to handle gathering of metadata from different file formats
- JHOVE - Analysis by type
- Kea - Key phrase extraction
- OTS (Open Text Summarizer)
- ExifTool
- PDFlib-pCOS
- MP3-Tag
- Essence
- GDFR
- MD5
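
The plugin idea can be pictured as a list of small functions that each contribute some metadata about a resource. The sketch below is only an illustration using the Python standard library: the three plugin functions and `describe` are hypothetical stand-ins, not the actual tools listed above or how mod_oai invokes them.

```python
import hashlib
import mimetypes


def md5_plugin(name, data):
    """Checksum of the resource, for fixity checks."""
    return {"md5": hashlib.md5(data).hexdigest()}


def mime_plugin(name, data):
    """MIME type guessed from the file name alone (deliberately simplistic)."""
    mime_type, _encoding = mimetypes.guess_type(name)
    return {"mime_type": mime_type}


def size_plugin(name, data):
    """Size of the resource in bytes."""
    return {"size_bytes": len(data)}


# Hypothetical registry: each plugin takes (name, data) and returns a dict.
PLUGINS = [md5_plugin, mime_plugin, size_plugin]


def describe(name, data):
    """Run every registered plugin and merge the metadata it returns."""
    metadata = {}
    for plugin in PLUGINS:
        metadata.update(plugin(name, data))
    return metadata


record = describe("notes.txt", b"hello")
print(record)
```

Keeping each extractor behind the same small interface means a format-specific tool can be added to the list without changing the code that assembles the record.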
Google will accept information from mod_oai instead of a sitemap.
Use the convention http://site.com/mod_oai to get to mod_oai information.
For more information visit - http://www.modoai.org

