Open Repositories Day 2 – Using OAI-PMH Resource Harvesting…
2007 January 25
Using OAI-PMH Resource Harvesting & MPEG-21 DIDL for Digital Preservation – Joan Smith
The WWW and digital libraries are separate and different worlds, and it is difficult to preserve websites for faculty and students.
Two problems with web site preservation
- The counting problem – Finding everything
- Crawlers can’t always reach every page
- dynamic content
- orphaned pages
- protected pages
- pages are too deep
- Resource Metadata: rare and unreliable
- MIME Metadata: too simplistic
Digital Preservation Requirements
- Refreshing
- Migration
- Emulation
Use OAI-PMH to deal with this
- You can package information with your object
- mod_oai
- part of Apache
- Configure
- You can issue OAI-PMH requests to the web server and harvest resources directly from it
- Can get more than metadata
- Can get the record itself plus all the metadata you can gather about it, packaged in MPEG-21 DIDL format – CRATE
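To make the harvesting idea concrete, here is a minimal sketch of how OAI-PMH requests against a mod_oai-enabled server might be built. The verbs and argument names (verb, metadataPrefix, from) come from the OAI-PMH 2.0 protocol. The base URL simply follows the http://site.com/mod_oai convention, the helper name oai_request_url is made up for this example, and the "oai_didl" prefix is an assumption about how the DIDL format is named on a given server.

```python
from urllib.parse import urlencode

# Hypothetical base URL, following the http://site.com/mod_oai convention.
BASE_URL = "http://site.com/mod_oai"


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    # OAI-PMH arguments travel as ordinary query parameters. 'from' is a
    # Python keyword, so callers pass from_ and the trailing underscore
    # is stripped here.
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            query[key.rstrip("_")] = value
    return f"{base_url}?{urlencode(query)}"


# Identifiers of everything modified since a given date.
changed = oai_request_url(
    "ListIdentifiers", metadataPrefix="oai_dc", from_="2007-01-01"
)

# Full records in a DIDL-based format ("oai_didl" is an assumed prefix name).
full = oai_request_url("ListRecords", metadataPrefix="oai_didl")

print(changed)
print(full)
```

The first URL asks only for identifiers of resources changed since the given date, which is what makes incremental harvesting cheaper than a full crawl. The second asks for the resources themselves together with their metadata.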
Plugins to handle gathering of metadata from different file formats
- Jhove – Analysis by type
- Kea – Key phrase extraction
- OTS
- ExifTool
- PDFlib-pCOS
- MP3-Tag
- Essence
- GDFR
- MD5
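The plugin idea can be illustrated with a small sketch: each plugin is a function that takes a resource's bytes and returns one piece of metadata, and the results are collected alongside the resource. This is not mod_oai's actual plugin interface. The registry, the decorator, and the "Size" plugin are all invented for illustration; only the MD5 digest corresponds to an item in the list above.

```python
import hashlib

# Hypothetical registry: plugin name -> function taking the raw bytes.
PLUGINS = {}


def plugin(name):
    """Register a metadata-extraction function under the given name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("MD5")
def md5_digest(data: bytes) -> str:
    # Fixity information: a checksum of the resource's content.
    return hashlib.md5(data).hexdigest()


@plugin("Size")
def byte_count(data: bytes) -> str:
    # Illustrative extra plugin, not one of the tools named in the talk.
    return str(len(data))


def describe(data: bytes) -> dict:
    """Run every registered plugin and collect whatever each one reports."""
    results = {}
    for name, func in PLUGINS.items():
        try:
            results[name] = func(data)
        except Exception as exc:
            # One failing plugin should not block the others.
            results[name] = f"error: {exc}"
    return results


metadata = describe(b"hello")
print(metadata)
```

Keeping each extractor independent means a tool that fails on a given file type costs only its own entry, and the remaining metadata is still packaged with the resource.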
Google will accept information from mod_oai instead of a sitemap.
Use the convention http://site.com/mod_oai to get to mod_oai information.
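As a rough sketch of what a harvester does with the reply from such a URL, the following parses an OAI-PMH ListIdentifiers response using only the Python standard library. The XML namespace and element names are those of OAI-PMH 2.0. The sample response, its identifiers, datestamps, and token are made up for illustration and are not output from a real server.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up sample response, shaped like an OAI-PMH 2.0 ListIdentifiers reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-01-25T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://site.com/mod_oai</request>
  <ListIdentifiers>
    <header>
      <identifier>http://site.com/index.html</identifier>
      <datestamp>2007-01-20T09:30:00Z</datestamp>
    </header>
    <header>
      <identifier>http://site.com/papers/draft.pdf</identifier>
      <datestamp>2007-01-24T16:05:00Z</datestamp>
    </header>
    <resumptionToken>token-0001</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return ([(identifier, datestamp), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    headers = [
        (
            h.findtext("oai:identifier", namespaces=OAI_NS),
            h.findtext("oai:datestamp", namespaces=OAI_NS),
        )
        for h in root.iterfind(".//oai:header", OAI_NS)
    ]
    # An absent or empty resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return headers, (token or None)


headers, token = parse_list_identifiers(SAMPLE_RESPONSE)
print(headers, token)
```

When a token comes back, the harvester repeats the request with resumptionToken set to that value until no token is returned.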
For more information visit – http://www.modoai.org