Experiments with YazProxy
May 5th, 2008 by Karen
After spending the better part of the last couple of weeks working with the WorldCat API to get at my library’s holdings, I realized that I’d made an incorrect assumption in my plans. I was describing what I was doing to our Head of Cataloging and Metadata Services over lunch when she interrupted me to say, “You realize we don’t give OCLC all our holdings, right?” Well, no, I didn’t realize that, and after a conversation about WHY that isn’t the case I realized that more than ever I need an API for our catalog. The issue is that we are an Innovative site, which means no API. We do have a Z39.50 server, though, and I’ve been playing with LibraryFind’s API to access certain data programmatically from that catalog.
What I really want, though, is an SRU/SRW server. So I started doing some digging and found out that I can use YAZProxy in this way. I successfully compiled and installed it on a test server (which for me is a big deal - anything that involves compiling makes me uneasy) and followed the directions for setting it up. It appears to be running right, but I’m having issues searching the UH catalog. I’m not sure if this is because I’m inexperienced with CQL and SRU or because something isn’t properly configured in either YAZProxy or our catalog’s Z39.50 server. I’m guessing the latter and need to do some troubleshooting to figure out what is wrong.
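For anyone troubleshooting along with me, here is a rough sketch of how the SRU request URLs can be built by hand, so it is easier to tell a bad CQL query from a configuration problem. The parameter names (`version`, `operation`, `query`, `startRecord`, `maximumRecords`) come from the SRU 1.1 standard. Everything else is an assumption for illustration: the base URL is a hypothetical address for the proxy on a test server, and the `dc.title` index only works if the proxy is set up to map it to Z39.50 attributes.

```python
from urllib.parse import urlencode

# Hypothetical address of the proxy on a test server; change to match your setup.
BASE_URL = "http://localhost:9000/"


def sru_explain_url(base_url=BASE_URL, version="1.1"):
    """Build an SRU explain URL, which asks the server to describe itself."""
    params = {"version": version, "operation": "explain"}
    return base_url + "?" + urlencode(params)


def sru_search_url(cql_query, base_url=BASE_URL, start=1, maximum=5, version="1.1"):
    """Build an SRU searchRetrieve URL for a CQL query string."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": cql_query,        # urlencode takes care of escaping the CQL
        "startRecord": start,      # SRU record positions start at 1
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed index name; the server may expect a different one.
    print(sru_explain_url())
    print(sru_search_url('dc.title = "hawaii"'))
```

Pasting the explain URL into a browser first is probably the quickest check: if that returns XML but the search URL returns a diagnostic, the problem is more likely the query or the index mapping than the proxy itself.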


I’m curious about why it isn’t the case. I definitely get the feeling that OCLC would _like_ libraries to add all their holdings to WorldCat (for their own business purposes, of course). WorldCat Local users are doing so, or if not yet, are working on it (soon to include the entire northwest Summit consortium).
But I realize that most libraries, my own included, don’t do this, and feel that they couldn’t -possibly- do this, that it would create some kind of problem. But I don’t really understand why exactly, even at my own library.
Well, things like e-journals are particularly problematic for a number of reasons. First, if a library gets the journal from an aggregator (like, say, EBSCOhost’s Academic Search Complete), what you get changes frequently, which is why libraries rely on vendors like Serials Solutions to help them keep their data up to date. But even with Serials Solutions this information has to be batch loaded on a regular basis. In terms of OCLC, I believe that this would mean adding and removing one’s symbol from records frequently enough to create a maintenance issue. Non-aggregator access can also change dramatically over time. Most libraries have issues keeping up with e-journals as it is.
Also, some libraries don’t add items to OCLC that they don’t lend via ILL. In their minds, what is the point of letting people know you have it if they can’t use it? Either they can’t ILL it or it is an electronic item whose access is limited to specific users. (E-journals and e-books are good examples of things that typically only an institution’s affiliates have access to.)
Oh, then there is the issue of correct URLs for electronic items. Most institutions provide access to these via a proxy, so each institution has a slightly different URL.
These are just a few reasons why libraries might not give OCLC all their holdings info. It sure does make things messy though.
Karen,
A couple of options. The guys at UNT (Mark Phillips and Kurt Nordstrom) have successfully set up an SRU server, using Solr for the indexing, which might be an alternative approach. Ladd Hanson at UT Austin has set up a Zebra server running SRU, although I’m having some problems with it that we need to figure out. Also, I’m going to try to get Sebastian Hammer from Index Data here this summer, and it might be a good time to bring in Ralph LeVan from OCLC again (their resident SRU guru). We could have another SRU workshop like I did a couple of years ago.
If you’d like me to test your setup using the SRU client we have built into TexasHeritageOnline, let me know.
Danielle