ER&L Summary
I went to a very detailed and interesting presentation entitled “Metadata Crosswalking Data Quality and Semantics”. The great thing about this presentation was its in-depth look at how difficult it is to map from one metadata schema to another, particularly when the records being crosswalked were cataloged inconsistently. The overall result? In-depth analysis of the records being crosswalked is necessary. Only once you have analyzed the records can you construct the complex processing logic needed to crosswalk them. What I really took away from this presentation was that correct and consistent cataloging is REALLY important. Bad data just makes problems downstream.
Also, consistency within a collection is really important and it is probably a good idea to document your practices. In the case of this presentation, the original data was MARC, yet there were still inconsistencies and errors. Optional fields were particularly interesting, because whether or not an optional field was used varied throughout the collection. This seems like something which could and should be more consistent. Documenting cataloging practices for a collection or type of material is just as important in MARC as it is in other metadata standards.
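To make the point about optional fields concrete, here is a minimal sketch of the kind of analysis you might run before writing any crosswalk logic. It is not what the presenters did. It assumes each record has already been reduced to a plain dictionary of MARC tag to values, and the function names and the 5%/95% thresholds are made up for illustration.

```python
from collections import Counter


def field_usage(records):
    """Return the share of records in which each field tag appears at least once."""
    counts = Counter()
    for record in records:
        counts.update(set(record))  # count each tag once per record
    total = len(records)
    return {tag: counts[tag] / total for tag in counts} if total else {}


def inconsistent_fields(records, low=0.05, high=0.95):
    """List tags used in some records but not others (thresholds are arbitrary)."""
    usage = field_usage(records)
    return sorted(tag for tag, share in usage.items() if low < share < high)


# Hypothetical records: tag -> list of values
sample = [
    {"245": ["Title A"], "260": ["1999"], "520": ["Summary"]},
    {"245": ["Title B"], "260": ["2001"]},
    {"245": ["Title C"], "520": ["Another summary"]},
    {"245": ["Title D"], "260": ["2005"], "520": ["Third summary"]},
]

print(inconsistent_fields(sample))
```

A report like this only tells you where usage varies. Deciding whether the variation is an error or a deliberate local practice is exactly what written documentation is for.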
Another great presentation I attended was “Shelflessness as a Virtue: Preserving serendipity in an electronic reference collection”. This presentation discussed building a browsable interface to electronic reference books. The project incorporated a bunch of technologies but fundamentally used records from the catalog exported as MARCXML and indexed by Solr (go Solr!!). We’ve been talking about this issue at UH as well, and one thing that came to mind while watching this presentation was whether or not we should include freely available, reference-oriented books from Google Books in such a display. It seems like there would be some value in this, but I need to discuss it further with Cataloging and Public Services at UH.
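For anyone who has not worked with MARCXML, here is a rough sketch of what the "export, then index" step could look like. I don't know how the presenters actually built theirs, so treat everything here as an assumption: the Solr field names (`id`, `title`, `call_number`) are hypothetical, the code only builds the documents rather than sending them to Solr, and sorting on the raw call number string is a crude stand-in for real shelf order.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text.strip() if node is not None and node.text else None


def to_solr_docs(marcxml):
    """Turn a MARCXML collection into dicts shaped like Solr documents."""
    root = ET.fromstring(marcxml)
    docs = []
    for record in root.iter(f"{{{MARC_URI}}}record"):
        control = record.find("marc:controlfield[@tag='001']", MARC_NS)
        docs.append({
            "id": control.text if control is not None else None,
            "title": subfield(record, "245", "a"),        # title proper
            "call_number": subfield(record, "050", "a"),  # LC classification
        })
    # Crude browse order: plain string sort on the call number
    return sorted(docs, key=lambda d: d["call_number"] or "")


SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">b2</controlfield>
    <datafield tag="050" ind1=" " ind2="4"><subfield code="a">Z699</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Indexing handbook</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">b1</controlfield>
    <datafield tag="050" ind1=" " ind2="4"><subfield code="a">AE5</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">General encyclopedia</subfield></datafield>
  </record>
</collection>"""

for doc in to_solr_docs(SAMPLE):
    print(doc)
```

Records with no 050 field sort to the front here, which is one more place where inconsistent cataloging would show up in the browse display.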
Another good presentation I went to was “Knowledge Base And Related Tools project (KBART) Update” by Jason Price. This presentation discussed the work of the KBART group, which is working to improve the data in ERM and OpenURL resolver knowledge bases. Price presented some interesting data about how inaccurate OpenURL knowledge base information is, as well as some insight into the causes of the problems: bad data, bad formatting, and lack of knowledge. I am really looking forward to this group’s report and hope that it can have an impact on improving the information in knowledge bases. Too often systems, electronic resources, or cataloging folks are asked to design workarounds for this bad data. Fixing the real root causes of this problem would be a much more expedient solution.
Probably my favorite presentation of the first full day of the conference was “Successful Institutional Repositories: Libraries that Provide Value-Added Publishing Services to Faculty and Campus Communities”. This presentation discussed libraries as publishers of content and the need to refocus institutional repository ventures as a set of services that libraries can offer to their campuses. The presenter, Tim Tamminga, made an extremely compelling argument and presented examples of libraries being the avenue for publishing student-run undergraduate journals, senior portfolios, conference proceedings, and open access journals. Perhaps the most interesting part of the presentation was the discussion of IRs as a way for University administration to market the University to others and as a way for faculty within the same large university to find out what their colleagues are doing. I learned a lot from this presentation, and it made me feel good about how I’ve been personally envisioning the role of IRs. The biggest problem I see is that, in order to do this effectively, libraries need to better understand the publishing services their users might want, as well as their users’ behavior, to make the “publishing” process as seamless and easy as possible. Still, the possibilities are pretty cool and definitely something I hope I get to work on more at UH.
I liked the first two paragraphs of your post, Karen. (Well, I liked the entire post; it was the first two paragraphs that inspired this comment.) In the Planet Code4Lib aggregator, the post right after yours was from Chris Rusbridge on Open Office as a document migration on demand tool - again. He notes “that code is better than specifications as representation information,” and goes on to describe how Open Office can be used as a migration tool for common office documents. What struck me about your post as it relates to Chris’ is “really important and it [is] probably a good idea to document your practices.” I’ve been thinking about business process workflows as they relate to the OLE Project and would propose that the documentation of our practices could take the form of “code” that is the definition of our workflows. Blank workforms, with documentation on the side about what the descriptionist is to do, are replaced by decision trees and automated processes that aid the staff member in building consistent records.
Just some preliminary thoughts, really, based on the juxtaposition of these two posts in my feedreader. I would be curious to know your thoughts, though.
Peter, unfortunately I’m not sure decision trees and automated processes are helpful in all circumstances. Some metadata formats are so complex, and leave so much to the institution’s or cataloger’s discretion, that I think documenting decisions about how you are going to treat a given collection is necessary. The only other option is to write very complex code which allows an institution to customize workflow and metadata fields on a collection-by-collection basis. Case in point: for some of our photographic collections it would be relevant to include cartographics, for others not. So do we build two different forms, one with cartographics and one without, or one form where cartographics are optional, with documentation that says that for collection X catalogers must include cartographics? I suppose we could build one form that is smart enough to know what collection the object being cataloged belongs to and show or not show cartographics accordingly. But even this can’t, IMHO, really document why you made the choices you did. It can’t tell you that you aren’t using cartographics with a given set of photos because you lack cartographic info, or because they are all from a conference that took place in the same location.
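For what it’s worth, here is a rough sketch of the “one smart form” idea, with the reason for each choice stored right next to the rule. The collection names, field names, and rationale text are all invented for illustration, and this is not how any real system at UH or in OLE works. Someone still has to write the rationale; the code only keeps it from getting separated from the rule it explains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    field: str
    required: bool
    rationale: str  # written by a cataloger, not derived by the code


# Fields every form shows (hypothetical)
BASE_FIELDS = ["title", "date", "creator"]

# Per-collection decisions (hypothetical collections and reasons)
COLLECTION_RULES = {
    "aerial_photos": [
        FieldRule("cartographics", True,
                  "Locations vary by image, so coordinates are needed."),
    ],
    "conference_photos": [
        FieldRule("cartographics", False,
                  "All images come from one event at one location."),
    ],
}


def form_fields(collection):
    """Return the fields the cataloging form should show for a collection."""
    fields = list(BASE_FIELDS)
    for rule in COLLECTION_RULES.get(collection, []):
        if rule.required:
            fields.append(rule.field)
    return fields


def explain(collection, field):
    """Return the recorded reason for a field decision, or None if undocumented."""
    for rule in COLLECTION_RULES.get(collection, []):
        if rule.field == field:
            return rule.rationale
    return None


print(form_fields("aerial_photos"))
print(explain("conference_photos", "cartographics"))
```

A collection with no entry simply gets the base form and `explain` returns `None`, which at least makes an undocumented decision visible.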