MyResearch Portal – Andrew Nagy

2007 February 28
by Karen

ILS agnostic web portal for students and faculty to perform research activities

Create a single interface for all library resources to minimize the interface learning curve

Develop in-house a “framework” to combine all of our resources

  • Most resources are in XML
  • Digital Library: METS
  • MetaLib XServer: XML
  • Catalog: MARCXML
  • Library Website: XHTML

Data Store

  • Native XML stores allow for easy storage of complex data
  • No need to develop a complete relational database and convert data – too messy
  • No need to normalize data
  • Just import!

Native XML Database – Could it be that simple?

eXist – Open Source

  • Still in infancy-ish stages
  • Platform independent
  • Java backend
  • API: REST, SOAP
  • Full-text extension
  • Inherent directory structure
  • LDAP support
  • Large user base

Berkeley DB XML – Open Source

  • Proven capabilities
  • Support for a wide range of platforms
  • Good performance
  • Decent help support
  • Commercial backing
  • No full-text extensions
  • No inherent directories

Commercial Options

  • MarkLogic
    • Enticing discounts for .edu and non-profits
    • Commercial support
    • Much more complex to administer
    • Speed
  • X-Hive/DB

Scalability Testing

  • eXist is not meant for searching; it is more for browse and fetch
  • DB XML (Sleepycat) – reworked queries and modified indexes to get these to respond in 30-60 seconds

Converted MARCXML to a custom format because MARCXML is not helpful (its elements all have the same names)

  • dbxml – 1.6-1.7 second response
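
To illustrate the "same names" problem: every MARCXML field is a datafield/subfield pair distinguished only by attributes, so queries cannot address elements by name. The sketch below is not the custom format used in the talk (the notes do not describe it); the FIELD_NAMES mapping and the output element names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical mapping from (MARC tag, subfield code) to a descriptive element name.
FIELD_NAMES = {
    ("100", "a"): "author",
    ("245", "a"): "title",
    ("650", "a"): "subject",
}

def flatten_record(marcxml):
    """Turn one MARCXML <record> into a <record> whose children have descriptive names."""
    source = ET.fromstring(marcxml)
    target = ET.Element("record")
    for datafield in source.iter(MARC_NS + "datafield"):
        tag = datafield.get("tag")
        for subfield in datafield.iter(MARC_NS + "subfield"):
            name = FIELD_NAMES.get((tag, subfield.get("code")))
            if name and subfield.text:
                ET.SubElement(target, name).text = subfield.text.strip()
    return target

# Made-up sample record, not taken from the catalog discussed in the talk.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An example title</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Libraries</subfield></datafield>
</record>"""

if __name__ == "__main__":
    print(ET.tostring(flatten_record(SAMPLE), encoding="unicode"))
```

With named elements, a query can target a path such as record/title directly instead of filtering every datafield by its tag attribute.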

Query Optimization

  • This is an important step since we are dealing with infant technology
  • dbxml has a query plan generator
  • eXist will soon have a query plan generator and a new query optimizer

Implementation

Create a web portal using a Native XML Database

Performance

  • The Good
    • 0.9 seconds
  • The Bad
    • More advanced queries can get as high as 12-15 seconds
  • The Ugly
    • What happens when 10-50 simultaneous users search with advanced queries?
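
One way to probe the "ugly" case is to fire the same query from many threads at once and look at the slowest response. This is only a rough sketch: run_search is a hypothetical stand-in that sleeps briefly, and would need to be replaced with a real call to the database under test. The talk did not describe how (or whether) concurrency was measured.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_search(query):
    """Hypothetical stand-in for a real search call; swap in an actual request."""
    time.sleep(0.01)  # simulated latency, not a measured value
    return "results for " + query

def timed_search(query):
    """Run one search and return how long it took, in seconds."""
    start = time.perf_counter()
    run_search(query)
    return time.perf_counter() - start

def simulate_users(user_count, query):
    """Issue the same query from user_count threads at once; return each latency."""
    with ThreadPoolExecutor(max_workers=user_count) as pool:
        return list(pool.map(timed_search, [query] * user_count))

if __name__ == "__main__":
    for users in (10, 50):
        latencies = simulate_users(users, "advanced query")
        print(users, "users: slowest response", round(max(latencies), 3), "s")
```

If the slowest latency grows roughly in proportion to the number of users, the backend is probably serializing queries, which is the scenario the question above worries about.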

Need to develop a lot of search query translation algorithms to make up for the missing full-text extension

So the answer is NOT YET!

It’s a Sun Shiny Day

  • Apache Solr to the rescue!
  • Solr implements a Lucene index on XML documents
  • Solr is platform independent
  • Runs as a Java web app
  • Interface via REST
  • Lots of full-text searching tools
  • No standards-compliant interface
    • XML databases use XQuery
  • Performance is astonishing
    • Average results in 0.1 seconds over 492,000+ records
    • Slower performance with built-in faceting

Easy Implementation

  • XSL stylesheet to convert MARCXML to Solr XML
    • Converted 492,000 records in 2.5 hours
  • Solr import
    • 3 hours

Andrew showed the final product, which uses Solr, in the Lightning Talks
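
For readers unfamiliar with the Solr side, here is a minimal sketch of the two pieces mentioned above: the XML update message that the import step consumes, and a search request with faceting over the REST-style interface. The talk used an XSL stylesheet for the conversion; this sketch swaps in Python for illustration. The field names, the base URL, and both helper functions are assumptions, not details from the talk.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def to_solr_add_xml(records):
    """Build a Solr <add> update message from a list of {field name: value} dicts."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")

def solr_select_url(base_url, query, facet_field=None, rows=10):
    """Build a search URL for Solr's select handler, optionally with one facet field."""
    params = [("q", query), ("rows", str(rows))]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return base_url + "/select?" + urlencode(params)

if __name__ == "__main__":
    # Hypothetical record and local Solr URL, for illustration only.
    print(to_solr_add_xml([{"id": "1", "title": "An example title"}]))
    print(solr_select_url("http://localhost:8983/solr", "title:history",
                          facet_field="subject"))
```

The update message would be POSTed to Solr's update handler and followed by a commit; the search URL can be fetched with any HTTP client.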

    Other options
