The FuzzyCat Secret: How it Discovers Books in Libraries
- The Building of FuzzyCat: A Series
- FuzzyCat: “Connecting Readers on the Web to Books in Libraries”
- The FuzzyCat Secret: How it Discovers Books in Libraries
- FuzzyCat Tangles with Some Hairy HTML
- FuzzyCat Crawls 10 Distinct OPACs
- FuzzyCat Project Shelved
FuzzyCat (0.4 alpha) is available for experimentation by anyone and for download by techies.
By now you know that FuzzyCat connects readers on the web to books in libraries. So does WorldCat, but FuzzyCat works in a completely different way, allowing for discovery of libraries not in WorldCat.
Here’s how WorldCat works. It is not open source so I cannot speak with authority or detail on this, but I have a pretty good idea. WorldCat has a database of book records. Each record carries a bit of hidden bibliographic metadata called COinS, e.g., the ISBN. WorldCat also has a registry of URLs for each library’s OpenURL resolver. Append the COinS data to the resolver URL, and voilà, you have a link to the book in the library.
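To make the "attach COinS to the resolver" step concrete, here is a minimal sketch in Python. It is my guess at the general idea, not WorldCat's actual code. The resolver address is a made-up placeholder and `build_openurl` is a hypothetical helper; the query keys follow the OpenURL 1.0 key/encoded-value convention that COinS uses.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, isbn):
    """Join a resolver base URL with a minimal OpenURL book query.

    resolver_base is an assumption: in practice it would come from a
    registry of library resolvers.
    """
    params = {
        "ctx_ver": "Z39.88-2004",                    # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # referent is a book
        "rft.isbn": isbn,                            # the bibliographic key
    }
    # Use "&" if the base URL already carries a query string.
    separator = "&" if "?" in resolver_base else "?"
    return resolver_base + separator + urlencode(params)


# Placeholder resolver address, for illustration only.
link = build_openurl("https://resolver.example.org/openurl", "9780140449136")
print(link)
```

The point is only that the link is assembled from two parts: who to ask (the resolver) and what to ask for (the book metadata).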
FuzzyCat does not have a database of book records, does not have COinS, nor does it have a registry of resolvers. It is a ‘have-not’. So how the heck does it work? The secret to FuzzyCat is design patterns. Most library catalogues (OPACs) conform to a relatively standard set of design patterns that can be analyzed in code, and used to perform book searches. Consider the following basic patterns, which most OPACs and library websites follow:
- The catalogue is available for searching to the public on the library’s website.
- The search form is often on the library’s homepage, or linked from the homepage.
- A library website uses a particular vocabulary, i.e., labels like ‘search’ and ‘catalog’.
- A basic search form contains a single search box and a submit button.
- A basic keyword search can usually be performed by accepting defaults for any options.
- Matched results are often listed as links containing the book title.
- The search form and the way it is processed seldom change.
- Little libraries are more likely to have simple websites that are easier to analyze. These are the same libraries that are less likely to be in WorldCat.
Sound familiar? Together, the design patterns form an OPAC architecture. If several of the design patterns are true for a library website and catalogue, then FuzzyCat will be able to search it. Allow for a handful of variations, not dissimilar to cooking a soup, and you have a viable alternative to WorldCat.
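Here is a rough sketch of how a couple of these patterns could be checked in code: one search box, one submit button, and some familiar vocabulary. It is not FuzzyCat's actual source. The class and function names and the word list are my own assumptions, and it uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Assumed vocabulary; a real list would be longer.
SEARCH_WORDS = ("search", "catalog", "catalogue", "opac")


class SearchFormFinder(HTMLParser):
    """Collect each form with its text boxes and submit-button count."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "",
                             "text_boxes": [], "submits": 0}
        elif self._current is not None and tag == "input":
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search"):
                self._current["text_boxes"].append(attrs.get("name") or "")
            elif kind in ("submit", "image"):
                self._current["submits"] += 1
        elif self._current is not None and tag == "button":
            # A button with no type attribute submits the form by default.
            if (attrs.get("type") or "submit").lower() == "submit":
                self._current["submits"] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def find_basic_search_form(html):
    """Return the most search-like form with one text box, or None."""
    finder = SearchFormFinder()
    finder.feed(html)
    candidates = [f for f in finder.forms
                  if len(f["text_boxes"]) == 1 and f["submits"] >= 1]

    def looks_like_search(form):
        haystack = (form["action"] + " " + form["text_boxes"][0]).lower()
        return any(word in haystack for word in SEARCH_WORDS)

    # Forms that use the search vocabulary sort to the front.
    candidates.sort(key=looks_like_search, reverse=True)
    return candidates[0] if candidates else None


page = """
<form action="/login"><input type="text" name="user">
<input type="password" name="pw"><input type="submit"></form>
<form action="/opac/search"><input type="text" name="q">
<input type="submit" value="Search"></form>
"""
form = find_basic_search_form(page)
print(form["action"], form["text_boxes"])
```

Both forms in the sample page have a single text box and a submit button, so the vocabulary check is what tips the choice toward the catalogue form. A real crawler would need more variations than this.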
The WorldCat approach has a precise configuration for each library and can find an exact library record. It is a highly reliable approach. Thing is, WorldCat does not contain my local public libraries, or any other library that has not purchased OCLC’s FirstSearch service. In contrast, FuzzyCat discovers a library catalogue and uses a keyword search to find a good match for a title. It has about the same margin of error as your average library patron. Not bad, in my opinion. While FuzzyCat may not work for a wonky exception, it is being designed to discover any library catalogue. The design sidesteps any notion of membership. That is why I am building it.
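For a feel of what "a good match for a title" might look like, here is a small sketch that scores the link texts on a results page against the wanted title. It assumes the link texts have already been pulled out of the page. The function names and the 0.6 threshold are my own illustrative choices, and the similarity measure is `difflib.SequenceMatcher` from Python's standard library, not necessarily what FuzzyCat uses.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def best_match(wanted, link_texts, threshold=0.6):
    """Return the link text most similar to the wanted title, or None."""
    wanted_norm = normalize(wanted)
    scored = [(SequenceMatcher(None, wanted_norm, normalize(text)).ratio(), text)
              for text in link_texts]
    if not scored:
        return None
    score, text = max(scored, key=lambda pair: pair[0])
    # Below the (assumed) threshold, report no match rather than a bad one.
    return text if score >= threshold else None


links = ["My Account", "Moby Dick, or, The whale / Herman Melville",
         "Moby-Dick", "Help"]
print(best_match("Moby Dick", links))
```

Like a patron scanning a results list, this will sometimes pick the wrong edition or miss an oddly catalogued title. That is the margin of error I am talking about.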
FuzzyCat is a screen scraper, a scrapper that knows its way around the unlit corners of the web. But I am not ruling out a friendly relationship between FuzzyCat and WorldCat. For example, the WorldCat registry is available through an API. FuzzyCat could first check for a resolver and use COinS to find a record; for those libraries that do not have a resolver, FuzzyCat could do its own tricks. Or maybe OCLC will like FuzzyCat. Since FuzzyCat is licensed as open source, any OCLC project based on it would have to be redistributed in the same open manner. How’s that for a reversal?
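The resolver-first idea could be sketched like this. Everything here is hypothetical: the registry is a plain dictionary standing in for a lookup against the WorldCat registry API, and `scrape_search` stands in for FuzzyCat's own catalogue search.

```python
from urllib.parse import urlencode


def locate_book(library, isbn, title, registry, scrape_search):
    """Prefer a known resolver; otherwise fall back to scraping."""
    resolver = registry.get(library)
    if resolver:
        # Known resolver: build an exact link from the ISBN.
        return ("resolver", resolver + "?" + urlencode({"rft.isbn": isbn}))
    # No resolver on file: let the scraper search by title instead.
    return ("scrape", scrape_search(library, title))


# Stand-in data and scraper, for illustration only.
registry = {"Big City Library": "https://resolver.example.org/openurl"}


def fake_scrape(library, title):
    return "search results for '%s' at %s" % (title, library)


print(locate_book("Big City Library", "9780140449136", "Moby Dick",
                  registry, fake_scrape))
print(locate_book("Little Village Library", "9780140449136", "Moby Dick",
                  registry, fake_scrape))
```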



