I was thinking about this earlier today, and since I was wondering about it, I thought others might be too. Here’s what I’ve learned from my colleagues.
Federated search works by “broadcasting” a user’s search term to several databases. Essentially, it queries the other databases at the moment the user searches.
Discovery services, such as Summon (by Serials Solutions) and Ebsco Discovery Service, search the databases beforehand. I assume they search these other databases on a regular schedule – perhaps once a day, once a week, or once a month. They build up their own index of all the items in these other databases, so when the user searches, they’re not sending the search out anywhere – it’s all right there, on their own servers.
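To make the contrast concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `SOURCES` stands in for the remote databases, and the function names are hypothetical, not anything Summon or Ebsco actually exposes. It only shows the shape of the two approaches: one contacts every source at search time, the other harvests ahead of time and answers from a local index.

```python
from collections import defaultdict

# Hypothetical stand-ins for remote databases: record id -> title.
SOURCES = {
    "db_a": {"a1": "Federated search in libraries", "a2": "Cataloging basics"},
    "db_b": {"b1": "Discovery services compared", "b2": "Search interfaces"},
}


def query_source(records, term):
    """Simulate one remote database answering a query at search time."""
    term = term.lower()
    return [title for title in records.values() if term in title.lower()]


def federated_search(term, sources):
    """Broadcast the term to every source when the user searches."""
    results = []
    for name, records in sources.items():
        for title in query_source(records, term):
            results.append((name, title))
    return results


def build_index(sources):
    """Harvest every source ahead of time into one local word index."""
    index = defaultdict(list)
    for name, records in sources.items():
        for title in records.values():
            for word in set(title.lower().split()):
                index[word].append((name, title))
    return index


def discovery_search(term, index):
    """Answer from the local index only; no source is contacted."""
    return list(index.get(term.lower(), []))
```

In this sketch, `federated_search("search", SOURCES)` and `discovery_search("search", build_index(SOURCES))` find the same two titles, but only the first touches the sources while the user waits. The flip side is that a record added to a source after `build_index` has run will not show up in `discovery_search` until the index is rebuilt.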
The potential benefits of these discovery services include:
- Quicker searching (because there is no “broadcast” involved)
- Results that are easier to interpret (by building your own index, you’re able to list the titles, authors, etc. in a consistent fashion)
- Better deduplication
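The last two benefits go together: once records sit in one index in one consistent shape, spotting duplicates gets much simpler. A minimal Python sketch, assuming made-up field names (`title`/`ti`, `author`/`au`) for two hypothetical sources; real services presumably use more careful matching than an exact title-and-author comparison.

```python
def normalize(record):
    """Map a source-specific record to one common shape.

    The field names handled here are hypothetical examples.
    """
    title = record.get("title") or record.get("ti") or ""
    author = record.get("author") or record.get("au") or ""
    # Collapse runs of whitespace so spacing differences do not matter.
    return {"title": " ".join(title.split()), "author": " ".join(author.split())}


def dedup_key(rec):
    """Case-insensitive key used to decide whether two records match."""
    return (rec["title"].casefold(), rec["author"].casefold())


def deduplicate(records):
    """Normalize every record and keep the first copy of each match."""
    seen = set()
    unique = []
    for raw in records:
        rec = normalize(raw)
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

For example, a record `{"title": "Moby  Dick", "author": "Melville, Herman"}` from one source and `{"ti": "moby dick", "au": "MELVILLE, HERMAN"}` from another collapse into a single entry here.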
A potential drawback is the competition. If content providers decide to offer their content to Ebsco but not Serials Solutions, or vice versa, libraries may end up basing their choices largely on that content. While that seems acceptable to me when choosing a discipline-specific database to subscribe to, it seems somehow counterintuitive for a search tool of this scope.
What else do you know – or do you want to know – about these new-fangled “discovery services”?
I have a slightly different interpretation of how the discovery services work. I think what you’ve described is how a search engine like Google works, but I think discovery services *receive* files from the vendors and publishers, which they then include in their proprietary index. The end result is the same, as is your list of benefits.
My concern with discovery tools like Summon and the new EBSCO one is how often they are updated (how often the metadata is re-harvested). With federated searching, you send out multiple searches to the current version of the relevant databases (I presume?), but with the Summon-type system there’s the danger that you will be searching within (slightly) out-of-date content. Do Serials Solutions, EBSCO et al. indicate clearly how often their data is updated?
David, we recently had a representative from Serials Solutions come to talk to us about Summon. If I recall correctly, the metadata is reharvested nightly.
Good discussion.
(Disclaimer: I’m with a federated search provider, http://www.deepwebtech.com/, and my comments are my own …)
Summon and similar discovery tools are good in the library context, but are less than ideal in the enterprise context. The biggest problem is that, in the end, there is a limit on which third-party sources you can include in the search. What I’ve heard is that if you need to include some competitive content, content not available through an API, or content that is internal (and therefore unique), you’re still going to be looking for a federated solution.
I would love to hear about people’s specific experience regarding this.
Larry.