The Invisible Web
a presentation by Marylaine Block for LACONI-RASS on November 14, 2003 at Batavia Public Library.
NOTE: The most valuable single resource for understanding this topic is The Invisible Web, Information Today, 2001, by Gary Price and Chris Sherman.
Some Questions Not Answered by General Search Engines
Is my plane on time?
Track the path of an approaching hurricane
Check current road conditions
Check hotel availability and prices
Find a recent webcast
Watch birds hatching and the mothers feeding them, or other processes observable through continuous webcams
What does a machine gun sound like?
Get a visual display of your search results that shows the amount of data and the relationships between ideas
Does any local library have a copy of American Infidel: Robert G. Ingersoll?
Find out if anybody has patented your nifty invention
How much did this stock I inherited cost at the time my uncle purchased it in 1956?
Get a list of stocks that match my personal criteria
What concerts are going to be available in Boston next March?
Where can I buy an out-of-print book called Angels and Spaceships? Or, I have a copy of this book; is it worth anything?
Find out how to handle a chemical that may be toxic
Compare nursing homes in my area
Search for unclaimed property
Find the author of a quote and verify its exact wording and source
What are Usenet groups saying about a particular topic?
Find articles in magazines, newspapers, and reference sources that aren't published free on the web
Find historical archives of magazine and newspaper articles
Find internal corporate documents
Search a national database for local obituaries and death notices
Find a snapshot of a web site that doesn't exist anymore
Why Is It Invisible?
Data is too current, real-time, or constantly changing -- current stock prices for specific companies, news (failure of search engines -- or searchers -- on Sept. 11). Google now re-spiders millions of sites daily; All the Web (http://alltheweb.com/) automatically displays the two most recent stories first.
Format is difficult for crawlers -- PostScript documents, Flash, audio, streaming video, etc. (though search engines are rapidly adding these capabilities -- Google now offers access to Usenet groups, and other search engines are increasing the kinds of file formats they index)
Data is generated on the fly when a question is asked; to get the answer you have to fill in the form on a specific database -- patent searching, trip directions, etc.
Access is proprietary, forbidden to crawlers, and/or password-protected -- NY Times, commercial databases, intranets. "On the web" vs. "by way of the web"
Nobody linked to it
Sites that have the information may not be crawled in depth -- how many pages of the EPA's web site have been indexed by engines? Try a search by topic plus domain -- "acid rain" site:epa.gov, "guru interview" site:marylaine.com
Search engines limit the number of viewable results. [Google only displays two results from any one site unless you specifically click on MORE]. When completeness counts, use multiple search engines.
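The topic-plus-domain search described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the search URL and its q parameter are assumptions not given in this handout. The one rule it encodes is that the site: operator must be followed directly by the domain, with no space in between.

```python
from urllib.parse import quote_plus


def site_query(phrase: str, domain: str) -> str:
    """Build a topic-plus-domain search statement, e.g. "acid rain" site:epa.gov."""
    # Put multi-word topics in quotation marks so they are searched as a phrase.
    topic = f'"{phrase}"' if " " in phrase else phrase
    # No space between "site:" and the domain, or the operator is ignored.
    return f"{topic} site:{domain}"


def search_url(query: str, base: str = "http://www.google.com/search") -> str:
    """Turn a search statement into a URL (base and 'q' are assumptions)."""
    return f"{base}?q={quote_plus(query)}"


if __name__ == "__main__":
    for phrase, domain in [("acid rain", "epa.gov"), ("guru interview", "marylaine.com")]:
        statement = site_query(phrase, domain)
        print(statement)
        print(search_url(statement))
```

Running the same statement through several engines is one way to follow the advice to use multiple search engines when completeness counts.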
When To Use Invisible Web Resources
When you need real-time information -- flight-tracker, news, etc.
When you need dynamically-generated information from a database -- trip directions from here to there, sources for an out of print book, phone numbers and addresses for people and businesses.
When you need highly authoritative information from journals and other specialized sources (FindArticles.com http://www.findarticles.com/PI/index.jhtml, Bartleby http://bartleby.com/, EbscoHost, Making of America http://moa.umdl.umich.edu/, etc.)
When you need more control over the ways to limit the search -- in a history database, restrict by period or continent, for instance.
When you need a particular kind of content search engines don't do well with -- images, streaming video, etc.
When you already know who's likely to have produced the info you need and want to go directly there.
When you need to search a narrower, more selective universe -- kids' sites, news sites, science sites, gov docs, a detailed bibliography, info on Samuel Johnson (which one?), etc.
When it's not publicly available unless you use particular software -- RSS, peer-to-peer systems
When you need to do a particular kind of searching -- Research Index, for instance, allows citation searching -- or even to browse in alphabetical order -- just try to search for The Nation in a keyword-based system
Finding Tools
General
Complete Planet - discover and search 103,000 databases and specialty search engines http://www.completeplanet.com/
Direct Search http://www.freepint.com/gary/direct.htm
The Invisible Web -- companion site to the book by Chris Sherman and Gary Price http://www.invisible-web.net/
ProFusion http://www.profusion.com/ -- "Target your search by drilling into one of these vertical search groups"
For more info on the scope and content of the invisible web, read: Bergman, M. K. (2001) The deep Web: surfacing hidden information. The Journal of Electronic Publishing, 7 (1) http://www.press.umich.edu/jep/07-01/bergman.html (18 January 2003)
Search Engines Inside the Search Engines
Google: Google Uncle Sam http://www.google.com/unclesam, Google Groups, Google News, Google Images, directory, catalogs, Linux sites, university search, Google Answers http://answers.google.com/ [see http://www.google.com/advanced_search?hl=en], and more
Alltheweb http://alltheweb.com/ -- news, pictures, video, audio, FTP
AltaVista http://altavista.com/ -- images, MP3/audio, video, directory, news, yellow pages
Lycos http://www.lycos.com/ -- Images, shopping, yellow pages, Lycos topics like blogs, kids, family zone, etc.
Directories and Specialized Search Engines
FIND DISCONTINUED SITES:
Use cache command on Google to find discontinued pages, e.g., cache:www.____.____
Internet Archive http://www.archive.org/ to search by URL; for topical search, still in beta, http://recall.archive.org/
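As a rough sketch of the two lookups above, the Python below builds a Google cache: statement and an Internet Archive address for a page that has disappeared. The function names are hypothetical, and the web.archive.org/web/*/ address pattern is an assumption not given in this handout; check it against the Internet Archive site before relying on it.

```python
def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so only host and path remain."""
    return url.split("://", 1)[-1]


def cache_query(url: str) -> str:
    """Build the cache: statement to type into Google, e.g. cache:www.example.com."""
    return "cache:" + strip_scheme(url)


def archive_listing(url: str) -> str:
    """Build an address listing archived snapshots of a URL.

    The web.archive.org/web/*/ pattern is an assumption, not from the handout.
    """
    return "http://web.archive.org/web/*/" + strip_scheme(url)


if __name__ == "__main__":
    # www.example.com stands in for the discontinued page you are looking for.
    gone = "http://www.example.com/page.html"
    print(cache_query(gone))
    print(archive_listing(gone))
```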
FIND IMAGES, VIDEOS, WEBCAMS, WEBCASTS, ETC.
Finding Images and Sounds on the Web http://marylaine.com/images.html
Kartoo http://kartoo.com/ -- a visual display search engine.
FindSounds http://www.findsounds.com/
Google Directory - Webcams - Directories http://directory.google.com/Top/Computers/Internet/On_the_Web/Webcams/Directories/?tc=1/
WebCam Central http://www.camcentral.com/
Classical Music Search http://la.znet.com/~iwamura/page2.html -- "When you know a melody and you do not know its title or composer, this melody search engine will help you."
Singingfish - Find Audio and Video http://www.singingfish.com/
Streaming News and Video http://www.freepint.com/gary/audio.htm -- another Gary Price gem.
SEARCH FOR BOOKS:
AddALL Book Search and Price Comparison http://www.addall.com/ -- defaults to searching in-print titles; click on Used and Out of Print for OOP
Finding Out of Print Books http://marylaine.com/bookbyte/getbooks.html
RedLightGreen http://www.redlightgreen.com/ -- RLG's shared catalog of the 126 million item records of its member libraries
SOME MISCELLANEOUS FINDING TOOLS:
Epinions http://www.epinions.com/ -- product reviews. Reviews themselves are rated by other readers, and the authors of the best-rated reviews become trusted reviewers
Daypop http://www.daypop.com/ -- search blogs, RSS feeds and news
Kids Click Search http://sunsite.berkeley.edu/KidsClick!/
FindLaw LawCrawler http://lawcrawler.findlaw.com/ -- one of several legal search engines
Medlineplus http://www.medlineplus.gov/ -- great starting place for vetted medical info.
Search Systems - Largest Free Public Records Database Collection http://www.searchsystems.net/
Use Two-step Searching
Use a general search engine to search for the likely source or database, then search inside that site. Sample search statements:
streaming video + search engine
diabetes + association (sometimes your best info comes from the primary professional or charitable association involved with the topic)
patents + database
"rock music" + encyclopedia -- the only way you're going to find anything about the artist known simply as E
Hispanics + demographics
word 6.0 + tutorial
plumbing + "how to"
"hp deskjet 5550" + product review
cataloging + listserv (or discussion)
whales + webcam
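The sample statements above all follow one pattern: a topic, a plus sign, and the kind of source likely to hold the answer. A minimal Python sketch of that pattern follows; the function name and the quoting rule for multi-word terms are illustrative assumptions, not part of any search engine's documented syntax.

```python
def two_step_queries(topic: str, source_types: list[str]) -> list[str]:
    """Pair one topic with several kinds of likely sources (step one of two)."""

    def as_term(text: str) -> str:
        # Quote multi-word terms so they are searched as a phrase.
        return f'"{text}"' if " " in text else text

    return [f"{as_term(topic)} + {as_term(kind)}" for kind in source_types]


if __name__ == "__main__":
    # Step one: find the likely source. Step two (not shown) is searching inside it.
    for statement in two_step_queries("rock music", ["encyclopedia", "database"]):
        print(statement)
    for statement in two_step_queries("plumbing", ["how to", "tutorial"]):
        print(statement)
```

Trying several source types for the same topic is a quick way to discover which kind of specialized database exists for it.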
There are many ways to approach the needle in the haystack problem:
A known needle in a known haystack
A known needle in an unknown haystack
An unknown needle in an unknown haystack
Any needle in a haystack
The sharpest needle in a haystack
Most of the sharpest needles in a haystack
All the needles in a haystack
Affirmation of no needles in a haystack
Things like needles in any haystack
Let me know whenever a new needle shows up
Where are the haystacks?
Needles, haystacks -- whatever
Matthew Koll. "Information Retrieval."
http://www.asis.org/Bulletin/Jan-00/track_3.html
Becoming More Visible All the Time
Search engines are changing constantly, trying to give access to the invisible web. To keep up with search engine improvements you can have all of these mailed to you:
Research Buzz http://www.researchbuzz.com/
Resource Shelf http://www.resourceshelf.com/ -- daily tips from Gary Price.
Search Day http://searchenginewatch.com/searchday/
Search Engine Watch http://searchenginewatch.com/
Search Engine Showdown http://www.searchengineshowdown.com/
For tips on getting the most out of Google, read Tara Calishain's Google Hacks, O'Reilly, 2003.
For more on blogs, RSS, and site minders, read Steven Cohen's Keeping Current, ALA 2003.
To find library weblogs to keep you current on developments in librarianship and technology, see Peter Scott's Library Weblogs http://www.libdex.com/weblogs.html