Lycos Retriever has suddenly started getting some attention, even though it was opened to the public last December (but not announced). Korean bloggers noticed it long ago (also here). Anyway, it is in a somewhat decrepit state now: images are no longer being served, the URL mouseover is broken, and the content hasn't been refreshed in months.
Retriever is an attempt to automatically generate something like the first draft of a Wikipedia article on a topic, where the topics come from the head of the Zipf-like distribution of search queries and from other sources. Retriever categorizes a topic; disambiguates it (e.g. determines that 'Saturn' is ambiguous among 'Saturn (Automobile)', 'Saturn (Solar System)', and 'Saturn (Video Game)'); searches for pages relevant to the disambiguated sense of the topic; extracts and evaluates paragraphs and images from those retrieved pages; and then arranges them into a (somewhat) coherent report on the topic. Applying techniques from question answering and natural language generation (NLG), it produces a report consisting of an overview of the topic followed by a breakdown of subtopics.
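To make the pipeline concrete, here is a minimal sketch of the disambiguate / retrieve / score / assemble stages described above. All function names, the keyword-overlap scoring heuristic, and the data shapes are my illustrative assumptions, not the actual Lycos implementation (which ran as a distributed offline system):

```python
# Hypothetical sketch of a Retriever-style pipeline.
# Names and the scoring heuristic are illustrative, not Lycos's code.

def disambiguate(topic, known_senses):
    """Map an ambiguous topic string to its candidate senses."""
    return known_senses.get(topic, [topic])

def score_paragraph(paragraph, sense_keywords):
    """Crude relevance score: fraction of sense keywords present."""
    words = set(paragraph.lower().split())
    hits = sum(1 for kw in sense_keywords if kw in words)
    return hits / max(len(sense_keywords), 1)

def build_report(topic, pages, sense_keywords, threshold=0.5):
    """Keep paragraphs above a relevance threshold; best one is the overview.

    `pages` maps a source URL to the paragraphs extracted from that page;
    each kept snippet stays paired with its URL for attribution.
    """
    scored = []
    for url, paragraphs in pages.items():
        for p in paragraphs:
            s = score_paragraph(p, sense_keywords)
            if s >= threshold:
                scored.append((s, p, url))
    scored.sort(reverse=True)  # highest-scoring snippet first
    overview = scored[0][1] if scored else ""
    return {"topic": topic,
            "overview": overview,
            "snippets": [(p, url) for _, p, url in scored]}
```

A real system would of course add subtopic grouping, deduplication, and image selection on top of this, but the skeleton is the same: pick a sense, pull candidate text, score it, and lay out the survivors with links back to their sources.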
See for example the pages we generated automatically (there is no human input at all) for:
Frida Kahlo: http://www.lycos.com/info/frida-kahlo.html (other art history topics are at http://www.lycos.com/dir/arts/art-history)
King Kong (1933): http://www.lycos.com/info/king-kong-1933.html
Public-Key Cryptography: http://www.lycos.com/info/public-key-cryptography.html
Lyme Disease: http://www.lycos.com/info/lyme-disease.html
You can navigate the whole hierarchy here: http://www.lycos.com/retriever.html
This was an unpublicized soft launch, mostly to see how the major search engines would spider the pages and how the pages would rank. We had planned to ramp this up to about 500K topics; only about 30K are live right now, and the system is frozen. The most up-to-date code is not live and may never be pushed onto the servers. It is a distributed system to which we could add machines to scale the offline processing.
Anyway, it was early days, and we had a long way to go, but I think the idea was neat. Essentially, I saw this as a question-answering system where the form of the answer was an encyclopedic report that answered the question "what do I need to know about topic X?"
There's a bit more description of the system in a short paper I wrote for HLT/NAACL 2006 in Brooklyn (paper and slides here): http://semanticsarchive.net/Archive/zY1ZmM4O/ Also here: acl.ldc.upenn.edu/N/N06/N06-2045.pdf
Our product manager gave a presentation on this at the 2006 Search Engines meeting available here: www.infonortics.com/searchengines/
Various folks have complained that Retriever violates Fair Use, that we are profiting from others' content. I can't agree: Fair Use requires that you use only a small amount of a copyrighted text (we do) and that the use not detract from the value of the original. Since each snippet is hyperlinked to its source, like a regular search snippet, the hyperlink brings sites to the public's attention that they would not otherwise have seen. This matters especially because Retriever looks at 500 to 1000 pages per topic, while the average user looks only at the top 10 results, which are increasingly polluted with junk.