Monday, January 23, 2006

Revealing Web 2.0 Genome with ADIOS

In some of my earlier posts I have written about ADIOS, an algorithm that learns constructions from raw corpora. ADIOS combines statistical and rule-based approaches: it identifies significant segments in a corpus and distills hierarchical, context-sensitive regularities that support structured generalization. What does that mean in plain English? Here is the news: ADIOS can produce new sentences after it has studied existing communication. You do not need to teach it. No more rules of grammar.
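To make "producing new sentences" a little more concrete, here is a toy sketch in Python. It is not the ADIOS algorithm, only a much cruder idea loosely in its spirit: words that occur between the same left and right neighbors are assumed to be interchangeable, and unseen sentences are produced by swapping them. The function name `generalize` and the `<s>`/`</s>` boundary markers are my own assumptions for illustration.

```python
from collections import defaultdict


def generalize(sentences):
    """Generate unseen sentences by swapping words that share a left/right context.

    Toy illustration only; this is NOT the ADIOS algorithm.
    """
    corpus = [tuple(s.split()) for s in sentences]

    # Map each (left neighbor, right neighbor) pair to the words seen between them.
    slots = defaultdict(set)
    for words in corpus:
        padded = ("<s>",) + words + ("</s>",)
        for i in range(1, len(padded) - 1):
            slots[(padded[i - 1], padded[i + 1])].add(padded[i])

    seen = set(corpus)
    novel = set()
    for words in corpus:
        padded = ("<s>",) + words + ("</s>",)
        for i in range(1, len(padded) - 1):
            # padded[i] is words[i - 1]; try every word seen in the same slot.
            for alt in slots[(padded[i - 1], padded[i + 1])]:
                candidate = words[:i - 1] + (alt,) + words[i:]
                if candidate not in seen:
                    novel.add(candidate)
    return sorted(" ".join(c) for c in novel)


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
print(generalize(corpus))
# ['the cat sat on the rug', 'the dog sat on the mat']
```

Nobody told the program that "mat" and "rug" belong together; it only noticed that both appear between "the" and the end of a sentence. A real system would also need a statistical test to decide which such patterns are significant, which this sketch leaves out entirely.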

ADIOS offers an alternative to traditional rule-based syntax. It suggests that the complexity of language can stem from a rich repertoire of stored, more or less entrenched semantic and syntactic constructions. These constructions can be recognized through statistical analysis and then used for inductive reasoning and for grammar-like, rule-based generation of new sentences.

(See the articles "Unsupervised learning of natural languages" and "Learning Syntactic Constructions from Raw Corpora", both by Shimon Edelman, Zach Solan, David Horn and Eytan Ruppin.)

To continue with this theme, I will speculate here on the significance of ADIOS and some recent Web 2.0 related innovations. On the one hand they may change the way we understand the processes of language, thinking and intelligence; on the other hand they may offer really lucrative business opportunities based on processing Web 2.0 type content.

There is one interesting fact about ADIOS. The algorithm can be used for analysis and induction not only of natural language sentences but also of any other sequential data with recurring motifs, such as music, proteins, DNA and more. Here you may have the tool to study the WEB 2.0 GENOME!
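The point about "any sequential data" can be illustrated with a few lines of Python. The sketch below is not ADIOS; it simply counts contiguous runs of a fixed length and keeps the ones that recur, which is about the simplest possible notion of a "motif". The function name `recurring_motifs` and its parameters are hypothetical, chosen only for this example. The same code runs unchanged on words and on DNA letters.

```python
from collections import Counter


def recurring_motifs(sequences, n, min_count=2):
    """Count contiguous length-n runs and keep those seen at least min_count times.

    A plain frequency count, NOT the ADIOS motif extraction procedure.
    """
    counts = Counter()
    for seq in sequences:
        items = tuple(seq)  # works for lists of words and for strings of letters
        for i in range(len(items) - n + 1):
            counts[items[i:i + n]] += 1
    return {motif: c for motif, c in counts.items() if c >= min_count}


sentences = [s.split() for s in ["the cat sat on the mat", "the dog sat on the rug"]]
print(recurring_motifs(sentences, n=3))
# {('sat', 'on', 'the'): 2}

dna = ["ACGTACGT", "TTACGA"]
print(recurring_motifs(dna, n=3))
# {('A', 'C', 'G'): 3, ('C', 'G', 'T'): 2, ('T', 'A', 'C'): 2}
```

A raw count like this treats every frequent run as interesting. What makes an approach like ADIOS more than counting is that it has to decide which recurring segments are statistically significant and how they nest into a hierarchy.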

Look at Web 2.0 applications and content, and at the clumsy, early attempts to build "folk" taxonomies, and compare that reality with the ADIOS approach. The difference is the same as between Amy, the first cloned and genetically manipulated cow, and Bailey, a puppy that was born 40 years ago in our backyard as a result of some mating experiments conducted by me and some other kids on our street with Molly and Buster (dogs kindly placed in our custody by some neighbors).

When I started to work with XML (the Extensible Markup Language) ten years ago, I believed that XML would enable us to describe the DNA of all information and interaction, and would therefore finally start the new non-biological evolution of digital artefacts, or "memes". Until now this evolution has proceeded as a slow, human-bound symbiotic process, limited by the brain capacity of the humans who "TAG" digital artefacts (pictures, songs, text strings, B2B invoices, etc.) with markup in a narrow and limited way.

As humans, our capacity to recognize wide-scale and nano-scale structures is limited. With XML we may have the tool to describe the DNA, but we still lack the key ingredients needed by the prime mover and the evolutionary forces that would set the evolution in motion. Humans may never be able to recognize these constructions, but ADIOS-like algorithms may be able to do it.
