I. The Genetic Blueprint
A decade after the invention of the World Wide Web, Tim Berners-Lee is promoting the “Semantic Web”. Hitherto, the Internet has been a repository of digital content. It has a rudimentary inventory system and very crude data location services. As a sad result, most of the content is invisible and inaccessible. Moreover, the Internet manipulates strings of symbols, not logical or semantic propositions. In other words, the Net compares values but does not know the meaning of the values it thus manipulates. It is unable to interpret strings, to infer new facts, to deduce, induce, derive, or otherwise comprehend what it is doing. In short, it does not understand language. Run an ambiguous term through any search engine and these shortcomings become painfully evident. This lack of understanding of the semantic foundations of its raw material (data, information) prevents applications and databases from sharing resources and feeding each other. The Internet is discrete, not continuous. It resembles an archipelago, with users hopping from island to island in a frantic search for relevancy.
Even visionaries like Berners-Lee do not contemplate an “intelligent Web”. They are simply proposing to let users, content creators, and web developers assign descriptive meta-tags (“name of hotel”) to fields, or to strings of symbols (“Hilton”). These meta-tags (arranged in semantic and relational “ontologies”, that is, lists of meta-tags, their meanings, and how they relate to each other) will be read by various applications and allow them to process the associated strings of symbols correctly (for instance, to place the word “Hilton” in your address book under “hotels”). This will make information retrieval more efficient and reliable, and the information retrieved is bound to be more relevant and amenable to higher-level processing (statistics, the development of heuristic rules, etc.). The shift is from HTML (whose tags are concerned with visual appearance and content indexing) to languages such as the DARPA Agent Markup Language (DAML), OIL (Ontology Inference Layer or Ontology Interchange Language), or even XML (whose tags are concerned with content taxonomy, document structure, and semantics). This would bring the Internet closer to the classic library card catalogue.
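To make the “Hilton” example concrete, here is a minimal Python sketch. It assumes a hypothetical XML fragment in which a content creator has tagged each string with a descriptive element, and a toy ontology (a plain dictionary) that tells an application which address-book category each tag denotes. The tag names, the ontology, and the address-book structure are illustrative inventions, not DAML, OIL, or any published vocabulary.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical marked-up listing: each string carries a descriptive
# meta-tag ("hotel-name") rather than a purely visual HTML tag.
PAGE = """<listing>
  <hotel-name>Hilton</hotel-name>
  <restaurant-name>Blue Door Cafe</restaurant-name>
  <city>London</city>
</listing>"""

# Toy "ontology" (illustrative only): maps each meta-tag to the category
# it means. Tags missing from it have no known meaning to this application.
ONTOLOGY = {
    "hotel-name": "hotels",
    "restaurant-name": "restaurants",
}

def file_into_address_book(xml_text):
    """File each tagged string under the category its tag denotes."""
    book = defaultdict(list)
    root = ET.fromstring(xml_text)
    for element in root:
        category = ONTOLOGY.get(element.tag)
        if category is not None:  # skip tags the ontology does not describe
            book[category].append(element.text.strip())
    return dict(book)

print(file_into_address_book(PAGE))
# {'hotels': ['Hilton'], 'restaurants': ['Blue Door Cafe']}
```

The application never has to guess what “Hilton” is: the meaning travels with the string, and the ontology tells the program what to do with it. The untagged-by-ontology `city` element is simply ignored, which is the “compares values but does not know their meaning” situation in miniature.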
Even in its current pre-semantic, hyperlink-dependent phase, the Internet brings to mind Richard Dawkins’ seminal work “The Selfish Gene” (OUP, 1976). This would be doubly true for the Semantic Web.
Dawkins suggested generalizing the principle of natural selection to a law of the survival of the stable. “A stable thing is a collection of atoms which is permanent enough or common enough to deserve a name”. He then proceeded to describe the emergence of “Replicators”, molecules that created copies of themselves. The Replicators that survived in the competition for scarce raw materials were characterized by high longevity, fecundity, and copying-fidelity. Replicators (now known as “genes”) constructed “survival machines” (organisms) to shield them from the vagaries of an ever-harsher environment.
This is very reminiscent of the Internet. The “stable things” are HTML-coded web pages. They are replicators: they create copies of themselves every time their “web address” (URL) is clicked. The HTML coding of a web page can be thought of as “genetic material”. It contains all the information needed to reproduce the page. And, exactly as in nature, the higher the longevity, fecundity (measured in links to the web page from other web sites), and copying-fidelity of the HTML code, the higher its chances of surviving (as a web page).
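The fecundity measure in this analogy, the number of links pointing to a page from other web sites, is straightforward to compute once a link graph is available. The following is a minimal sketch, assuming a small hypothetical link graph keyed by URL (the `.example` addresses are invented). It counts inbound links per page and, following the text’s “from other web sites”, ignores links a site makes to its own pages.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical link graph: each page URL maps to the URLs it links to.
LINKS = {
    "http://a.example/index.html": ["http://c.example/page.html",
                                    "http://a.example/about.html"],
    "http://b.example/home.html": ["http://c.example/page.html",
                                   "http://a.example/index.html"],
    "http://c.example/page.html": ["http://c.example/other.html"],
}

def fecundity(link_graph):
    """Count, for each target page, the inbound links from other sites."""
    counts = Counter()
    for source, targets in link_graph.items():
        source_host = urlparse(source).netloc
        # set() counts a given source page at most once per target
        for target in set(targets):
            # ignore links within the same site (same host name)
            if urlparse(target).netloc != source_host:
                counts[target] += 1
    return counts

for url, n in fecundity(LINKS).most_common():
    print(n, url)
```

Here `http://c.example/page.html` is linked from two other sites and `http://a.example/index.html` from one, so the former is the more “fecund” replicator. Whether same-site links should count is a design choice. Excluding them keeps a site from inflating its own pages’ score.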
Replicator molecules (DNA) and replicator HTML have one thing in common: they are both packaged information. In the appropriate context (the right biochemical “soup” in the case of DNA, the right software application in the case of HTML code), this information generates a “survival machine” (an organism or a web page).
The Semantic Web will only increase the longevity, fecundity, and copying-fidelity of the underlying code (in this case, OIL or XML instead of HTML). By facilitating many more interactions with many other web pages and databases, the underlying “replicator” code will ensure the “survival” of “its” web page (that is, its survival machine). In this analogy, the web page’s “DNA” (its OIL or XML code) contains “single genes” (semantic meta-tags). The whole process of life is the unfolding of a kind of Semantic Web.
In a prophetic paragraph, Dawkins described the Internet:
“The first thing to grasp about a modern replicator is that it is highly gregarious. A survival machine is a vehicle containing not just one gene but many thousands. The manufacture of a body is a cooperative venture of such intricacy that it is almost impossible to disentangle the contribution of one gene from that of another. A given gene will have many different effects on quite different parts of the body. A given part of the body will be influenced by many genes and the effect of any one gene depends on interaction with many others…In terms of the analogy, any given page of the plans makes reference to many different parts of the building; and each page makes sense only in terms of cross-reference to numerous other pages.”
What Dawkins neglected in his important work is the concept of the Network. People congregate in cities, mate, and reproduce, thus providing genes with new “survival machines”. But Dawkins himself suggested that the new Replicator is the “meme”: an idea, belief, technique, technology, work of art, or bit of information. Memes use human brains as “survival machines” and they hop from brain to brain and across time and space (“communications”) in the process of cultural (as distinct from biological) evolution. The Internet is a latter-day meme-hopping playground. But, more importantly, it is a Network. Genes move from one container to another through a linear, serial, tedious process which involves prolonged periods of one-on-one gene shuffling (“sex”) and gestation. Memes use networks. Their propagation is, therefore, parallel, fast, and all-pervasive. The Internet is a manifestation of the growing predominance of memes over genes. And the Semantic Web may be to the Internet what Artificial Intelligence is to classic computing. We may be on the threshold of a self-aware Web.
II. The Internet as a Chaotic Library
A. The Problem of Cataloguing
The Internet is an assortment of billions of pages which contain information. Some of them are visible; others are generated from hidden databases in response to users’ requests (the “Invisible Internet”).
The Internet exhibits no discernible order, classification, or categorization. Amazingly, as opposed to “classical” libraries, no one has yet invented a (sorely needed) Internet cataloguing standard (remember Dewey?). Some sites do apply the Dewey Decimal System to their contents (Suite101). Others default to a directory structure (Open Directory, Yahoo!, LookSmart, and others).
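The Dewey Decimal System assigns every subject a number whose hundreds digit selects one of ten main classes, which is exactly the kind of shared, predictable scheme the Internet lacks. Below is a minimal sketch of that first level of lookup, assuming call numbers arrive as strings such as `"004.678"`. The function name `main_class` and the example numbers are illustrative; real Dewey schedules subdivide much further than this.

```python
# The ten main classes of the Dewey Decimal Classification,
# indexed by the hundreds digit of a class number.
MAIN_CLASSES = [
    "Computer science, information & general works",  # 000
    "Philosophy & psychology",                         # 100
    "Religion",                                        # 200
    "Social sciences",                                 # 300
    "Language",                                        # 400
    "Science",                                         # 500
    "Technology",                                      # 600
    "Arts & recreation",                               # 700
    "Literature",                                      # 800
    "History & geography",                             # 900
]

def main_class(call_number: str) -> str:
    """Return the Dewey main class for a class number such as '004.678'."""
    whole = call_number.split(".")[0]
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError(f"expected a three-digit class number, got {call_number!r}")
    # The hundreds digit alone determines the main class.
    return MAIN_CLASSES[int(whole[0])]

print(main_class("004.678"))
print(main_class("823.914"))
```

Because the scheme is numeric and agreed upon in advance, any two catalogues that use it file the same subject in the same place. A directory structure such as Open Directory’s, by contrast, is a tree of names each site is free to arrange as it sees fit.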