Last modified by Helmut Nagy on 2010/05/03 19:45

1 The Semantic Web

1.1 A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities

By Tim Berners-Lee, James Hendler and Ora Lassila

The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other ~~local~~ devices that had a ~~volume control~~. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring.

At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's ~~prescribed treatment~~ from the doctor's agent, looked up several lists of ~~providers~~, and checked for the ones ~~in-plan~~ for Mom's insurance within a ~~20-mile radius~~ of her ~~home~~ and with a ~~rating~~ of ~~excellent~~ or ~~very good~~ on trusted rating services. It then began trying to find a match between available ~~appointment times~~ (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)

In a few minutes the agent presented them with a plan. Pete didn't like it: University Hospital was all the way across town from Mom's place, and he'd be driving back in the middle of rush hour. He set his own agent to redo the search with stricter preferences about ~~location~~ and ~~time~~. Lucy's agent, having ~~complete trust~~ in Pete's agent in the context of the present task, automatically assisted by supplying access certificates and shortcuts to the data it had already sorted through.

Almost instantly the new plan was presented: a much closer clinic and earlier times, but there were two warning notes. First, Pete would have to reschedule a couple of his ~~less important~~ appointments. He checked what they were: not a problem. The other was something about the insurance company's list failing to include this provider under ~~physical therapists~~: "Service type and insurance plan status securely verified by other means," the agent reassured him. "(Details?)"

Lucy registered her assent at about the same moment Pete was muttering, "Spare me the details," and it was all set. (Of course, Pete couldn't resist the details and later that night had his agent explain how it had found that provider even though it wasn't on the proper list.)

1.1 Expressing Meaning

Pete and Lucy could use their agents to carry out all these tasks thanks not to the World Wide Web of today but rather the Semantic Web that it will evolve into tomorrow. Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing (here a header, there a link to another page), but in general, computers have no reliable way to process the semantics: this is the home page of the Hartman and Strauss Physio Clinic, this link goes to Dr. Hartman's curriculum vitae.

The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to the clinic's Web page will know not just that the page has keywords such as "treatment, medicine, physical, therapy" (as might be encoded today) but also that Dr. Hartman ~~works~~ at this ~~clinic~~ on ~~Mondays~~, ~~Wednesdays~~ and ~~Fridays~~ and that the script takes a ~~date range~~ in ~~yyyy-mm-dd format~~ and returns ~~appointment times~~. And it will "know" all this without needing artificial intelligence on the scale of 2001's HAL or Star Wars's C-3PO. Instead these semantics were encoded into the Web page when the clinic's office manager (who never took Comp Sci 101) massaged it into shape using off-the-shelf software for writing Semantic Web pages along with resources listed on the Physical Therapy Association's site.

The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and "understand" the data that they merely display at present.

The essential property of the World Wide Web is its universality. The power of a hypertext link is that "anything can link to anything." Web technology, therefore, must not discriminate between the scribbled draft and the polished performance, between commercial and academic information, or among cultures, languages, media and so on. Information varies along many axes. One of these is the difference between information produced primarily for human consumption and that produced mainly for machines. At one end of the scale we have everything from the five-second TV commercial to poetry. At the other end we have databases, programs and sensor output. To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this.

Like the Internet, the Semantic Web will be as decentralized as possible. Such Web-like systems generate a lot of excitement at every level, from major corporation to individual user, and provide benefits that are hard or impossible to predict in advance. Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all of its interconnections, ushering in the infamous message "Error 404: Not Found" but allowing unchecked exponential growth.

1.1 Knowledge Representation

For the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.

Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable.

Moreover, these systems usually carefully limit the questions that can be asked so that the computer can answer reliably or answer at all. The problem is reminiscent of Gödel's theorem from mathematics: any system that is complex enough to be useful also encompasses unanswerable questions, much like sophisticated versions of the basic paradox "This sentence is false." To avoid such problems, traditional knowledge-representation systems generally each had their own narrow and idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not.
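
A rule of this kind can be sketched in a few lines of Python. This is a hypothetical illustration, not code from any actual genealogy system: the facts live in one program's private tuple format, and the rule "a wife of an uncle is an aunt" exists only as a function written against that format.

```python
# Hypothetical facts in one genealogy program's private format:
# (relation, person, other) reads "person is the <relation> of other".
facts = {
    ("uncle", "Bob", "Carol"),
    ("wife", "Alice", "Bob"),
    ("uncle", "Dan", "Carol"),
}


def derive_aunts(facts):
    """Apply the rule 'a wife of an uncle is an aunt' to a set of facts."""
    derived = set()
    for rel1, uncle, child in facts:
        if rel1 != "uncle":
            continue
        for rel2, wife, husband in facts:
            if rel2 == "wife" and husband == uncle:
                derived.add(("aunt", wife, child))
    return derived


print(sorted(derive_aunts(facts)))  # → [('aunt', 'Alice', 'Carol')]
```

The facts could be copied to another system as plain data, but derive_aunts could not: it is tied to this program's relation names and tuple layout, which is why such rules rarely travelled between systems.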

Semantic Web researchers, in contrast, accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired. This philosophy is similar to that of the conventional Web: early in the Web's development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right. But the expressive power of the system made vast amounts of information available, and search engines (which would have seemed quite impractical a decade ago) now produce remarkably complete indices of a lot of the material out there.

The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web.

Adding logic to the Web (the means to use rules to make inferences, choose courses of action and answer questions) is the task before the Semantic Web community at the moment. A mixture of mathematical and engineering decisions complicates this task. The logic must be powerful enough to describe complex properties of objects but not so powerful that agents can be tricked by being asked to consider a paradox. Fortunately, a large majority of the information we want to express is along the lines of "a hex-head bolt is a type of machine bolt," which is readily written in existing languages with a little extra vocabulary.

Two important technologies for developing the Semantic Web are already in place: eXtensible Markup Language (XML) and the Resource Description Framework (RDF). XML lets everyone create their own tags, hidden labels such as &lt;zip code&gt; or &lt;alma mater&gt; that annotate Web pages or sections of text on a page. Scripts, or programs, can make use of these tags in sophisticated ways, but the script writer has to know what the page writer uses each tag for. In short, XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.
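
The following Python sketch, using the standard library's xml.etree.ElementTree, illustrates the point. The two pages and their tag names are invented (XML names cannot contain spaces, hence zipcode and postcode): both parse cleanly, but the script finds the zip code only on the page whose tag its author happened to know about.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages that mark up the same fact with different,
# equally valid tag names. XML accepts both and says nothing about meaning.
page_a = "<clinic><name>Clinic A</name><zipcode>14850</zipcode></clinic>"
page_b = "<clinic><name>Clinic B</name><postcode>14850</postcode></clinic>"


def find_zip(xml_text):
    """Works only because its author knows page A's tag name."""
    root = ET.fromstring(xml_text)
    element = root.find("zipcode")
    return element.text if element is not None else None


print(find_zip(page_a))  # → 14850
print(find_zip(page_b))  # → None (the structure parses; the meaning is lost)
```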

----

The Semantic Web will enable machines to COMPREHEND semantic documents and data, not human speech and writings.

----

Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as "is a sister of," "is the author of") with certain values (another person, another Web page). This structure turns out to be a natural way to describe the vast majority of the data processed by machines. Subject and object are each identified by a Uniform Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI.) The verbs are also identified by URIs, which enables anyone to define a new concept, a new verb, just by defining a URI for it somewhere on the Web.
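
As a minimal sketch of the triple model, the Python below stores statements as plain (subject, predicate, object) tuples of URI strings and answers simple pattern queries. It is a toy, not an RDF library; the example.org URIs and property names such as isSisterOf are hypothetical.

```python
# Hypothetical namespaces; a real vocabulary would be published by its author.
EX = "http://example.org/people#"
VOCAB = "http://example.org/vocab#"

# Each statement is a (subject, predicate, object) triple of URIs,
# rather like the subject, verb and object of a simple sentence.
triples = {
    (EX + "lucy", VOCAB + "isSisterOf", EX + "pete"),
    (EX + "pete", VOCAB + "isAuthorOf", "http://example.org/pages/pete"),
}


def match(triples, s=None, p=None, o=None):
    """Return every triple fitting the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# A new "verb" is just a new URI; no schema has to change.
triples.add((EX + "pete", VOCAB + "chauffeurs", EX + "mom"))

print(match(triples, p=VOCAB + "isSisterOf"))
print(len(match(triples, s=EX + "pete")))  # → 2
```

Adding the chauffeurs statement required no schema change: a new verb is just a new URI.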

Human language thrives when using the same term to mean somewhat different things, but automation does not. Imagine that I hire a clown messenger service to deliver balloons to my customers on their birthdays. Unfortunately, the service transfers the addresses from my database to its database, not knowing that the "addresses" in mine are where bills are sent and that many of them are post office boxes. My hired clowns end up entertaining a number of postal workers (not necessarily a bad thing, but certainly not the intended effect). Using a different URI for each specific concept solves that problem. An address that is a mailing address can be distinguished from one that is a street address, and both can be distinguished from an address that is a speech.

The triples of RDF form webs of information about related things. Because RDF uses URIs to encode this information in a document, the URIs ensure that concepts are not just words in a document but are tied to a unique definition that everyone can find on the Web. For example, imagine that we have access to a variety of databases with information about people, including their addresses. If we want to find people living in a specific zip code, we need to know which fields in each database represent names and which represent zip codes. RDF can specify that "(field 5 in database A) (is a field of type) (zip code)," using URIs rather than phrases for each term.
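
Here is a minimal Python sketch of that idea, under stated assumptions: two hypothetical databases store names and zip codes in different columns, and a handful of triples (with made-up URIs) record which field holds which concept. Field 5 of database A appears as index 4 because Python counts from zero.

```python
# Hypothetical URIs for two concepts and for the property linking fields to them.
NAME = "http://example.org/vocab#personName"
ZIP = "http://example.org/vocab#zipCode"
FIELD_TYPE = "http://example.org/vocab#isFieldOfType"

# Two hypothetical databases with different column layouts.
db_a = [("id1", "Lucy", "Elm St", "Ithaca", "14850"),
        ("id2", "Pete", "Main St", "New York", "10001")]
db_b = [("10001", "Ms. Cook"), ("14850", "Dr. Hartman")]

# (field N of database X) (is a field of type) (concept)
schema = [
    (("A", 1), FIELD_TYPE, NAME),
    (("A", 4), FIELD_TYPE, ZIP),   # field 5 of database A
    (("B", 1), FIELD_TYPE, NAME),
    (("B", 0), FIELD_TYPE, ZIP),
]


def field_for(db_name, concept):
    """Look up which column of a database holds a given concept."""
    for (db, index), prop, obj in schema:
        if db == db_name and prop == FIELD_TYPE and obj == concept:
            return index
    raise KeyError(f"database {db_name} has no field of type {concept}")


def people_in_zip(databases, zip_code):
    """Collect names from every database whose zip-code field matches."""
    names = []
    for db_name, rows in databases.items():
        name_i, zip_i = field_for(db_name, NAME), field_for(db_name, ZIP)
        names += [row[name_i] for row in rows if row[zip_i] == zip_code]
    return names


print(people_in_zip({"A": db_a, "B": db_b}, "14850"))  # → ['Lucy', 'Dr. Hartman']
```

The query code never mentions column numbers; adding a third database means adding statements to schema, not rewriting people_in_zip.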

1.1 Ontologies

Of course, this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as ~~zip code~~. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters.

A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies. In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories. Artificial-intelligence and Web researchers have co-opted the term for their own jargon, and for them an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.

The taxonomy defines classes of objects and relations among them. For example, an ~~address~~ may be defined as a type of ~~location~~, and ~~city codes~~ may be defined to apply only to ~~locations~~, and so on. Classes, subclasses and relations among entities are a very powerful tool for Web use. We can express a large number of relations among entities by assigning properties to classes and allowing subclasses to inherit such properties. If ~~city codes~~ must be of type ~~city~~ and cities generally have Web sites, we can discuss the Web site associated with a ~~city code~~ even if no database links a city code directly to a Web site.
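
A minimal Python sketch of inheritance in a taxonomy follows. The class names, property names and the range_of table (recording that a city code always names a City) are hypothetical stand-ins for what an ontology would state; the point is only that a property declared once on a class applies to its subclasses and to anything known to be of that type.

```python
# Hypothetical taxonomy: child class -> parent class.
subclass_of = {
    "Address": "Location",
    "City": "Location",
}

# Properties declared on classes; subclasses inherit them.
class_properties = {
    "Location": {"hasCoordinates"},
    "City": {"hasWebSite"},
}

# Assumed ontology statement: every city code names something of type City.
range_of = {"cityCode": "City"}


def ancestors(cls):
    """The class itself plus every superclass, following subclass links."""
    chain = [cls]
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def properties_of(cls):
    """Union of properties declared on the class and all its superclasses."""
    props = set()
    for c in ancestors(cls):
        props |= class_properties.get(c, set())
    return props


# No database links a city code to a Web site, yet the taxonomy tells us
# that whatever a city code names has one.
print(sorted(properties_of(range_of["cityCode"])))  # → ['hasCoordinates', 'hasWebSite']
print(sorted(properties_of("Address")))  # → ['hasCoordinates']
```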

Inference rules in ontologies supply further power. An ontology may express the rule "If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code." A program could then readily deduce, for instance, that a Cornell University address, being in Ithaca, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards. The computer doesn't truly "understand" any of this information, but it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user.
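
The rule can be run mechanically by forward chaining: apply every rule to the known facts, add what it concludes, and repeat until nothing new appears. In the Python sketch below the facts and property names are hypothetical, and the second rule (anything with a state code is in that state's country) is an assumption added to reach the U.S. step of the example.

```python
# Hypothetical facts as (subject, property, value) triples.
facts = {
    ("cornellAddress", "usesCityCode", "Ithaca"),
    ("Ithaca", "hasStateCode", "NY"),
    ("NY", "isInCountry", "US"),
}


def apply_rules(facts):
    """Forward chaining: apply both rules until no new fact is produced."""
    facts = set(facts)
    while True:
        new = set()
        for s1, p1, o1 in facts:
            for s2, p2, o2 in facts:
                if s2 != o1:
                    continue
                # Rule from the ontology: an address that uses a city code
                # has that city code's state code.
                if p1 == "usesCityCode" and p2 == "hasStateCode":
                    new.add((s1, "hasStateCode", o2))
                # Assumed extra rule: whatever has a state code is in
                # that state's country.
                if p1 == "hasStateCode" and p2 == "isInCountry":
                    new.add((s1, "isInCountry", o2))
        if new <= facts:
            return facts
        facts |= new


closed = apply_rules(facts)
print(("cornellAddress", "hasStateCode", "NY") in closed)  # → True
print(("cornellAddress", "isInCountry", "US") in closed)   # → True
```

A formatter could then choose U.S. address conventions for any address whose derived facts include isInCountry US, even though no record stated that directly.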

With ontology pages on the Web, solutions to terminology (and other) problems begin to emerge. The meaning of terms or XML codes used on a Web page can be defined by pointers from the page to an ontology. Of course, the same problems as before now arise if I point to an ontology that defines ~~addresses~~ as containing a ~~zip code~~ and you point to one that uses ~~postal code~~. This kind of confusion can be resolved if ontologies (or other Web services) provide equivalence relations: one or both of our ontologies may contain the information that my zip code is equivalent to your postal code.
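
A small Python sketch of how a published equivalence lets a program merge the two vocabularies follows. The two ontology URIs and the equivalentTo relation name are hypothetical; a real ontology language would supply its own equivalence construct.

```python
# Hypothetical property URIs from two independently written ontologies.
MY_ZIP = "http://example.org/my-ontology#zipCode"
YOUR_POSTAL = "http://example.net/your-ontology#postalCode"
EQUIVALENT = "equivalentTo"  # hypothetical name for an equivalence relation

# One of the ontologies (or a third-party service) publishes this statement.
equivalences = [(YOUR_POSTAL, EQUIVALENT, MY_ZIP)]

my_records = [{"name": "Lucy", MY_ZIP: "14850"}]
your_records = [{"name": "Clinic A", YOUR_POSTAL: "14850"}]


def canonical(prop):
    """Map a property to the one it is declared equivalent to, if any."""
    for a, relation, b in equivalences:
        if relation == EQUIVALENT and a == prop:
            return b
    return prop


def normalize(record):
    """Rewrite a record's keys into the canonical vocabulary."""
    return {canonical(key): value for key, value in record.items()}


merged = [normalize(r) for r in my_records + your_records]
print([r["name"] for r in merged if r.get(MY_ZIP) == "14850"])
# → ['Lucy', 'Clinic A']
```

For brevity canonical follows the equivalence in one direction only; a fuller treatment would treat it as symmetric and transitive.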

Our scheme for sending in the clowns to entertain my customers is partially solved when the two databases point to different definitions of ~~address~~. The program, using distinct URIs for different concepts of address, will not confuse them and in fact will need to discover that the concepts are related at all. The program could then use a service that takes a list of ~~postal addresses~~ (defined in the first ontology) and converts it into a list of physical ~~addresses~~ (the second ontology) by recognizing and removing post office boxes and other unsuitable addresses. The structure and semantics provided by ontologies make it easier for an entrepreneur to provide such a service and can make its use completely transparent.
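
The conversion service could look roughly like the Python sketch below. Everything here is hypothetical: the two concept URIs, the function name, and especially the regular expression, which catches only common spellings of "PO Box" and is far cruder than real address validation.

```python
import re

# Hypothetical concept URIs from the two ontologies in the example.
POSTAL = "http://example.org/ontology1#postalAddress"
PHYSICAL = "http://example.org/ontology2#physicalAddress"

# Crude, assumed pattern for post office boxes ("PO Box", "P.O. Box", ...).
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*Box\b", re.IGNORECASE)


def postal_to_physical(typed_list):
    """Convert a list typed as postal addresses into one typed as physical
    addresses by dropping entries that look like post office boxes."""
    concept, addresses = typed_list
    if concept != POSTAL:
        raise ValueError("expected a list of postal addresses")
    return PHYSICAL, [a for a in addresses if not PO_BOX.search(a)]


billing = (POSTAL, ["PO Box 12, Ithaca NY", "15 Elm St, Ithaca NY",
                    "P.O. Box 7, Dryden NY"])
concept, visit_list = postal_to_physical(billing)
print(visit_list)  # → ['15 Elm St, Ithaca NY']
```

Because the input list is tagged with the postal-address concept and the output with the physical-address concept, billing addresses cannot reach the clowns without passing through this step.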

Ontologies can enhance the functioning of the Web in many ways. They can be used in a simple fashion to improve the accuracy of Web searches: the search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords. More advanced applications will use ontologies to relate the information on a page to the associated knowledge structures and inference rules. An example of a page marked up for such use is online at [http://www.cs.umd.edu/~hendler>http://www.cs.umd.edu/~hendler>_blank]. If you send your Web browser to that page, you will see the normal Web page entitled "Dr. James A. Hendler." As a human, you can readily find the link to a short biographical note and read there that Hendler received his Ph.D. from Brown University. A computer program trying to find such information, however, would have to be very complex to guess that this information might be in a biography and to understand the English language used there.

For computers, the page is linked to an ontology page that defines information about computer science departments. For instance, professors work at universities and they generally have doctorates. Further markup on the page (not displayed by the typical Web browser) uses the ontology's concepts to specify that Hendler received his Ph.D. from the entity described at the URI [http://www.brown.edu>http://www.brown.edu>_blank], the Web page for Brown. Computers can also find that Hendler is a member of a particular research project, has a particular e-mail address, and so on. All that information is readily processed by a computer and could be used to answer queries (such as where Dr. Hendler received his degree) that currently would require a human to sift through the content of various pages turned up by a search engine.
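
Once the hidden markup is available as statements, the question becomes a lookup rather than a reading-comprehension task. In this Python sketch the triples are a hypothetical rendering of such markup; the cs: property names and the project URI are placeholders, not the actual ontology's vocabulary.

```python
# Hypothetical triples standing in for the page's hidden markup.
HENDLER = "http://www.cs.umd.edu/~hendler"
page_markup = [
    (HENDLER, "cs:name", "Dr. James A. Hendler"),
    (HENDLER, "cs:doctoralDegreeFrom", "http://www.brown.edu"),
    (HENDLER, "cs:memberOf", "http://example.org/some-research-project"),
]


def ask(triples, subject, prop):
    """Answer 'what is the <prop> of <subject>?' from explicit markup."""
    return [o for s, p, o in triples if s == subject and p == prop]


print(ask(page_markup, HENDLER, "cs:doctoralDegreeFrom"))
# → ['http://www.brown.edu']
```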

In addition, this markup makes it much easier to develop programs that can tackle complicated questions whose answers do not reside on a single Web page. Suppose you wish to find the Ms. Cook you met at a trade conference last year. You don't remember her first name, but you remember that she worked for one of your clients and that her son was a student at your alma mater. An intelligent search program can sift through all the pages of people whose name is "Cook" (sidestepping all the pages relating to cooks, cooking, the Cook Islands and so forth), find the ones that mention working for a company that's on your list of clients and follow links to Web pages of their children to track down if any are in school at the right place.
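
A rough Python sketch of that search runs over hypothetical triples already gathered from marked-up pages: the person identifiers, property names, client list and alma mater are all invented for illustration. Because familyName is a property, pages about cooking never enter the candidate set.

```python
# Hypothetical triples gathered from several marked-up pages.
facts = [
    ("p1", "familyName", "Cook"), ("p1", "worksFor", "Acme"),
    ("p1", "parentOf", "p3"),
    ("p2", "familyName", "Cook"), ("p2", "worksFor", "Globex"),
    ("p2", "parentOf", "p4"),
    ("p3", "studentAt", "Cornell"),
    ("p4", "studentAt", "Brown"),
]
my_clients = {"Acme", "Initech"}
my_alma_mater = "Cornell"


def values(subject, prop):
    """All objects of statements with this subject and property."""
    return {o for s, p, o in facts if s == subject and p == prop}


def find_ms_cook():
    """Follow the chain: named Cook -> works for a client -> has a child
    studying at my alma mater."""
    candidates = {s for s, p, o in facts if p == "familyName" and o == "Cook"}
    matches = []
    for person in sorted(candidates):
        if not values(person, "worksFor") & my_clients:
            continue
        children = values(person, "parentOf")
        if any(my_alma_mater in values(child, "studentAt") for child in children):
            matches.append(person)
    return matches


print(find_ms_cook())  # → ['p1']
```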

1.1 Agents

The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.

An important facet of agents' functioning will be the exchange of "proofs" written in the Semantic Web's unifying language (the language that expresses logical inferences made using rules and information such as those specified by ontologies). For example, suppose Ms. Cook's contact information has been located by an online service, and to your great surprise it places her in Johannesburg. Naturally, you want to check this, so your computer asks the service for a proof of its answer, which it promptly provides by translating its internal reasoning into the Semantic Web's unifying language. An inference engine in your computer readily verifies that this Ms. Cook indeed matches the one you were seeking, and it can show you the relevant Web pages if you still have doubts. Although they are still far from plumbing the depths of the Semantic Web's potential, some programs can already exchange proofs in this way, using the current preliminary versions of the unifying language.
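
The checking side can be sketched in Python under loudly stated assumptions: the proof format (premises plus steps that cite a rule and two earlier steps), the rule names and the facts are all hypothetical, and the same-e-mail rule is deliberately naive. What the sketch does show is that verifying a proof only requires re-applying trusted rules step by step, which is far easier than redoing the original search.

```python
# Hypothetical rules the verifier already trusts. Each takes two earlier
# facts and returns the conclusion it licenses, or None.
def same_email(a, b):
    # Deliberately naive: records sharing an e-mail address are one person.
    if a[1] == "email" and b[1] == "email" and a[2] == b[2]:
        return (a[0], "sameAs", b[0])
    return None


def same_residence(a, b):
    # If x is the same person as y, and y lives somewhere, so does x.
    if a[1] == "sameAs" and b[1] == "livesIn" and a[2] == b[0]:
        return (a[0], "livesIn", b[2])
    return None


RULES = {"same-email": same_email, "same-residence": same_residence}


def check_proof(steps):
    """Steps are ('premise', fact) or (rule, fact, i, j), where i and j
    index earlier steps. Valid only if every derived fact re-derives."""
    proven = []
    for step in steps:
        if step[0] == "premise":
            proven.append(step[1])
            continue
        rule, fact, i, j = step
        if rule not in RULES or not (0 <= i < len(proven) and 0 <= j < len(proven)):
            return False
        if RULES[rule](proven[i], proven[j]) != fact:
            return False
        proven.append(fact)
    return True


proof = [
    ("premise", ("myContact", "email", "cook@example.com")),
    ("premise", ("record42", "email", "cook@example.com")),
    ("premise", ("record42", "livesIn", "Johannesburg")),
    ("same-email", ("myContact", "sameAs", "record42"), 0, 1),
    ("same-residence", ("myContact", "livesIn", "Johannesburg"), 3, 2),
]
print(check_proof(proof))  # → True
```

Checking the premises themselves (is record42 really from a trustworthy source?) is a separate problem, which is where the digital signatures discussed next come in.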

Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source. You want to be quite sure that a statement sent to your accounting program that you owe money to an online retailer is not a forgery generated by the computer-savvy teenager next door. Agents should be skeptical of assertions that they read on the Semantic Web until they have checked the sources of information. (We wish more ~~people~~ would learn to do this on the Web as it is!)
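
The verify-before-trust flow can be sketched with Python's standard library. One substitution must be stated plainly: the sketch uses HMAC, a shared-key message authentication code, because the standard library has no public-key signing; a real digital signature lets anyone verify with the signer's public key without holding a secret. The key, the retailer and the amounts are hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: a secret key shared in advance with the retailer. This HMAC
# stands in for a public-key digital signature, which the standard
# library does not provide.
SHARED_KEY = b"hypothetical-key-shared-with-retailer"


def sign(statement: bytes, key: bytes) -> str:
    """Compute an authentication tag over the statement."""
    return hmac.new(key, statement, hashlib.sha256).hexdigest()


def accept(statement: bytes, tag: str, key: bytes) -> bool:
    """Trust the statement only if its tag verifies against the key."""
    return hmac.compare_digest(sign(statement, key), tag)


bill = b"You owe Example Retailer $120.00"
tag = sign(bill, SHARED_KEY)
print(accept(bill, tag, SHARED_KEY))  # → True

forged = b"You owe Example Retailer $1200.00"
print(accept(forged, tag, SHARED_KEY))  # → False
```

The teenager next door, lacking the key, cannot produce a tag that accept will take, so the altered bill is rejected.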

Many automated Web-based services already exist without semantics, but other programs such as agents have no way to locate one that will perform a specific function. This process, called service discovery, can happen only when there is a common language to describe a service in a way that lets other agents "understand" both the function offered and how to take advantage of it. Services and agents can advertise their function by, for example, depositing such descriptions in directories analogous to the Yellow Pages.
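
A toy directory of this kind is sketched below in Python. The ServiceDescription fields, the Directory class and every URI are hypothetical; the sketch only shows the idea that a service names its function and its inputs by URI, so an agent can search by meaning rather than by product name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    # All URIs are hypothetical; the function a service performs is
    # named by a URI rather than by a free-text label.
    endpoint: str
    function: str   # URI naming what the service does
    inputs: tuple   # URIs of the concepts it accepts
    outputs: tuple  # URIs of the concepts it returns


class Directory:
    """A toy 'Yellow Pages' where services deposit descriptions."""

    def __init__(self):
        self._entries = []

    def advertise(self, description):
        self._entries.append(description)

    def discover(self, function, have):
        """Services performing `function` that need only inputs we have."""
        return [d for d in self._entries
                if d.function == function and set(d.inputs) <= set(have)]


V = "http://example.org/vocab#"
directory = Directory()
directory.advertise(ServiceDescription(
    "http://example.org/plan-check", V + "checkInPlan",
    (V + "providerList", V + "insurancePlan"), (V + "providerList",)))
directory.advertise(ServiceDescription(
    "http://example.org/geo", V + "filterByDistance",
    (V + "providerList", V + "location"), (V + "providerList",)))

found = directory.discover(V + "checkInPlan", [V + "providerList", V + "insurancePlan"])
print([d.endpoint for d in found])  # → ['http://example.org/plan-check']
```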

Some low-level service-discovery schemes are currently available, such as Microsoft's Universal Plug and Play, which focuses on connecting different types of devices, and Sun Microsystems's Jini, which aims to connect services. These initiatives, however, attack the problem at a structural or syntactic level and rely heavily on standardization of a predetermined set of functionality descriptions. Standardization can only go so far, because we can't anticipate all possible future needs.

----

Properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.

----

The Semantic Web, in contrast, is more flexible. The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion. Agents can even "bootstrap" new reasoning capabilities when they discover new ontologies. Semantics also makes it easier to take advantage of a service that only partially matches a request.

A typical process will involve the creation of a "value chain" in which subassemblies of information are passed from one agent to another, each one "adding value," to construct the final product requested by the end user. Make no mistake: to create complicated value chains automatically on demand, some agents will exploit artificial-intelligence technologies in addition to the Semantic Web. But the Semantic Web will provide the foundations and the framework to make such technologies more feasible.

Putting all these features together results in the abilities exhibited by Pete's and Lucy's agents in the scenario that opened this article. Their agents would have delegated the task in piecemeal fashion to other services and agents discovered through service advertisements. For example, they could have used a ~~trusted~~ service to take a list of ~~providers~~ and determine which of them are ~~in-plan~~ for a specified ~~insurance plan~~ and ~~course of treatment~~. The list of providers would have been supplied by another search service, et cetera. These activities formed chains in which a large amount of data distributed across the Web (and almost worthless in that form) was progressively reduced to the small amount of data of high value to Pete and Lucy: a plan of appointments to fit their schedules and other requirements.
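
The chain can be pictured as a series of filters, each shrinking a large list into a smaller, more valuable one. In this Python sketch each stage is a local function standing in for a separate discovered service; the provider records, plan names and time slots are hypothetical, while the 20-mile radius and the excellent/very good ratings come from the opening scenario.

```python
# Hypothetical provider records; in the scenario each stage below would be
# a separate service found on the Web, not a local function.
providers = [
    {"name": "Provider A", "plans": {"PlanA"}, "miles": 18,
     "rating": "very good", "slots": ["2001-05-14T17:00"]},
    {"name": "Provider B", "plans": {"PlanA"}, "miles": 4,
     "rating": "excellent", "slots": ["2001-05-14T10:00", "2001-05-16T10:00"]},
    {"name": "Provider C", "plans": {"PlanB"}, "miles": 35,
     "rating": "excellent", "slots": ["2001-05-16T10:00"]},
]


def in_plan(plan):
    return lambda ps: [p for p in ps if plan in p["plans"]]


def within(miles):
    return lambda ps: [p for p in ps if p["miles"] <= miles]


def rated(accepted):
    return lambda ps: [p for p in ps if p["rating"] in accepted]


def fits_schedule(free_slots):
    return lambda ps: [(p["name"], s) for p in ps for s in p["slots"] if s in free_slots]


def run_chain(data, stages):
    """Pass the data through each stage; each one narrows it further."""
    for stage in stages:
        data = stage(data)
    return data


appointments = run_chain(providers, [
    in_plan("PlanA"),
    within(20),
    rated({"excellent", "very good"}),
    fits_schedule({"2001-05-16T10:00", "2001-05-18T09:00"}),
])
print(appointments)  # → [('Provider B', '2001-05-16T10:00')]
```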

In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs. Such devices can advertise their functionality (what they can do and how they are controlled) much like software agents. Being much more flexible than low-level schemes such as Universal Plug and Play, such a semantic approach opens up a world of exciting possibilities.

For instance, what today is called home automation requires careful configuration for appliances to work together. Semantic descriptions of device capabilities and functionality will let us achieve such automation with minimal human intervention. A trivial example occurs when Pete answers his phone and the stereo sound is turned down. Instead of having to program each specific appliance, he could program such a function once and for all to cover every ~~local~~ device that advertises having a ~~volume control~~: the TV, the DVD player and even the media players on the laptop that he brought home from work this one evening.
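
Programming the function "once and for all" amounts to writing a rule against an advertised capability rather than against specific appliances. In this Python sketch the capability URI, the Device class and the devices are hypothetical.

```python
# Hypothetical capability URI that devices advertise.
VOLUME = "http://example.org/capabilities#volumeControl"


class Device:
    def __init__(self, name, capabilities, location="home"):
        self.name = name
        self.capabilities = set(capabilities)
        self.location = location
        # Only devices that advertise a volume control have a volume.
        self.volume = 8 if VOLUME in self.capabilities else None


def on_phone_answered(devices, quiet_level=2):
    """Written once: turn down every local device advertising a volume
    control, whether or not it existed when this rule was written."""
    for d in devices:
        if d.location == "home" and VOLUME in d.capabilities:
            d.volume = min(d.volume, quiet_level)


devices = [
    Device("stereo", [VOLUME]),
    Device("TV", [VOLUME]),
    Device("lamp", []),
    Device("laptop media player", [VOLUME]),  # brought home this evening
    Device("office radio", [VOLUME], location="office"),
]
on_phone_answered(devices)
print({d.name: d.volume for d in devices})
# → {'stereo': 2, 'TV': 2, 'lamp': None, 'laptop media player': 2, 'office radio': 8}
```

The laptop's media player is quieted even though it was not present when on_phone_answered was written; it only had to advertise the volume-control capability.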

The first concrete steps have already been taken in this area, with work on developing a standard for describing functional capabilities of devices (such as screen sizes) and user preferences. Built on RDF, this standard is called Composite Capability/Preference Profile (CC/PP). Initially it will let cell phones and other nonstandard Web clients describe their characteristics so that Web content can be tailored for them on the fly. Later, when we add the full versatility of languages for handling ontologies and logic, devices could automatically seek out and employ services and other devices for added information or functionality. It is not hard to imagine your Web-enabled microwave oven consulting the frozen-food manufacturer's Web site for optimal cooking parameters.

1.1 Evolution of Knowledge

The Semantic Web is not "merely" the tool for conducting individual tasks that we have discussed so far. In addition, if properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.

Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community. A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication. The world works across the spectrum between these extremes, with a tendency to start small from the personal idea and move toward a wider understanding over time.

An essential process is the joining together of subcultures when a wider common language is needed. Often two groups independently develop very similar concepts, and describing the relation between them brings great benefits. Like a Finnish-English dictionary, or a weights-and-measures conversion table, the relations allow communication and collaboration even when the commonality of concept has not (yet) led to a commonality of terms.

The Semantic Web, in naming every concept simply by a URI, lets anyone express new concepts that they invent with minimal effort. Its unifying logical language will enable these concepts to be progressively linked into a universal Web. This structure will open up the knowledge and workings of humankind to meaningful analysis by software agents, providing a new class of tools by which we can live, work and learn together.

----

*Further Information:*

*Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor.* \\
Tim Berners-Lee, with Mark Fischetti. Harper San Francisco, 1999.\\
An enhanced version of this article is on the Scientific American Web site, with additional material and links.

World Wide Web Consortium (W3C): [www.w3.org/>http://www.w3.org/>_blank]

W3C Semantic Web Activity: [www.w3.org/2001/sw/>http://www.w3.org/2001/sw/>_blank]

An introduction to ontologies: [www.semanticweb.org/knowmarkup.html>http://www.semanticweb.org/knowmarkup.html>_blank]

Simple HTML Ontology Extensions Frequently Asked Questions (SHOE FAQ): [www.cs.umd.edu/projects/plus/SHOE/faq.html>http://www.cs.umd.edu/projects/plus/SHOE/faq.html>_blank]

DARPA Agent Markup Language (DAML) home page: [www.daml.org/>http://www.daml.org/>_blank]