
1 The Semantic Web


1.1 A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities


By Tim Berners-Lee, James Hendler and Ora Lassila


The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other ~~local~~ devices that had a ~~volume control~~. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring.

At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's ~~prescribed treatment~~ from the doctor's agent, looked up several lists of ~~providers~~, and checked for the ones ~~in-plan~~ for Mom's insurance within a ~~20-mile radius~~ of her ~~home~~ and with a ~~rating~~ of ~~excellent~~ or ~~very good~~ on trusted rating services. It then began trying to find a match between available ~~appointment times~~ (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)

In a few minutes the agent presented them with a plan. Pete didn't like it: University Hospital was all the way across town from Mom's place, and he'd be driving back in the middle of rush hour. He set his own agent to redo the search with stricter preferences about ~~location~~ and ~~time~~. Lucy's agent, having ~~complete trust~~ in Pete's agent in the context of the present task, automatically assisted by supplying access certificates and shortcuts to the data it had already sorted through.

Almost instantly the new plan was presented: a much closer clinic and earlier times, but there were two warning notes. First, Pete would have to reschedule a couple of his ~~less important~~ appointments. He checked what they were; not a problem. The other was something about the insurance company's list failing to include this provider under ~~physical therapists~~: "Service type and insurance plan status securely verified by other means," the agent reassured him. "(Details?)"

Lucy registered her assent at about the same moment Pete was muttering, "Spare me the details," and it was all set. (Of course, Pete couldn't resist the details and later that night had his agent explain how it had found that provider even though it wasn't on the proper list.)

1.1 Expressing Meaning


Pete and Lucy could use their agents to carry out all these tasks thanks not to the World Wide Web of today but rather the Semantic Web that it will evolve into tomorrow. Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing (here a header, there a link to another page), but in general, computers have no reliable way to process the semantics: this is the home page of the Hartman and Strauss Physio Clinic, this link goes to Dr. Hartman's curriculum vitae.

The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to the clinic's Web page will know not just that the page has keywords such as "treatment, medicine, physical, therapy" (as might be encoded today) but also that Dr. Hartman ~~works~~ at this ~~clinic~~ on ~~Mondays~~, ~~Wednesdays~~ and ~~Fridays~~ and that the script takes a ~~date range~~ in ~~yyyy-mm-dd format~~ and returns ~~appointment times~~. And it will "know" all this without needing artificial intelligence on the scale of 2001's Hal or Star Wars's C-3PO. Instead, these semantics were encoded into the Web page when the clinic's office manager (who never took Comp Sci 101) massaged it into shape using off-the-shelf software for writing Semantic Web pages along with resources listed on the Physical Therapy Association's site.

The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and "understand" the data that they merely display at present.

The essential property of the World Wide Web is its universality. The power of a hypertext link is that "anything can link to anything." Web technology, therefore, must not discriminate between the scribbled draft and the polished performance, between commercial and academic information, or among cultures, languages, media and so on. Information varies along many axes. One of these is the difference between information produced primarily for human consumption and that produced mainly for machines. At one end of the scale we have everything from the five-second TV commercial to poetry. At the other end we have databases, programs and sensor output. To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this.

Like the Internet, the Semantic Web will be as decentralized as possible. Such Web-like systems generate a lot of excitement at every level, from major corporation to individual user, and provide benefits that are hard or impossible to predict in advance. Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all of its interconnections, ushering in the infamous message "Error 404: Not Found" but allowing unchecked exponential growth.

1.1 Knowledge Representation


For the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.

Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable.

Moreover, these systems usually carefully limit the questions that can be asked so that the computer can answer reliably, or answer at all. The problem is reminiscent of Gödel's theorem from mathematics: any system that is complex enough to be useful also encompasses unanswerable questions, much like sophisticated versions of the basic paradox "This sentence is false." To avoid such problems, traditional knowledge-representation systems generally each had their own narrow and idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not.

Semantic Web researchers, in contrast, accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired. This philosophy is similar to that of the conventional Web: early in the Web's development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right. But the expressive power of the system made vast amounts of information available, and search engines (which would have seemed quite impractical a decade ago) now produce remarkably complete indices of a lot of the material out there. The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web.

Adding logic to the Web (the means to use rules to make inferences, choose courses of action and answer questions) is the task before the Semantic Web community at the moment. A mixture of mathematical and engineering decisions complicates this task. The logic must be powerful enough to describe complex properties of objects but not so powerful that agents can be tricked by being asked to consider a paradox. Fortunately, a large majority of the information we want to express is along the lines of "a hex-head bolt is a type of machine bolt," which is readily written in existing languages with a little extra vocabulary.

Two important technologies for developing the Semantic Web are already in place: eXtensible Markup Language (XML) and the Resource Description Framework (RDF). XML lets everyone create their own tags: hidden labels such as <zip code> or <alma mater> that annotate Web pages or sections of text on a page. Scripts, or programs, can make use of these tags in sophisticated ways, but the script writer has to know what the page writer uses each tag for. In short, XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.

----

The Semantic Web will enable machines to COMPREHEND semantic documents and data, not human speech and writings.

----

Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as "is a sister of," "is the author of") with certain values (another person, another Web page). This structure turns out to be a natural way to describe the vast majority of the data processed by machines. Subject and object are each identified by a Universal Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI.) The verbs are also identified by URIs, which enables anyone to define a new concept, a new verb, just by defining a URI for it somewhere on the Web.

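As a rough illustration of the triple model, the following Python sketch stores a few statements as (subject, property, value) tuples and retrieves them by pattern. The `example.org` URIs, the `Triple` type and the `match` helper are assumptions invented for this sketch; RDF itself defines its own data model and syntaxes, and this is not one of them.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: a subject, a property (the "verb") and a value."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URI prefixes, made up for this illustration.
PEOPLE = "http://example.org/people#"
TERMS = "http://example.org/terms#"

STATEMENTS: List[Triple] = [
    Triple(PEOPLE + "lucy", TERMS + "isSisterOf", PEOPLE + "pete"),
    Triple(PEOPLE + "lucy", TERMS + "isAuthorOf", "http://example.org/pages/plan.html"),
    Triple(PEOPLE + "pete", TERMS + "isAuthorOf", "http://example.org/pages/notes.html"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fields that were supplied."""
    return [
        t for t in statements
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    for t in match(STATEMENTS, predicate=TERMS + "isAuthorOf"):
        print(t.subject, "is the author of", t.obj)
```

Because the property is itself a URI, a query such as `match(STATEMENTS, predicate=TERMS + "isSisterOf")` does not depend on how any particular page happens to spell "sister."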

Human language thrives when using the same term to mean somewhat different things, but automation does not. Imagine that I hire a clown messenger service to deliver balloons to my customers on their birthdays. Unfortunately, the service transfers the addresses from my database to its database, not knowing that the "addresses" in mine are where bills are sent and that many of them are post office boxes. My hired clowns end up entertaining a number of postal workers (not necessarily a bad thing, but certainly not the intended effect). Using a different URI for each specific concept solves that problem. An address that is a mailing address can be distinguished from one that is a street address, and both can be distinguished from an address that is a speech.

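A minimal sketch of the idea, assuming two made-up concept URIs: each address record carries the URI of the specific concept it represents, so the delivery program can refuse anything that is not a street address. The names `MAILING_ADDRESS`, `STREET_ADDRESS`, `Address` and `deliverable` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical concept URIs; each one names a single meaning of "address".
MAILING_ADDRESS = "http://example.org/terms#mailingAddress"
STREET_ADDRESS = "http://example.org/terms#streetAddress"


@dataclass(frozen=True)
class Address:
    concept: str  # URI stating which kind of address this is
    text: str     # the address as written


def deliverable(addresses):
    """Keep only the addresses explicitly typed as street addresses."""
    return [a for a in addresses if a.concept == STREET_ADDRESS]


CUSTOMERS = [
    Address(MAILING_ADDRESS, "P.O. Box 12, Springfield"),
    Address(STREET_ADDRESS, "42 Elm Street, Springfield"),
]

if __name__ == "__main__":
    for address in deliverable(CUSTOMERS):
        print("Send the clown to", address.text)
```

The comparison is on the URI, not on the word "address," which is what keeps the clowns out of the post office.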

The triples of RDF form webs of information about related things. Because RDF uses URIs to encode this information in a document, the URIs ensure that concepts are not just words in a document but are tied to a unique definition that everyone can find on the Web. For example, imagine that we have access to a variety of databases with information about people, including their addresses. If we want to find people living in a specific zip code, we need to know which fields in each database represent names and which represent zip codes. RDF can specify that "(field 5 in database A) (is a field of type) (zip code)," using URIs rather than phrases for each term.

1.1 Ontologies

Of course, this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as ~~zip code~~. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters.

A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies. In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories. Artificial-intelligence and Web researchers have co-opted the term for their own jargon, and for them an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.

The taxonomy defines classes of objects and relations among them. For example, an ~~address~~ may be defined as a type of ~~location~~, and ~~city codes~~ may be defined to apply only to ~~locations~~, and so on. Classes, subclasses and relations among entities are a very powerful tool for Web use. We can express a large number of relations among entities by assigning properties to classes and allowing subclasses to inherit such properties. If ~~city codes~~ must be of type ~~city~~ and cities generally have Web sites, we can discuss the Web site associated with a ~~city code~~ even if no database links a city code directly to a Web site.

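Property inheritance through a taxonomy can be sketched in a few lines of Python. The class and property names below are hypothetical stand-ins for URIs, and the two dictionaries are an assumption made for brevity; a real ontology language would express the same relations declaratively.

```python
# Hypothetical taxonomy: each class points to its parent class.
SUBCLASS_OF = {
    "Address": "Location",
    "City": "Location",
}

# Properties declared directly on each class.
DECLARED = {
    "Location": {"cityCode"},
    "Address": {"street"},
    "City": {"webSite"},
}


def properties_of(cls):
    """Collect the properties declared on cls and on all of its ancestors."""
    found = set()
    while cls is not None:
        found |= DECLARED.get(cls, set())
        cls = SUBCLASS_OF.get(cls)  # None once the top of the taxonomy is reached
    return found


if __name__ == "__main__":
    print(sorted(properties_of("Address")))
```

Here `cityCode` is declared once, on `Location`, and every subclass picks it up without any further statement.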

Inference rules in ontologies supply further power. An ontology may express the rule "If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code." A program could then readily deduce, for instance, that a Cornell University address, being in Ithaca, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards. The computer doesn't truly "understand" any of this information, but it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user.

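The city-code rule quoted above can be applied mechanically. The sketch below, which assumes facts held as plain tuples with invented identifiers such as `city:Ithaca` and invented property names, performs one pass of that single rule; it is not a general inference engine.

```python
def infer_state_codes(facts):
    """Apply the city-code rule to every matching pair of facts.

    Rule: if a city code is associated with a state code, and an address
    uses that city code, then the address has that state code.
    Returns a new set holding the original facts plus the derived ones.
    """
    derived = set(facts)
    for city, prop, state in facts:
        if prop != "associatedWithState":
            continue
        for address, other_prop, used_city in facts:
            if other_prop == "usesCityCode" and used_city == city:
                derived.add((address, "hasStateCode", state))
    return derived


# Hypothetical identifiers standing in for URIs.
FACTS = {
    ("city:Ithaca", "associatedWithState", "state:NY"),
    ("addr:CornellUniversity", "usesCityCode", "city:Ithaca"),
}

if __name__ == "__main__":
    for fact in sorted(infer_state_codes(FACTS) - FACTS):
        print("derived:", fact)
```

The program never needs to know what a state is; it only matches the shape of the facts against the shape of the rule.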

With ontology pages on the Web, solutions to terminology (and other) problems begin to emerge. The meaning of terms or XML codes used on a Web page can be defined by pointers from the page to an ontology. Of course, the same problems as before now arise if I point to an ontology that defines ~~addresses~~ as containing a ~~zip code~~ and you point to one that uses ~~postal code~~. This kind of confusion can be resolved if ontologies (or other Web services) provide equivalence relations: one or both of our ontologies may contain the information that my zip code is equivalent to your postal code.

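One way to picture an equivalence relation at work is as a lookup that rewrites each term to a canonical one before records are compared. The two property URIs and the helper names in this sketch are assumptions made for illustration.

```python
# Hypothetical property URIs from two independently written ontologies.
MY_ZIP = "http://example.org/mine#zipCode"
YOUR_POSTAL = "http://example.org/yours#postalCode"

# Equivalence relation: each term maps to the canonical term chosen for it.
EQUIVALENT = {YOUR_POSTAL: MY_ZIP}


def canonical(term):
    """Translate a term to its canonical equivalent, if one is known."""
    return EQUIVALENT.get(term, term)


def normalize(record):
    """Rewrite every key of a record using the equivalence relation."""
    return {canonical(key): value for key, value in record.items()}


def same_area(a, b):
    """True when both records carry the same zip or postal code."""
    a, b = normalize(a), normalize(b)
    return MY_ZIP in a and a.get(MY_ZIP) == b.get(MY_ZIP)


if __name__ == "__main__":
    mine = {MY_ZIP: "14850"}
    yours = {YOUR_POSTAL: "14850"}
    print(same_area(mine, yours))
```

Neither database has to change its own vocabulary; only the statement of equivalence has to be published somewhere both programs can find it.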

Our scheme for sending in the clowns to entertain my customers is partially solved when the two databases point to different definitions of ~~address~~. The program, using distinct URIs for different concepts of address, will not confuse them and in fact will need to discover that the concepts are related at all. The program could then use a service that takes a list of ~~postal addresses~~ (defined in the first ontology) and converts it into a list of physical ~~addresses~~ (the second ontology) by recognizing and removing post office boxes and other unsuitable addresses. The structure and semantics provided by ontologies make it easier for an entrepreneur to provide such a service and can make its use completely transparent.

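A toy version of such a conversion service might look like the following. The regular expression is a deliberately crude assumption about how post office boxes are written in English-language addresses; a real service would need far more than pattern matching, and the function name is hypothetical.

```python
import re

# Crude pattern for "P.O. Box", "PO Box", "p o box" and similar spellings.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*Box\b", re.IGNORECASE)


def to_physical_addresses(postal_addresses):
    """Drop entries that look like post office boxes; keep the rest in order."""
    return [a for a in postal_addresses if not PO_BOX.search(a)]


if __name__ == "__main__":
    postal = [
        "P.O. Box 12, Springfield",
        "42 Elm Street, Springfield",
        "PO Box 7, Shelbyville",
    ]
    print(to_physical_addresses(postal))
```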

Ontologies can enhance the functioning of the Web in many ways. They can be used in a simple fashion to improve the accuracy of Web searches: the search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords. More advanced applications will use ontologies to relate the information on a page to the associated knowledge structures and inference rules. An example of a page marked up for such use is online at [http://www.cs.umd.edu/~hendler>http://www.cs.umd.edu/~hendler>_blank]. If you send your Web browser to that page, you will see the normal Web page entitled "Dr. James A. Hendler." As a human, you can readily find the link to a short biographical note and read there that Hendler received his Ph.D. from Brown University. A computer program trying to find such information, however, would have to be very complex to guess that this information might be in a biography and to understand the English language used there.

For computers, the page is linked to an ontology page that defines information about computer science departments. For instance, professors work at universities and they generally have doctorates. Further markup on the page (not displayed by the typical Web browser) uses the ontology's concepts to specify that Hendler received his Ph.D. from the entity described at the URI [http://www.brown.edu>http://www.brown.edu>_blank], the Web page for Brown. Computers can also find that Hendler is a member of a particular research project, has a particular e-mail address, and so on. All that information is readily processed by a computer and could be used to answer queries (such as where Dr. Hendler received his degree) that currently would require a human to sift through the content of various pages turned up by a search engine.

In addition, this markup makes it much easier to develop programs that can tackle complicated questions whose answers do not reside on a single Web page. Suppose you wish to find the Ms. Cook you met at a trade conference last year. You don't remember her first name, but you remember that she worked for one of your clients and that her son was a student at your alma mater. An intelligent search program can sift through all the pages of people whose name is "Cook" (sidestepping all the pages relating to cooks, cooking, the Cook Islands and so forth), find the ones that mention working for a company that's on your list of clients and follow links to Web pages of their children to determine whether any are in school at the right place.

1.1 Agents

The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.

An important facet of agents' functioning will be the exchange of "proofs" written in the Semantic Web's unifying language (the language that expresses logical inferences made using rules and information such as those specified by ontologies). For example, suppose Ms. Cook's contact information has been located by an online service, and to your great surprise it places her in Johannesburg. Naturally, you want to check this, so your computer asks the service for a proof of its answer, which it promptly provides by translating its internal reasoning into the Semantic Web's unifying language. An inference engine in your computer readily verifies that this Ms. Cook indeed matches the one you were seeking, and it can show you the relevant Web pages if you still have doubts. Although they are still far from plumbing the depths of the Semantic Web's potential, some programs can already exchange proofs in this way, using the current preliminary versions of the unifying language.

Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source. You want to be quite sure that a statement sent to your accounting program that you owe money to an online retailer is not a forgery generated by the computer-savvy teenager next door. Agents should be skeptical of assertions that they read on the Semantic Web until they have checked the sources of information. (We wish more ~~people~~ would learn to do this on the Web as it is!)

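The check an agent performs can be sketched with Python's standard library. Note the substitution: true digital signatures rely on public-key cryptography, which the standard library does not provide, so this sketch uses an HMAC with a shared secret as a stand-in. It shows only the verify-before-trusting step; the key and the messages are invented.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Produce a keyed digest of the message (a stand-in for a signature)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_authentic(message: bytes, signature: str, key: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), signature)


if __name__ == "__main__":
    retailer_key = b"hypothetical shared secret"
    statement = b"You owe 20 dollars"
    signature = sign(statement, retailer_key)

    print(is_authentic(statement, signature, retailer_key))
    print(is_authentic(b"You owe 2000 dollars", signature, retailer_key))
```

With a real public-key scheme the retailer alone could sign, while anyone holding the retailer's public key could verify; the shared-secret version above cannot make that distinction.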

Many automated Web-based services already exist without semantics, but other programs such as agents have no way to locate one that will perform a specific function. This process, called service discovery, can happen only when there is a common language to describe a service in a way that lets other agents "understand" both the function offered and how to take advantage of it. Services and agents can advertise their function by, for example, depositing such descriptions in directories analogous to the Yellow Pages.

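A directory of this kind can be pictured as a table keyed by the URI of the function offered. Everything in the sketch below (the `Directory` class, the function URI, the endpoint and the input description) is an assumption made for illustration, not an existing service-discovery API.

```python
from collections import defaultdict

# Hypothetical URI naming the function "find appointment times".
FIND_APPOINTMENTS = "http://example.org/functions#findAppointmentTimes"


class Directory:
    """A Yellow Pages-style registry of advertised services."""

    def __init__(self):
        self._by_function = defaultdict(list)

    def advertise(self, function_uri, endpoint, input_format):
        """Record what a service does, where it is and what input it expects."""
        self._by_function[function_uri].append(
            {"endpoint": endpoint, "input": input_format}
        )

    def discover(self, function_uri):
        """Return a copy of the list of services advertising this function."""
        return list(self._by_function.get(function_uri, []))


if __name__ == "__main__":
    directory = Directory()
    directory.advertise(
        FIND_APPOINTMENTS,
        "http://example.org/clinic/appointments",
        "date range in yyyy-mm-dd format",
    )
    for service in directory.discover(FIND_APPOINTMENTS):
        print(service["endpoint"], "expects", service["input"])
```

The description carries both halves of what an agent needs: the function offered and how to take advantage of it.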

Some low-level service-discovery schemes are currently available, such as Microsoft's Universal Plug and Play, which focuses on connecting different types of devices, and Sun Microsystems's Jini, which aims to connect services. These initiatives, however, attack the problem at a structural or syntactic level and rely heavily on standardization of a predetermined set of functionality descriptions. Standardization can only go so far, because we can't anticipate all possible future needs.

----

Properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.

----

The Semantic Web, in contrast, is more flexible. The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion. Agents can even "bootstrap" new reasoning capabilities when they discover new ontologies. Semantics also makes it easier to take advantage of a service that only partially matches a request.

A typical process will involve the creation of a "value chain" in which subassemblies of information are passed from one agent to another, each one "adding value," to construct the final product requested by the end user. Make no mistake: to create complicated value chains automatically on demand, some agents will exploit artificial-intelligence technologies in addition to the Semantic Web. But the Semantic Web will provide the foundations and the framework to make such technologies more feasible.

Putting all these features together results in the abilities exhibited by Pete's and Lucy's agents in the scenario that opened this article. Their agents would have delegated the task in piecemeal fashion to other services and agents discovered through service advertisements. For example, they could have used a ~~trusted~~ service to take a list of ~~providers~~ and determine which of them are ~~in-plan~~ for a specified ~~insurance plan~~ and ~~course of treatment~~. The list of providers would have been supplied by another search service, et cetera. These activities formed chains in which a large amount of data distributed across the Web (and almost worthless in that form) was progressively reduced to the small amount of data of high value to Pete and Lucy: a plan of appointments to fit their schedules and other requirements.

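The progressive reduction described above can be sketched as a chain of small steps, each taking a list of providers and handing a shorter list to the next. The provider records are invented for this illustration (only University Hospital appears in the scenario), and plain Python functions stand in for what would be separate services.

```python
from functools import reduce

# Invented sample data; field names are assumptions for this sketch.
PROVIDERS = [
    {"name": "University Hospital", "in_plan": True, "miles": 18, "rating": "excellent"},
    {"name": "Riverside Clinic", "in_plan": True, "miles": 4, "rating": "very good"},
    {"name": "Downtown Physio", "in_plan": False, "miles": 2, "rating": "excellent"},
    {"name": "Hilltop Therapy", "in_plan": True, "miles": 6, "rating": "fair"},
]


def in_plan(providers):
    """Step 1: keep providers covered by the insurance plan."""
    return [p for p in providers if p["in_plan"]]


def within(miles):
    """Step 2 factory: keep providers no farther away than the given radius."""
    return lambda providers: [p for p in providers if p["miles"] <= miles]


def well_rated(providers):
    """Step 3: keep providers rated excellent or very good."""
    return [p for p in providers if p["rating"] in ("excellent", "very good")]


def run_chain(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda result, step: step(result), steps, data)


if __name__ == "__main__":
    first_plan = run_chain(PROVIDERS, [in_plan, within(20), well_rated])
    stricter = run_chain(PROVIDERS, [in_plan, within(5), well_rated])
    print([p["name"] for p in first_plan])
    print([p["name"] for p in stricter])
```

Tightening one preference, as Pete does with location, amounts to swapping a single link in the chain while the others stay untouched.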

In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs. Such devices can advertise their functionality (what they can do and how they are controlled) much like software agents. Being much more flexible than low-level schemes such as Universal Plug and Play, such a semantic approach opens up a world of exciting possibilities.

For instance, what today is called home automation requires careful configuration for appliances to work together. Semantic descriptions of device capabilities and functionality will let us achieve such automation with minimal human intervention. A trivial example occurs when Pete answers his phone and the stereo sound is turned down. Instead of having to program each specific appliance, he could program such a function once and for all to cover every ~~local~~ device that advertises having a ~~volume control~~: the TV, the DVD player and even the media players on the laptop that he brought home from work this one evening.

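Pete's "once and for all" rule could be sketched as follows, assuming each device advertises its capabilities as a set of URIs. The capability URI, the `Device` class and the volume scale are all hypothetical.

```python
# Hypothetical URI for the advertised capability "has a volume control".
VOLUME_CONTROL = "http://example.org/capabilities#volumeControl"


class Device:
    """A device that advertises what it can do as a set of capability URIs."""

    def __init__(self, name, capabilities, local=True, volume=5):
        self.name = name
        self.capabilities = set(capabilities)
        self.local = local
        self.volume = volume


def on_phone_answered(devices, quiet_level=1):
    """Turn down every local device that advertises a volume control.

    Returns the names of the devices that matched the rule.
    """
    lowered = []
    for device in devices:
        if device.local and VOLUME_CONTROL in device.capabilities:
            device.volume = min(device.volume, quiet_level)
            lowered.append(device.name)
    return lowered


if __name__ == "__main__":
    home = [
        Device("stereo", [VOLUME_CONTROL], volume=8),
        Device("lamp", []),
        Device("laptop media player", [VOLUME_CONTROL], volume=6),
    ]
    print(on_phone_answered(home))
```

The rule names a capability rather than an appliance, so a device brought home for one evening is covered as soon as it advertises itself.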

The first concrete steps have already been taken in this area, with work on developing a standard for describing functional capabilities of devices (such as screen sizes) and user preferences. Built on RDF, this standard is called Composite Capability/Preference Profile (CC/PP). Initially it will let cell phones and other nonstandard Web clients describe their characteristics so that Web content can be tailored for them on the fly. Later, when we add the full versatility of languages for handling ontologies and logic, devices could automatically seek out and employ services and other devices for added information or functionality. It is not hard to imagine your Web-enabled microwave oven consulting the frozen-food manufacturer's Web site for optimal cooking parameters.

1.1 Evolution of Knowledge


The Semantic Web is not "merely" the tool for conducting individual tasks that we have discussed so far. In addition, if properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.

Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community. A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication. The world works across the spectrum between these extremes, with a tendency to start small from the personal idea and move toward a wider understanding over time.

An essential process is the joining together of subcultures when a wider common language is needed. Often two groups independently develop very similar concepts, and describing the relation between them brings great benefits. Like a Finnish-English dictionary, or a weights-and-measures conversion table, the relations allow communication and collaboration even when the commonality of concept has not (yet) led to a commonality of terms.

The Semantic Web, in naming every concept simply by a URI, lets anyone express new concepts that they invent with minimal effort. Its unifying logical language will enable these concepts to be progressively linked into a universal Web. This structure will open up the knowledge and workings of humankind to meaningful analysis by software agents, providing a new class of tools by which we can live, work and learn together.

----

*Further Information:*

*Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor.* \\
Tim Berners-Lee, with Mark Fischetti. Harper San Francisco, 1999.\\
An enhanced version of this article is on the Scientific American Web site, with additional material and links.

World Wide Web Consortium (W3C): [www.w3.org/>http://www.w3.org/>_blank]

W3C Semantic Web Activity: [www.w3.org/2001/sw/>http://www.w3.org/2001/sw/>_blank]

An introduction to ontologies: [www.semanticweb.org/knowmarkup.html>http://www.semanticweb.org/knowmarkup.html>_blank]

Simple HTML Ontology Extensions Frequently Asked Questions (SHOE FAQ): [www.cs.umd.edu/projects/plus/SHOE/faq.html>http://www.cs.umd.edu/projects/plus/SHOE/faq.html>_blank]

DARPA Agent Markup Language (DAML) home page: [www.daml.org/>http://www.daml.org/>_blank]
