
Ontology Development 101: A Guide to Creating Your First Ontology

Natalya F. Noy and Deborah L. McGuinness
Stanford University, Stanford, CA

1 Why develop an ontology?

In recent years the development of ontologies, explicit formal specifications of the terms in the domain and relations among them (Gruber 1993), has been moving from the realm of Artificial-Intelligence laboratories to the desktops of domain experts. Ontologies have become common on the World-Wide Web. The ontologies on the Web range from large taxonomies categorizing Web sites (such as on Yahoo!) to categorizations of products for sale and their features (such as on Amazon.com). The WWW Consortium (W3C) is developing the Resource Description Framework (Brickley and Guha 1999), a language for encoding knowledge on Web pages to make it understandable to electronic agents searching for information. The Defense Advanced Research Projects Agency (DARPA), in conjunction with the W3C, is developing DARPA Agent Markup Language (DAML) by extending RDF with more expressive constructs aimed at facilitating agent interaction on the Web (Hendler and McGuinness 2000). Many disciplines now develop standardized ontologies that domain experts can use to share and annotate information in their fields. Medicine, for example, has produced large, standardized, structured vocabularies such as SNOMED (Price and Spackman 2000) and the semantic network of the Unified Medical Language System (Humphreys and Lindberg 1993). Broad general-purpose ontologies are emerging as well. For example, the United Nations Development Program and Dun & Bradstreet combined their efforts to develop the UNSPSC ontology, which provides terminology for products and services (www.unspsc.org).

An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them.

Why would someone want to develop an ontology?
Some of the reasons are:

• To share common understanding of the structure of information among people or software agents
• To enable reuse of domain knowledge
• To make domain assumptions explicit
• To separate domain knowledge from the operational knowledge
• To analyze domain knowledge

Sharing common understanding of the structure of information among people or software agents is one of the more common goals in developing ontologies (Musen 1992; Gruber 1993). For example, suppose several different Web sites contain medical information or provide medical e-commerce services. If these Web sites share and publish the same underlying ontology of the terms they all use, then computer agents can extract and aggregate information from these different sites. The agents can use this aggregated information to answer user queries or as input data to other applications.

Enabling reuse of domain knowledge was one of the driving forces behind the recent surge in ontology research. For example, models for many different domains need to represent the notion of time. This representation includes the notions of time intervals, points in time, relative measures of time, and so on. If one group of researchers develops such an ontology in detail, others can simply reuse it for their domains. Additionally, if we need to build a large ontology, we can integrate several existing ontologies describing portions of the large domain. We can also reuse a general ontology, such as the UNSPSC ontology, and extend it to describe our domain of interest.

Making explicit the domain assumptions underlying an implementation makes it possible to change these assumptions easily if our knowledge about the domain changes. Hard-coding assumptions about the world in programming-language code makes these assumptions not only hard to find and understand but also hard to change, in particular for someone without programming expertise. In addition, explicit specifications of domain knowledge are useful for new users who must learn what terms in the domain mean.

Separating the domain knowledge from the operational knowledge is another common use of ontologies. We can describe a task of configuring a product from its components according to a required specification and implement a program that does this configuration independent of the products and components themselves (McGuinness and Wright 1998). We can then develop an ontology of PC-components and characteristics and apply the algorithm to configure made-to-order PCs. We can also use the same algorithm to configure elevators if we “feed” an elevator component ontology to it (Rothenfluh et al. 1996).

Analyzing domain knowledge is possible once a declarative specification of the terms is available. Formal analysis of terms is extremely valuable when both attempting to reuse existing ontologies and extending them (McGuinness et al. 2000).

Often an ontology of the domain is not a goal in itself. Developing an ontology is akin to defining a set of data and their structure for other programs to use. Problem-solving methods, domain-independent applications, and software agents use ontologies and knowledge bases built from ontologies as data. For example, in this paper we develop an ontology of wine and food and appropriate combinations of wine with meals.
This ontology can then be used as a basis for some applications in a suite of restaurant-managing tools: one application could create wine suggestions for the menu of the day or answer queries of waiters and customers. Another application could analyze an inventory list of a wine cellar and suggest which wine categories to expand and which particular wines to purchase for upcoming menus or cookbooks.

About this guide

We build on our experience using Protégé-2000 (Protege 2000), Ontolingua (Ontolingua 1997), and Chimaera (Chimaera 2000) as ontology-editing environments. In this guide, we use Protégé-2000 for our examples.

The wine and food example that we use throughout this guide is loosely based on an example knowledge base presented in a paper describing CLASSIC, a knowledge-representation system based on a description-logics approach (Brachman et al. 1991). The CLASSIC tutorial (McGuinness et al. 1994) has developed this example further. Protégé-2000 and other frame-based systems describe ontologies declaratively, stating explicitly what the class hierarchy is and to which classes individuals belong.

Some ontology-design ideas in this guide originated from the literature on object-oriented design (Rumbaugh et al. 1991; Booch et al. 1997). However, ontology development is different from designing classes and relations in object-oriented programming. Object-oriented programming centers primarily around methods on classes: a programmer makes design decisions based on the operational properties of a class, whereas an ontology designer makes these decisions based on the structural properties of a class. As a result, a class structure and relations among classes in an ontology are different from the structure for a similar domain in an object-oriented program.

It is impossible to cover all the issues that an ontology developer may need to grapple with, and we are not trying to address all of them in this guide.
Instead, we try to provide a starting point: an initial guide that would help a new ontology designer to develop ontologies. At the end, we suggest places to look for explanations of more complicated structures and design mechanisms if the domain requires them.

Finally, there is no single correct ontology-design methodology and we did not attempt to define one. The ideas that we present here are the ones that we found useful in our own ontology-development experience. At the end of this guide we suggest a list of references for alternative methodologies.

2 What is in an ontology?

The Artificial-Intelligence literature contains many definitions of an ontology; many of these contradict one another. For the purposes of this guide an ontology is a formal explicit description of concepts in a domain of discourse (classes (sometimes called concepts)), properties of each concept describing various features and attributes of the concept (slots (sometimes called roles or properties)), and restrictions on slots (facets (sometimes called role restrictions)). An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base begins.

Classes are the focus of most ontologies. Classes describe concepts in the domain. For example, a class of wines represents all wines. Specific wines are instances of this class. The Bordeaux wine in the glass in front of you while you read this document is an instance of the class of Bordeaux wines. A class can have subclasses that represent concepts that are more specific than the superclass. For example, we can divide the class of all wines into red, white, and rosé wines. Alternatively, we can divide a class of all wines into sparkling and non-sparkling wines.

Slots describe properties of classes and instances: Château Lafite Rothschild Pauillac wine has a full body; it is produced by the Château Lafite Rothschild winery. We have two slots describing the wine in this example: the slot body with the value full and the slot maker with the value Château Lafite Rothschild winery.
At the class level, we can say that instances of the class Wine will have slots describing their flavor, body, sugar level, the maker of the wine, and so on.¹

¹ We capitalize class names and start slot names with lowercase letters. We also use typewriter font for all terms from the example ontology.

All instances of the class Wine, and its subclass Pauillac, have a slot maker, the value of which is an instance of the class Winery (Figure 1). All instances of the class Winery have a slot produces that refers to all the wines (instances of the class Wine and its subclasses) that the winery produces.

In practical terms, developing an ontology includes:

• defining classes in the ontology,
• arranging the classes in a taxonomic (subclass–superclass) hierarchy,
• defining slots and describing allowed values for these slots,
• filling in the values for slots for instances.

We can then create a knowledge base by defining individual instances of these classes, filling in specific slot value information and additional slot restrictions.
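To make the vocabulary of classes, slots, facets, and instances concrete, here is a minimal Python sketch. It is not Protégé-2000's data model or API: the classes `OClass`, `Slot`, and `Instance`, the method `fill_slot`, the allowed values for body, the cardinality flag, and placing Pauillac directly under Wine are all assumptions made for illustration. It follows the practical steps listed above (define classes, arrange them in a hierarchy, define slots with facets, fill in slot values for instances), including the maker/produces pair between Wine and Winery.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class OClass:
    """A class (concept) with at most one superclass and its own slots (hypothetical model)."""
    name: str
    superclass: OClass | None = None
    slots: dict[str, Slot] = field(default_factory=dict)

    def all_slots(self) -> dict[str, Slot]:
        # Slots are inherited: a subclass also has every slot of its superclasses.
        inherited = self.superclass.all_slots() if self.superclass else {}
        return {**inherited, **self.slots}

    def is_subclass_of(self, other: OClass) -> bool:
        # Walk up the hierarchy; a class counts as a subclass of itself.
        cls: OClass | None = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.superclass
        return False


@dataclass(eq=False)
class Slot:
    """A slot plus three facets: value type, allowed values, and cardinality."""
    name: str
    value_type: type | OClass          # a Python type, or a class from the ontology
    allowed: frozenset[str] | None = None
    multiple: bool = False             # cardinality facet: single- or multi-valued


@dataclass(eq=False)
class Instance:
    name: str
    cls: OClass
    values: dict[str, object] = field(default_factory=dict)

    def fill_slot(self, slot_name: str, value: object) -> None:
        """Fill in a slot value after checking it against the slot's facets."""
        slot = self.cls.all_slots()[slot_name]  # KeyError if the class has no such slot
        if isinstance(slot.value_type, OClass):
            ok = isinstance(value, Instance) and value.cls.is_subclass_of(slot.value_type)
        else:
            ok = isinstance(value, slot.value_type)
        if not ok or (slot.allowed is not None and value not in slot.allowed):
            raise ValueError(f"value for {self.name}.{slot_name} violates its facets")
        if slot.multiple:
            self.values.setdefault(slot_name, []).append(value)
        else:
            self.values[slot_name] = value


# Classes and hierarchy (Pauillac sits directly under Wine here for brevity).
winery = OClass("Winery")
wine = OClass("Wine")
pauillac = OClass("Pauillac", superclass=wine)

# Slots with facets; the allowed body values are an assumption of this sketch.
wine.slots["body"] = Slot("body", str, allowed=frozenset({"light", "medium", "full"}))
wine.slots["maker"] = Slot("maker", winery)
winery.slots["produces"] = Slot("produces", wine, multiple=True)

# Instances and their slot values.
lafite_winery = Instance("Château Lafite Rothschild winery", winery)
lafite = Instance("Château Lafite Rothschild Pauillac", pauillac)
lafite.fill_slot("body", "full")
lafite.fill_slot("maker", lafite_winery)      # maker is inherited from Wine
lafite_winery.fill_slot("produces", lafite)   # the inverse link is filled in explicitly

try:
    lafite.fill_slot("body", "fizzy")         # not an allowed value, so it is rejected
except ValueError as err:
    print(err)
```

Keeping the facet check inside `fill_slot` means every instance is validated against the class definitions as it is entered, which is one simple way to let the ontology, rather than application code, carry the domain constraints.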

Figure 1. Some classes, instances, and relations among them in the wine domain. We used black for classes and red for instances. Direct links represent slots and internal links such as instance-of and subclass-of.

3 A Simple Knowledge-Engineering Methodology

As we said earlier, there is no one “correct” way or methodology for developing ontologies. Here we discuss general issues to consider and offer one possible process for developing an ontology. We describe an iterative approach to ontology development: we start with a rough first pass at the ontology. We then revise and refine the evolving ontology and fill in the details. Along the way, we discuss the modeling decisions that a designer needs to make, as well as the pros, cons, and implications of different solutions.

First, we would like to emphasize some fundamental rules in ontology design to which we will refer many times. These rules may seem rather dogmatic. They can help, however, to make design decisions in many cases.

1) There is no one correct way to model a domain; there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate.
2) Ontology development is necessarily an iterative process.
3) Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain.

That is, deciding what we are going to use the ontology for, and how detailed or general the ontology is going to be, will guide many of the modeling decisions down the road. Among several viable alternatives, we will need to determine which one would work better for the projected task, be more intuitive, more extensible, and more maintainable. We also need to remember that an ontology is a model of reality of the world and the concepts in the ontology must reflect this reality.
After we define an initial version of the ontology, we can evaluate and debug it by using it in applications or problem-solving methods or by discussing it with experts in the field, or both. As a result, we will almost certainly need to revise the initial ontology. This process of iterative design will likely continue through the entire lifecycle of the ontology.
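Part of evaluating a draft by using it in an application can be mechanized. The Python sketch below is a hypothetical illustration rather than a procedure from this guide: a small domain-independent query function (`select`, a name chosen for this example) runs against a toy draft knowledge base, and a query that mentions a slot the draft does not yet have, here a sugar-level slot, fails in a way that points to the next revision. The dictionary layout, the `color` slot, the instance "Example White", and the value "dry" are all assumptions.

```python
from __future__ import annotations

# A hypothetical draft knowledge base: instance name -> (class name, slot values).
DRAFT_KB: dict[str, tuple[str, dict[str, str]]] = {
    "Château Lafite Rothschild Pauillac": ("Wine", {"body": "full", "color": "red"}),
    "Example White": ("Wine", {"body": "light", "color": "white"}),  # made-up instance
}


def missing_slots(kb: dict[str, tuple[str, dict[str, str]]], cls: str,
                  constraints: dict[str, str]) -> list[str]:
    """Slots used in a query that no instance of `cls` has a value for."""
    known = {slot for c, values in kb.values() if c == cls for slot in values}
    return sorted(set(constraints) - known)


def select(kb: dict[str, tuple[str, dict[str, str]]], cls: str,
           constraints: dict[str, str]) -> list[str]:
    """Domain-independent query: names of `cls` instances matching every constraint."""
    gaps = missing_slots(kb, cls, constraints)
    if gaps:
        # The draft cannot answer the question at all: a signal to revise the ontology.
        raise LookupError(f"draft ontology lacks slot(s) {gaps} for class {cls}")
    return [name for name, (c, values) in kb.items()
            if c == cls and all(values.get(s) == v for s, v in constraints.items())]


print(select(DRAFT_KB, "Wine", {"color": "red"}))
try:
    select(DRAFT_KB, "Wine", {"sugar": "dry"})  # sugar level is a Wine slot the text mentions
except LookupError as err:
    print(err)
```

The query function knows nothing about wine; all domain content lives in the knowledge base, so the same check can be rerun unchanged after each revision of the ontology.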

Step 1. Determine the domain and scope of the ontology

We suggest starting the development of an ontology by defining its domain and scope. That is, answer several basic questions:

• What is the domain that the ontology will cover?
• What are we going to use the ontology for?
• For what types of questions should the information in the ontology provide answers?
• Who will use and maintain the ontology?

The answers to these questions may change during the ontology-design process, but at any given time they help limit the scope of the model.

Consider the ontology of wine and food that we introduced earlier. Representation of food and wines is the domain of the ontology. We plan to use this ontology for the applications that suggest good combinations of wines and food.

Naturally, the concepts describing different types of wines, main food types, the notion of a good combination of wine and food and a bad combination will figure into our ontology. At the same time, it is unlikely that the ontology will include concepts for managing inventory in a winery or employees in a restaurant even though these concepts are somewhat related to the notions of wine and food.

If the ontology we are designing will be used to assist in natural language processing of articles in wine magazines, it may be important to include synonyms and par