Stanford Computer Science Department
Report No. STAN-CS-80-790

DATABASES IN HEALTH CARE

by
Gio Wiederhold

Research sponsored by
National Institutes of Health

COMPUTER SCIENCE DEPARTMENT
Stanford University
March 1980

DATABASES IN HEALTH CARE

Gio Wiederhold
Stanford University
Computer Science Department
March 1, 1979
CS Report STAN-CS-80-790

This paper was prepared for a Compendium on Computers in Health Care, Dr. D.A.B. Lindberg, Univ. of Missouri, Editor. The resources provided by the SUMEX Facility, supported by grant NIH RR-00785, were essential to the preparation of this document. Some of the work is based on research carried out as part of the RX project, sponsored by the National Center for Health Services Research under grant no. 1R03HS03650. Basic research in Database Design leading to some of the concepts presented here is supported by ARPA through the KBMS project at Stanford University.

ABSTRACT

This report defines database design and implementation technology as applicable to health care. The relationship of the technology to various health care settings is explored, and its effect on health care costs, quality and access is evaluated. A summary of relevant development directions is included.

Detailed examples of 5 typical applications (public health, clinical trials, clinical research, ambulatory care, and hospitals) are appended. There is an extended bibliography.

CONTENTS

I.   Definition of the Technology
     A. Databases and Their Objectives
     B. Terminology in the Area of Data Bases
        1. Components of databases
        2. File management systems versus database management systems
        3. Related Systems
     C. Scientific Basis for Database Technology
        1. The schema
        2. The data model
        3. Types of data models
     D. Database Operation
        1. Entering data into the database
        2. Data storage
        3. Data organization for retrieval
        4. Data presentation
        5. Database administration

II.  Use of Databases in Health Care
     A. Health Care Settings and the Relevancy of Database Technology
        1. Solo practice
        2. Group practice
        3. Specialty practice
        4. Hospitals
        5. Clinical research
        6. Non-patient databases
     B. Health Care Applications of Databases
        1. Reimbursement databases
        2. Disease-specific shared databases
        3. Databases used in HMO's
        4. Surveillance databases
        5. Speciality clinical databases
        6. General clinical databases
        7. Databases in research
     C. The Future Use of Databases in Health Care: issues
        1. Initiatives and innovation due to technology push
        2. The human element
        3. Sharing of information
        4. Privacy in Databases
        5. Missing data
        6. Problems of current
     D. The Effect of Databases on Health Care Costs, Quality, and Access

III. State of the Art of Database Technology in Health Care
     A. Systems in Research or Development Status
     B. Industrial Status
        1. Medical database systems
        2. General database systems that are applicable to health care
     C. Current Directions of Development

IV.  Appendix : Examples of Current Databases in Health Care
     A. Public Health : CCPDS at the Fred Hutchinson Cancer Center in Seattle
     B. Randomized Controlled Clinical Trials : ECOG and RTOG at the Harvard Univ. Dept. of Public Health
     C. Clinical Research : TOD and ARAMIS at Stanford University
     D. HMO Support : COSTAR and the Harvard Community Health Service
     E. Hospital Systems : POWS at Coral Gables Variety Childrens

V.   References

I. DEFINITION OF THE TECHNOLOGY

In this chapter we will introduce the concepts of database technology in a way that will make it easy to relate the terminology to problems in health care. After the objectives have been defined, the major components of databases and their function will be discussed. The remainder of this chapter will present the scientific and the operational issues associated with databases.

I.A Databases and Their Objectives

A database is a collection of related data, which are organized so that useful information may be extracted. The effectiveness of databases derives from the fact that much of the information relevant to a variety of organizational purposes may be obtained from one single, comprehensive database. In health care the same database may be used by medical personnel for patient care recording, for surveillance of patient status, and for treatment advice; it may be used by researchers in assessing the effectiveness of drugs and clinical procedures; and it can be used by administrative personnel in cost accounting and by management for the planning of service facilities.

The fact that data are shared promotes consistency of information for decision-making and reduces duplicate data collection. A major benefit of databases in health care is due to the application of the information to the management of services and the allocation of resources needed for those services, but communication through the shared information among health care providers, and the validation of medical care hypotheses from observations on patients, are also significant.

The contents and the description of a database have to be carefully managed in order to provide for this wide range of services, so that some degree of formal data management is implied when we speak of databases.
The formalization and the large data quantity implied in effective database operations make computerization of the database function essential; in fact, much of the incentive for early [Bush45] and current computing technology [Barsam79] is due to the demands made by information processing needs. Hence, the notion of a database encompasses the data themselves, the hardware used to store the data, and the software used to manipulate the data. When the database is used for multiple purposes we also find an administration which controls and assigns the resources needed to maintain the data collection and permit the generation of information.

We will in the next section define the technical scope of databases. The remaining sections in this chapter will deal specifically with current and future applications of databases in health care.

I.B Terminology in the Area of Data Bases

Within the scope of databases are a number of concepts which are easily confused with each other. The objective of a database is to provide information, but not all systems that provide information are databases. We will first define the term 'database', and then some terms that describe aspects of database technology. In the section which follows we will present types of systems which are related or similar to databases, but are not considered databases within this review.

    A database is a collection of related data,
    with facilities that process these data to
    yield information.

A database system facilitates the collection, organization, storage, and processing of data. The processing of data from many sources can provide information that would not have been available before the data were combined into a database. Hence, a collection of data is not by itself a database, a system that supports data storage is not necessarily a database system, and not all the information provided by computer systems is produced from databases.

I.B.1 Components of databases

A database is hence composed both of data, and of programs or software to enter and manipulate the data. Both data and software are stored within the computers which support the database, and the internal organization may not be obvious to the users. We will now describe some of the components that are part of database software. Databases require the availability of certain technological tools, or software subsystems.
Some of these tools that are used to support databases can also be used independently, and hence they are at times confused with the database system itself. Important subsystems are:

a) File Storage Systems : software to allocate and manage space for data kept on large computer storage devices, such as disks or tapes.
b) File Access Methods : software to rapidly access and update data stored on those devices.
c) Data Description Languages : means to describe data so that users and machines can refer to data elements and aggregations of similar data elements conveniently and unambiguously.
d) Data Manipulation Languages : programs to allow the user to retrieve and process data conveniently.

In a database these subsystems have to be well integrated, so that the data manipulation can be carried out in response to the vocabulary used in the data descriptions. Storage is allocated and rearranged as new data enter the database, and access to old and new data is provided as needed for manipulation. To provide the necessary reliability some redundant backup data are stored separately and appropriately identified whenever the database is changed. Optional software components of a database may provide on-line, conversational access to the database, help with the formulation of statistical queries, and provide printed reports on a regular schedule.

I.B.2 File Management Systems versus Database Management Systems

Of primary concern to a database effort is the reliable operation of the devices used to store the data over long periods of time. The programming systems which provide such services, typically inclusive of the tools listed in a) and b) above, are called file management systems (FMS).

When data are to be organized so that they can be accessed by a variety of users, system control extending to the individual users, and to the specific data units which these users will be referencing, may be needed. Control over the data and its use can only be achieved if all users always access the database via programs that will protect the reliability, privacy, and integrity of the database. We achieve reliability when data are not lost due to hardware and software errors. We protect privacy when we guarantee that only authorized access will occur. We define integrity as freedom from errors that could be introduced by simultaneous use of the database by users that may update its contents. A database management system (DBMS) should provide all the required database support programs, including management of files, scheduling of user programs, database manipulation, and recovery from errors. All these should form a well integrated package.

Not every database is managed by a commercial DBMS. Database support can also be provided by programs that use one of the available file management systems. The contents of the database can be identical for a system using a generalized DBMS product or one using programs written specifically for the task. A locally developed collection of programs rarely has all of the protective features that are desirable when multiple users interact with the database from terminals.
The manner in which users gain access will always depend on the choice of the DBMS or the file management system. For instance, a file system does not provide automatic scheduling of user-requested activities. Without a DBMS the users will have to schedule their own activities in such a way that simultaneous data entry is avoided. Some file systems will simply disallow such access; in other systems such usage could lead to inconsistent data. If data entry activities are organized so that such conflicts are avoided then there is less need for the complexity of a DBMS. A very popular file management system in medicine is MUMPS, developed at Massachusetts General Hospital to support clinical use of relatively small computer systems [Bowie77].

Both file management systems (FMS) and database management systems (DBMS) are available commercially for most computers. Some DBMS's will make use of an existing FMS, others will perform all but the most primitive file access functions themselves. Since a DBMS interacts closely with the user of the database, we find that distinct types of DBMS's have been developed. DBMS's also differ in terms of the comprehensiveness of software services. Most manufacturers provide an FMS at no additional cost, but acquisition of a DBMS is rarely free.

The choice of a particular type of database management system will influence the structure of the future database. Not every type of DBMS will be available on a given computer, but for most medium to large computers there is some choice. Simplicity versus generality and cost are often a trade-off. Even so-called generalized database management systems impose, to a great extent, the view of the designer or sponsor of such a DBMS. Many of the major systems now being marketed were designed to solve the complexities of specific applications. We hence find DBMS's that excel in inventory management, some do excellent retrieval of bibliographic citations, others have a strong bias towards statistical processing. Even within the medical area different DBMS's will emphasize one of the many objectives that are found within the range from patient care to medical research.

The following table lists some database systems found in medicine with an indication of their objective. We distinguish in this table: general ambulatory patient care, clinical or speciality outpatient care, hospital inpatient care, or patient management and record keeping in these areas. Clinical studies refers to research data collection on defined populations. Guidance refers to the giving of medical advice during the inquiry process. Details of these types of application are given in chapter II of this review. The types of database organizations can be categorized as tabular, relational, hierarchical, or network. These terms will be defined in section I.C.3 of this chapter.

Name    | Application               | Type        | FMS            | Computers               | Reference
--------+---------------------------+-------------+----------------+-------------------------+----------------
CCSS    | Clin.Studies              | Tab. DBMS   | Seq. files     | various                 | Kronma78
CIS     | Med.Guidance              | Tab. DBMS   | DEC seq.files  | DEC 11                  | McDona77
CLINFO  | Clin.Studies              | Tab. DBMS   | DG ISAM        | DG Eclipse              | Groner78
COSTAR  | Amb.Pat.Rec.              | Hier. DBMS  | MUMPS          | DEC 11                  | Barnet79
GEMISCH | Clin.Recs.                | Hier. DBMS  | DEC-11 DOS     | DEC 11                  | Hammon78
MIDAS   | Regional Rec.             | Hier. FMS   | Direct files   | Univac 494              | Fenna78
MISAR   | Res.Data                  | Tab. DBMS   | MUMPS          | DEC 15                  | Karpin71
MUMPS   | Med.Records               | Hier. FMS   | self-contained | DEC 15, 11, DG and more | Barnet, Bowie77
OCIS    | Clin.Recs.                | Mult.Hier.  | MUMPS-11       | DEC 11-70               | Blum79
TOD     | Clin.Recs.                | Tab. DBMS   | PL/1 ind.seq.  | IBM 370                 | Wieder75
PROMIS  | Hosp.Recs. & Med.Guidance | Tab. FMS    | Direct files   | Univac V77-600          | Schult79
RISS    | Hosp.Recs.                | Relat. DBMS | RTS11 ind.seq. | DEC 11                  | Meldma78

Database and File Management Systems Found in Health Care

The rows for GMDB, FAME, IMS, and LIM cannot be aligned in this copy. Their legible cells are, by column: Application: Clin.Studies, Hosp.Recs., Pat.Mgmt., Regional Rec., Clin.Studies; Type: Tab. DBMS, Hier. DBMS, Netw. DBMS, Multi-hierarch. DBMS, Hier. DBMS; FMS: IBM VSAM, CDC SIS, Basic Access, IBM VSAM, IBM DL/1 (IMS); Computers: IBM 370, CDC Cyber, IBM 370, IBM 370, IBM 360; Reference: Brown78, Wirtsc78, Penick75, Sauter76, Jaint76. The name IDMS is also legible among these rows.
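The integrity problem defined in this section, errors introduced when several users update the same data at once, can be sketched as follows. This is a minimal illustration in Python, not a description of any system in the table: the class names `FileStyleStore` and `ManagedStore`, the `visit_count` record, and the use of a lock are all assumptions made for the example. The first function replays two users who each read a count before either writes it back; the second routes every update through one guarded routine, as a DBMS scheduler might.

```python
import threading


class FileStyleStore:
    """File-system style access: each user reads a value and writes it back."""

    def __init__(self):
        self.records = {"visit_count": 0}

    def read(self, key):
        return self.records[key]

    def write(self, key, value):
        self.records[key] = value


class ManagedStore:
    """DBMS style access: every update passes through one guarded routine."""

    def __init__(self):
        self._records = {"visit_count": 0}
        self._lock = threading.Lock()

    def update(self, key, change):
        # Read, modify, and write as one unit, so that no other user can
        # slip in between the read and the write.
        with self._lock:
            self._records[key] = change(self._records[key])

    def read(self, key):
        with self._lock:
            return self._records[key]


def interleaved_file_updates():
    """Two users each add one visit, but both read before either writes."""
    store = FileStyleStore()
    seen_by_a = store.read("visit_count")
    seen_by_b = store.read("visit_count")
    store.write("visit_count", seen_by_a + 1)
    store.write("visit_count", seen_by_b + 1)  # silently replaces A's update
    return store.read("visit_count")


def concurrent_managed_updates(users=4, updates_each=500):
    """Several users update concurrently through the guarded routine."""
    store = ManagedStore()

    def user_session():
        for _ in range(updates_each):
            store.update("visit_count", lambda count: count + 1)

    sessions = [threading.Thread(target=user_session) for _ in range(users)]
    for session in sessions:
        session.start()
    for session in sessions:
        session.join()
    return store.read("visit_count")
```

In the file-style replay one of the two updates is lost, which is the kind of inconsistency that users of a plain file system must avoid by scheduling their own data entry. The guarded version keeps every update at the cost of the extra machinery.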

I.B.3 Related Systems

Data are collected and stored into a database with the expectation that at a later time the data can be analyzed, conclusions can be drawn, and that the information obtained can be used to influence future actions. Information is generated from data through processing, and should increase the knowledge of the receiver of this information. This person then should have the means to act upon the information, perhaps to the benefit of a larger community.

    The production of information is the central objective of a database.

There are other automated information processing systems which are not considered databases, although they may share some of the technology. In the remainder of this section two categories of such related systems will be presented.

INFORMATION SYSTEMS store information - often the output of earlier data analyses - for rapid selective retrieval [Beckle77]. A well known example is the MEDLARS system [Katter75, Leiter77], a service of the National Library of Medicine, which provides access to papers published in the medical literature. The task of such an information system is the selection and retrieval of information, but not the generation of information [Lucas78]. Index Medicus, for instance, only provides the references, and depends on the user's own library [Kunz79]. Even maintenance of personal reference files can be effectively automated [ReicheC8]. The benefits are due to the speed and improved coverage with which the documents can be found.

The boundary between information systems and database systems is not at all absolute. One can perhaps even speak of a spectrum of system types. When the queries are simple the two system types are in fact indistinguishable. Retrieval of the age of a patient, for instance, can be carried out with equal facility on either type of system. But when another observation, say cholesterol level, has to be compared with the average cholesterol level for all other patients of the same age, then a computation to generate this information is needed, and a system which is able to do this is placed more on the database side of the spectrum.

DECISION SUPPORT SYSTEMS assist with the manipulation of data supplied by the user [Davis78]. The help may be principally algorithmic - perhaps assuring that Bayes' rule is properly applied. More specialized systems embody medical knowledge [Johnso79], for instance in acid-base balance assessment [Bleich72] and anti-microbial therapy [Yu79]. While these systems could be coupled to databases, so that they also become knowledgeable about a specific patient, today they are typically separate [Gabrie78]. Work in decision making for health care cost control has indicated a need for database facilities in these applications [BrookW76].

The HELP system, at the LDS hospital in Salt Lake City, does keep a separate file of clinical decision criteria and applies them to the patient database as it is updated. The system then advises the physician to consider certain actions or further diagnostic tests [Warner78]. As medical databases become more reliable and comprehensive we can envisage increased exploitation of the information contained in them by systems which embody medical knowledge.
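The distinction drawn earlier in this section between simple retrieval and the generation of information can be made concrete with the cholesterol example. The sketch below is illustrative only: the sample records are invented, and the function names `retrieve` and `difference_from_age_peers` are assumptions for this example. The first function is the kind of lookup either system type can perform; the second requires a computation over all other patients of the same age.

```python
from statistics import mean

# Invented sample records; not taken from any real database.
PATIENTS = [
    {"id": 1, "age": 52, "cholesterol": 260},
    {"id": 2, "age": 52, "cholesterol": 200},
    {"id": 3, "age": 52, "cholesterol": 220},
    {"id": 4, "age": 40, "cholesterol": 180},
]


def retrieve(patients, patient_id, field):
    """Simple retrieval: return one stored value for one patient."""
    for patient in patients:
        if patient["id"] == patient_id:
            return patient[field]
    raise KeyError(patient_id)


def difference_from_age_peers(patients, patient_id, field="cholesterol"):
    """Generated information: compare a patient's value with the average
    for all other patients of the same age. Returns None without peers."""
    age = retrieve(patients, patient_id, "age")
    own_value = retrieve(patients, patient_id, field)
    peer_values = [
        patient[field]
        for patient in patients
        if patient["age"] == age and patient["id"] != patient_id
    ]
    if not peer_values:
        return None
    return own_value - mean(peer_values)
```

For patient 1 the comparison yields a difference of 50 from the peer average of 210, a figure that is stored nowhere and exists only because the data were combined and processed.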

I.C The Scientific Basis for Database Technology

The emergence of databases is not so much due to particular inventions, but is a logical step in the natural development of computing technology. The evolution of computational power began with the achievement of adequate reliability of complex electronic devices. The mean-time-to-failure reached several hours for powerful computers about 1955. At that point the concerns moved to the development of programming languages, so that programs of reasonable power could be written. These programs had the capacity to process large quantities of data, and in the early sixties magnetic tape and disk devices were developed to make the data available. Operating systems to allocate storage and processing power to the programs became the next challenge. By the late sixties these systems had matured so that multi-user operation became the norm. As these foundations were laid it became feasible to keep data available on-line, i.e., directly accessible by the computer system without manual intervention, like fetching and mounting computer tapes. Now a variety of application programs can use those data as needed. In current systems valuable data can be kept on-line over long periods without fear of loss or damage to the database.

I.C.1 The Schema

The one technical concept which is central to database management systems is the schema. A schema is a formalized description of the data that are contained in the database, available to the programs that wish to use the data. All data kept in such a database are identified with a name, say DOB for date-of-birth. With a schema it is sufficient for application programs to specify the name of the data they wish to retrieve. A command may state:

    date_of_birth = GET ( current_patient, DOB ) ;

The database system will use the schema to match the name of the requested data.
When a corresponding entry in the schema is found, the database system can use information associated with the entry to determine where the requested data have been stored, locate the data values, and retrieve them into the application program area (date_of_birth) for analysis or display. During this process it is possible to check that the requestor is authorized to access the data. The DBMS may also have to change the data into a representation that the program can handle [Feins78A]. Similar processes are carried out by the DBMS when old data are to be updated and when new data are to be added to the database.

The schema is established before any data can be placed into the database and embodies all the decisions that have been made about the contents and the structure of the database. Each individual type of data element will receive a reference name. The data to be kept under this name may be further defined. The most important specification is whether the data are numeric, a character string, or a code. Codes then need tables or programs for their definition. Other schema entries give the format and length of the data element, and perhaps the range of acceptable values. For observations of body temperature the five descriptors might be:

    TEMP, temperature in degrees C, numeric, XX.X, 36.8 to 44.8.

The data elements so described will have to fit into a structure; a value by itself, say TEMP 41.9, is of course meaningless. This data element belongs in an observation record, and the observation record must contain other data elements, namely a patient identification (ID), a date, and a time. These data elements, which are used to identify the entity described in the record, constitute the ruling part; without these there is insufficient information present to make the TEMPerature observation useful. The ruling part data types { ID, DATE, TIME } will also appear in the schema.

The observation record may contain, in addition to TEMP, other dependent data elements such as the pulse rate, the blood pressure, and the name of the observer. The entire observation record can then be described as a list of seven attributes, as follows:

    Observations: ID, DATE, TIME :> TEMP, PULSE, BP, OBSERVER;

The first three attributes form the ruling part, the other four are the dependent part; we separate the two parts with the symbol :> . Each attribute has associated with it a schema entry with the five descriptors shown for the TEMP entry above. There will be other kinds of records in the database: a patient demographic data record will exist in most databases we consider. Here the only data element in the ruling part will be the ID field. This record may be in part as below:

    Patients: ID :> PATIENT-NAME, ADDRESS, DOB, SEX, ... ;

Matching of the ID fields establishes the relationship between patient demographic data records and the observation records.
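The role of the schema described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the interface of any DBMS named in this report: the classes `SchemaEntry` and `Database`, the method names, the PULSE descriptors, and the DOB format are hypothetical. The TEMP entry carries the five descriptors from the text, records are keyed by their ruling part, and matching on ID relates patient records to observation records.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SchemaEntry:
    """The five descriptors of one data element (names are assumptions)."""
    name: str
    description: str
    kind: str  # "numeric", "string", or "code"
    fmt: str
    valid_range: Optional[Tuple[float, float]] = None


SCHEMA = {
    # Descriptors for TEMP follow the example in the text.
    "TEMP": SchemaEntry("TEMP", "temperature in degrees C", "numeric", "XX.X", (36.8, 44.8)),
    # The PULSE and DOB descriptors are invented for this sketch.
    "PULSE": SchemaEntry("PULSE", "pulse rate per minute", "numeric", "XXX"),
    "DOB": SchemaEntry("DOB", "date of birth", "string", "YYYY-MM-DD"),
}


class Database:
    def __init__(self, schema):
        self.schema = schema
        self.patients = {}      # ruling part ID -> dependent part
        self.observations = {}  # ruling part (ID, DATE, TIME) -> dependent part

    def _validated(self, values):
        """Check each value against its schema entry before it is stored."""
        for name, value in values.items():
            entry = self.schema[name]  # KeyError if the name is not in the schema
            if entry.valid_range is not None:
                low, high = entry.valid_range
                if not low <= value <= high:
                    raise ValueError(f"{name}={value} outside {low} to {high}")
        return dict(values)

    def put_patient(self, patient_id, **dependent):
        self.patients[patient_id] = self._validated(dependent)

    def put_observation(self, patient_id, date, time, **dependent):
        self.observations[(patient_id, date, time)] = self._validated(dependent)

    def get(self, patient_id, name):
        """Counterpart of GET ( current_patient, DOB ): look the name up in
        the schema first, then locate and return the stored value."""
        if name not in self.schema:
            raise KeyError(name)
        return self.patients[patient_id][name]

    def observations_for(self, patient_id):
        """Relate observation records to a patient by matching the ID field."""
        return {
            key: record
            for key, record in self.observations.items()
            if key[0] == patient_id
        }
```

A real schema would also record storage locations, access authorization, and representation conversions. They are omitted here to keep the sketch short.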
The known relationships between record types should also be described in the schema, so that the use of the schema is simplified [Manach75, Chang076].

We use three types of connections to describe relationships between records [Wieder79]; their use is also sketched in the figure below.

a) The Identity Connection - used where the ruling parts are similar, but different groupings are described;
   for instance both hospital patients and diabetes clinic patients are patients with patient ID's, but may have different dependent data stored in their files.
b) The Reference Connection - used where there is a common descriptive record referred to by multiple data records;
   for instance the physician seen is a record type referred to from the patient's clinic visit records.
c) The Nest Connection - used where there are many subsidiary records of some type which depend on a higher level record; multiple nest connections define an association;
   for instance the multiple clinic visits of a specific patient, each with data on his temperature, blood pressure, etc., form a nest of the patient record.
   An association occurs in the figure where a physician has admitting privileges at one or more hospitals, and each hospital grants admitting privileges to a number of physicians. The admitting-privileges file has as ruling part both the physician's ID and the hospital name; a dependent data element might be the date the privilege was granted.

Associated with the connection types may be rules for the maintenance of database integrity. Such rules can inform the database system that certain update operations are not permissible, since they would make the database inconsistent. For example we would not want to add a clinic patient without adding a corresponding record to the general patient file, if the patient did not yet exist there. Similarly, deletion of a physician's record from the database implies deletion of the associated admitting privileges.

I.C.2 The Data Model

In order to provide guidance for the creator of the schema it is important to have design tools. A large database can contain many types of records, and even more relationships between the record types. These have to be understood and used by a variety of people: the programmers who devise data entry and analysis programs, the researchers who wish to explore the database in order to formulate or verify new hypotheses, and the planners who wish to use the data as a basis for modelling so that they can predict the response to future actions. A variety of models exist [ACM76]; some models are abstractions of the facilities that certain types of database management systems can provide, others use more generalized, mathematical abstractions to represent the data and their relationships.
Recent work in database research is directed towards improving the representation of the semantics of the data [Hammer78, ElMasr79, Codd79] so that the constraints of the relationships that exist in the real world can be used to verify the appropriateness of data that are entered into the database.

Any reasonable model of the database can provide a common ground for communication between users and implementors; without a model there is apt to be an excess of detail [Wieder78]. An example of a data model for a clinical database is shown below.

[Figure: data model for a clinical database. Record types shown: Patients, Clinic Patients, Physician, Clinic Visits, Pharmacopeia, Hospital, Admitting Privilege, Drugs Prescribed. The connection to Physician is labeled "seen".]

The nest ( --* ) connections indicate that there may be multiple inferior instances for each superior instance.
The reference connections indicate that there may be multiple references to each instance.
The identity connection defines a subgroup.
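The two integrity rules given as examples in section I.C.1 can be sketched as update routines that refuse or extend an operation. The code below is a hypothetical illustration: the class `ClinicDatabase`, its method names, and the `IntegrityError` exception are assumptions for this sketch, and only the two rules named in the text are enforced.

```python
class IntegrityError(Exception):
    """Raised when an update would leave the database inconsistent."""


class ClinicDatabase:
    def __init__(self):
        self.patients = {}              # ID -> demographic data
        self.clinic_patients = {}       # ID -> clinic-specific data
        self.physicians = {}            # physician ID -> name
        self.admitting_privileges = {}  # (physician ID, hospital) -> date granted

    def add_patient(self, patient_id, demographics):
        self.patients[patient_id] = demographics

    def add_clinic_patient(self, patient_id, clinic_data, demographics=None):
        # Identity connection rule: a clinic patient must also appear in the
        # general patient file, so the missing record is added first.
        if patient_id not in self.patients:
            if demographics is None:
                raise IntegrityError(
                    f"patient {patient_id} is not in the general patient file"
                )
            self.add_patient(patient_id, demographics)
        self.clinic_patients[patient_id] = clinic_data

    def add_physician(self, physician_id, name):
        self.physicians[physician_id] = name

    def grant_privilege(self, physician_id, hospital, date_granted):
        self.admitting_privileges[(physician_id, hospital)] = date_granted

    def delete_physician(self, physician_id):
        # Nest connection rule: deleting the owner record implies deleting
        # the admitting privileges that depend on it.
        del self.physicians[physician_id]
        dependent_keys = [
            key for key in self.admitting_privileges if key[0] == physician_id
        ]
        for key in dependent_keys:
            del self.admitting_privileges[key]
```

Placing such rules in the database system, rather than in each application program, means that every user is subject to the same constraints.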

I.C.3 Types of Database Models

A popular approach to database analysis distinguishes several categories of databases. Database system implementations can be associated with each category. These categories are represented by database model types; the best known types are the

    Relational model - derived from the mathematical theory of relations and sets.
    Hierarchical model - related to tree-shaped database implementations, similar to corporate organization diagrams.
    Network model - permits interconnections that are more complex than hierarchies, based on a definition developed by a committee of specialists in commercial system languages.

The structural model can describe the structures of any of these three models, as well as of other database implementations. If only a single record-type - a box in the above diagram - is implemented then we are dealing with a 'universal relation' [Ullman79]. A single box for a complex database would have many columns and rows, and contain many null entries. If the data are organized into several record-types, each corresponding to some meaningful entity, then we are dealing with a 'tabular database'; if a completely general query and processing capability exists in such a system, we have implemented the 'relational model' [Codd70].

At this point the entities stand alone, and some analysis is needed to relate them. If any of the indicated connections have been implemented then we may have a network or a hierarchical database. In the hierarchical model a record-type may have only one nest connection ( --* ) pointing to it. The implementation of multiple nest connections, which creates a network with associations, is considerably more complex [Stoneb75, Wieder77]. Several of the larger commercial DBMS's are based on work by the Data Base Task Group of CODASYL, and do support such network structures [Olle78]. These systems often do not support the general inquiry capability of the relational model implementations.
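The structural distinction made above can be expressed as a small test on the nest connections of a design. The sketch is a deliberate simplification: the function name `classify_structure` and the pair representation of connections are assumptions, only nest connections are considered, and the test counts the nest connections that point to each record type, as the text describes.

```python
from collections import Counter


def classify_structure(nest_connections):
    """Classify a design from its nest connections, given as
    (owner record type, member record type) pairs."""
    # Count how many nest connections point to each member record type.
    incoming = Counter(member for _owner, member in nest_connections)
    if not incoming:
        return "tabular"       # the record types stand alone
    if max(incoming.values()) == 1:
        return "hierarchical"  # at most one nest connection per record type
    return "network"           # some record type has several owners


# Example designs using record types from the clinical data model.
visits_only = [
    ("Clinic Patients", "Clinic Visits"),
    ("Clinic Visits", "Drugs Prescribed"),
]
with_privileges = visits_only + [
    ("Physician", "Admitting Privilege"),
    ("Hospital", "Admitting Privilege"),
]
```

With these definitions `classify_structure(visits_only)` reports a hierarchical structure, while adding the admitting-privilege association, which has two owners, yields a network.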
It is important to note that there is a distinction between a model and its implementation. A model is an abstraction and provides a level of insight which can cut through masses of confusing detail. In the implementation this detail has to be considered. It is likely that the implementation will differ considerably from the model used to describe it. As more powerful models are developed this distinction may become greater. An implementation may then be best described in terms of transformations that are applied to the model which defines the database at a high conceptual level. Most transformations are done for reasons of operational performance and reliability.

I.D Database Operation

In the section above we have discussed the scientific basis of databases. In order to use and benefit from that science a database operation has to be established, and that involves many decisions of practical, but critical importance. This section will consider such topics.

When the database design
