Journal of the American Medical Informatics Association, 27(5), 2020, 738–746
doi: 10.1093/jamia/ocaa030
Research and Applications

The new International Classification of Diseases 11th edition: a comparative analysis with ICD-10 and ICD-10-CM

Kin Wah Fung, Julia Xu, and Olivier Bodenreider

Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA

Corresponding Author: Kin Wah Fung, MD, MS, MA, Lister Hill National Center for Biomedical Communications, Building 38A, Rm 9S918, MSC-3826, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA ([email protected])

Received 19 December 2019; Revised 28 January 2020; Editorial Decision 1 March 2020; Accepted 9 March 2020

ABSTRACT

Objective: To study the newly adopted International Classification of Diseases 11th revision (ICD-11) and compare it to the International Classification of Diseases 10th revision (ICD-10) and International Classification of Diseases 10th revision-Clinical Modification (ICD-10-CM).

Materials and Methods: Data files and maps were downloaded from the World Health Organization (WHO) website and through the application programming interfaces. A round trip method based on the WHO maps was used to identify equivalent codes between ICD-10 and ICD-11, which were validated by limited manual review. ICD-11 terms were mapped to ICD-10-CM through normalized lexical mapping. ICD-10-CM codes in 6 disease areas were also manually recoded in ICD-11.

Results: Excluding the chapters for traditional medicine, functioning assessment, and extension codes for postcoordination, ICD-11 has 14 622 leaf codes (codes that can be used in coding) compared to ICD-10 and ICD-10-CM, which have 10 607 and 71 932 leaf codes, respectively. We identified 4037 pairs of ICD-10 and ICD-11 codes that were equivalent (estimated accuracy of 96%) by our round trip method. Lexical matching between ICD-11 and ICD-10-CM identified 4059 pairs of possibly equivalent codes. Manual recoding showed that 60% of a sample of 388 ICD-10-CM codes could be fully represented in ICD-11 by precoordinated codes or postcoordination.

Conclusion: In ICD-11, there is a moderate increase in the number of codes over ICD-10. With postcoordination, it is possible to fully represent the meaning of a high proportion of ICD-10-CM codes, especially with the addition of a limited number of extension codes.

Key words: ICD-11, ICD-10, ICD-10-CM, controlled medical vocabularies, medical terminologies

Published by Oxford University Press on behalf of the American Medical Informatics Association 2020. This work is written by US Government employees and is in the public domain in the US.
Downloaded from /5/738/5828208 by National Institutes of Health Library user on 28 July 2020

INTRODUCTION

The International Classification of Diseases (ICD) can be traced back over a century to the International List of Causes of Death (ICD-1) adopted by the International Statistical Institute in 1900 in Paris.1,2 The classification was subsequently updated every decade. The update task was passed to the World Health Organization (WHO) in 1946, and the classification was renamed International Classification of Diseases, Injuries, and Causes of Death to serve as the foundation for worldwide health trends and statistics. The update interval has lengthened considerably since ICD-9. ICD-10 was adopted in 1992, 17 years after ICD-9. The WHO started working on ICD-11 in 2007 with the involvement of experts from over 90 countries. ICD-11 was adopted in May 2019 (27 years after ICD-10) by the World Health Assembly, to be effective for use from January 2022.3–9 Over 2 dozen countries have

developed national extensions of ICD to suit their requirements. In the US, the Clinical Modification (CM) has been developed since ICD-9-CM to support morbidity coding for reimbursement and other purposes.2,10 ICD-10-CM replaced ICD-9-CM in October 2015.

Table 1. Comparison of the Foundation Component and linearization

| | Foundation Component | Linearization |
| Building block | Entity | Category |
| Identifier | URI | Code |
| Defining attributes | Description, body site, body system, causal mechanisms, synonyms, exclusions, signs and symptoms etc. | Description, inclusions, exclusions |
| Example | Diaphragmatic hernia. Description: A hernia occurs through the foramen in the diaphragm. Synonyms: paraesophageal hernia, hiatus hernia, esophageal hiatus hernia, sliding hiatus hernia. Exclusions: congenital diaphragmatic hernia, congenital hiatus hernia. Body site: diaphragmatic structure (body structure), entire diaphragm (body structure) | Diaphragmatic hernia, DD50.0. Description: A hernia occurs through the foramen in the diaphragm. Inclusions: paraesophageal hernia. Exclusions: Congenital diaphragmatic hernia (LB00.0), Congenital hiatus hernia (LB13.1) |
| Hierarchy | Multi-parenting. Parents: Non-abdominal wall hernia, Other diseases of the digestive system | Single-parenting. Parent: DD50 Non-abdominal wall hernia |
| Residual elements | None | Present: DD50.Y Other specified non-abdominal wall hernia, DD50.Z Non-abdominal wall hernia, unspecified |
| Update frequency | Continuous | Periodic with official versioning |

Abbreviation: URI, universal resource identifier.

BACKGROUND

With every new ICD version, code syntax usually changes, presumably to avoid confusion with older versions. For example, the code for Huntington disease is G10 in ICD-10 and 8A01.10 in ICD-11. There is often expansion of the number of codes and some reorganization of the chapters. Apart from these usual changes, ICD-11 has 3 brand new features:11,12

1. Foundation Component.
ICD-11 is built on an underlying knowledge base that holds all necessary information to generate the tabular list and alphabetical index for mortality and morbidity coding.13 These derivatives are called “linearizations.” It is also possible to generate alternative lists for different purposes (eg, specialty subsets, country specific modifications).

The Foundation Component is a multidimensional collection of medical entities: diseases, disorders, injuries, external causes, signs, and symptoms. The entities are defined with attributes such as body site, body system, and causal mechanism (Table 1). These entities are organized into hierarchies and multiparenting is allowed. When linearizations are derived from the Foundation Component, only single-parenting is allowed, an essential requirement in a statistical classification to avoid double counting. Categories in a linearization are derived from entities in the Foundation Component and are assigned ICD-11 codes. Not all entities acquire unique codes, as some entities may be merged into 1 category. Residual categories (eg, ‘unspecified,’ ‘not elsewhere classified’) are added to ensure that the categories are mutually exclusive and jointly exhaustive, another essential requirement of a statistical classification. The Foundation Component is updated in real time and linearizations are generated at fixed intervals (eg, yearly) and officially versioned.

2. Postcoordination. ICD-11 allows the combination of codes (called “cluster coding”) to add detail to an existing code (called “stem code” or “precoordinated code”). Two kinds of postcoordination are allowed:

a. Two or more stem codes (syntax: stemcode1/stemcode2/stemcode3, etc), for example, urinary tract infection due to extended spectrum beta-lactamase producing Escherichia coli = GC08.0/MG50.27 (GC08.0 Urinary tract infection, site not specified, due to Escherichia coli; MG50.27 Extended spectrum beta-lactamase producing Escherichia coli)

b.
Stem code(s) with 1 or more extension codes (syntax: stemcode1&extensioncode1&extensioncode2, etc), for example, tuberculosis of prostate = 1B12.5&XA63E5 (1B12.5 Tuberculosis of the genitourinary system; XA63E5 Prostate gland).

3. Digital-friendly. Fully embracing the digital age, ICD-11 is accompanied by a host of online and digital resources. Online resources include browsers of the Foundation Component and various linearizations and a coding tool for the Mortality and Morbidity Statistics linearization (MMS).14–16 Downloadable resources include maps between ICD-10 and ICD-11 and the MMS. Application programming interfaces (API) allow programmatic access to the Foundation Component, MMS, and ICD-10. There is also an online maintenance platform for collaborators in the update process.

We present a comparative analysis of ICD-11 in relation to ICD-10 and ICD-10-CM. Updating ICD to a new version is a nontrivial endeavor that incurs significant cost and has potential impact on longitudinal data comparability, as evidenced by various reports when the US moved from ICD-9-CM to ICD-10-CM.17–26 The goal of the ICD-10 comparison is to provide a high-level view of the extent and pattern of changes. The comparison with ICD-10-CM is motivated by the possibility that the US could move from ICD-10-
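The two postcoordination syntaxes described in the Background can be illustrated with a small parser. This is a minimal sketch, not part of any WHO tool: the names `Stem` and `parse_cluster` are hypothetical, and it assumes that extension codes joined with `&` attach to the stem code immediately before them when `/` and `&` appear in the same cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Stem:
    """One stem code plus any extension codes attached to it."""
    code: str
    extensions: list[str] = field(default_factory=list)


def parse_cluster(cluster: str) -> list[Stem]:
    """Split a cluster into stem codes ('/') and their extension codes ('&').

    Assumption (not stated in the article): '&' binds more tightly than '/',
    so each extension code belongs to the stem code that precedes it.
    """
    stems = []
    for part in cluster.strip().split("/"):
        code, *extensions = [piece.strip() for piece in part.split("&")]
        if not code or any(not ext for ext in extensions):
            raise ValueError(f"malformed cluster: {cluster!r}")
        stems.append(Stem(code, extensions))
    return stems


# The two examples given in the text.
uti = parse_cluster("GC08.0/MG50.27")      # two stem codes, no extensions
prostate_tb = parse_cluster("1B12.5&XA63E5")  # one stem code, one extension
```

Here `uti` holds two `Stem` objects with empty extension lists, while `prostate_tb` holds a single `Stem` whose `extensions` list is `["XA63E5"]`. The sketch checks syntax only; it does not verify that a code exists in the MMS or that a given combination is permitted.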

MATERIALS AND METHODS

Data sources
We downloaded the following from the WHO ICD-11 website (Version 04/2019):
1. Simple Tabulation – ICD-11 codes, titles, and indexing terms in MMS
2. MMS Linearization Tabulation – similar to Simple Tabulation, with additional information (eg, kind of code [chapter, block, or category]) and depth in tree
3. One Category ICD-10 to ICD-11 Map – each ICD-10 code maps to only 1 ICD-11 code
4. Multiple Categories ICD-10 to ICD-11 Map – each ICD-10 code can map to multiple ICD-11 codes
5. One Category ICD-11 to ICD-10 Map – each ICD-11 code maps to only 1 ICD-10 code

We used the MMS browser and coding tool to look up individual codes. We used the API to collect additional information not in the downloadable files.

ICD-10 comparison
We focused on the first 25 chapters of the ICD-11 MMS linearization that aligned with the scope of ICD-10, using only precoordinated ICD-11 codes. We used the one-category ICD-10 to ICD-11 map to identify “chapter drift” (ie, codes moved to a chapter other than the main corresponding chapter). To quantify chapter drift, we defined a “chapter drift index” (CDI) for each ICD-11 chapter as the percentage of codes coming from ICD-10 chapters other than the main corresponding chapter. We identified equivalent codes between ICD-10 and ICD-11 by “round tripping,” using the 2 one-category maps. We postulated that if an ICD-10 code mapped to a single ICD-11 code in the forward map, which mapped back to the same ICD-10 code in the backward map, then the 2 codes were likely equivalent. We manually reviewed some round trip maps.

ICD-10-CM comparison
Since no maps existed, we used 2 approaches to compare ICD-11 to ICD-10-CM: lexical matching and manual recoding. For lexical matching, we used the lexical tool LuiNorm from the Unified Medical Language System (UMLS) (2019 version) to normalize ICD-11 code names from chapters 1 to 25.29,30 We matched the normalized names to UMLS concepts (version 2019AA) using the normalized English strings index (MRXNS_ENG).31 Through the UMLS concepts, we matched to ICD-10-CM codes (2019 version). We ignored ICD-11 index terms and ICD-10-CM inclusion terms because they could be narrower in meaning. For example, the index terms for Paratyphoid Fever included Paratyphoid fever A and Paratyphoid fever B. For manual recoding, we picked a convenience sample of 6 disease areas in ICD-10-CM that covered common conditions (diabetes, hypertension, pregnancy) and pathologies (infection, trauma, malignancy) and recoded them in ICD-11. For each ICD-10-CM code, we determined whether its meaning could be fully represented in ICD-11 with or without postcoordination. The recoding was done by 1 of the authors (JX, physician with extensive ICD knowledge).

RESULTS

ICD-10 comparison

Chapter structure, chapter drift and extent of change
ICD-11 had 28 chapters, 6 more than ICD-10. The last 3 chapters were outside the scope of ICD-10 and excluded from further analysis:
Chapter 26 Supplementary Chapter Traditional Medicine Conditions
Chapter V Supplementary section for functioning assessment
Chapter X Extension Codes (for support of postcoordination)

Among the first 25 chapters, 3 were new:
Chapter 4 Diseases of the immune system
Chapter 7 Sleep-wake disorders
Chapter 17 Conditions related to sexual health

The other 22 chapters largely mirrored the chapters of ICD-10. However, some conditions could be moved to a chapter other than the main corresponding chapter (chapter drift). Figure 1 shows the degree of correspondence of codes by chapter. The rows are ICD-10 chapters and the columns are ICD-11 chapters. The number in each cell is the number of ICD-10 leaf codes in the one-category ICD-10 to ICD-11 map. Only leaf codes, which are the lowest level codes with no children, are allowed in coding. The largest numbers are found along the diagonal, meaning that the majority of codes remain in their main corresponding chapters. Three notable breaks in the diagonal pattern correspond to the new chapters 4, 7, and 17 (red arrows). Not surprisingly, many codes from the ICD-10 Chapter III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism end up in the new Chapter 4 Diseases of the immune system. The ICD-10 Chapter V Mental and behavioral disorders is the biggest contributor of codes to the new Chapter 7 Sleep-wake disorders and Chapter 17 Conditions related to sexual health.

We identified 7 ICD-11 chapters with CDI over 5% (Figure 1, last row). Among these were, not surprisingly, the 3 new chapters, since they did not correspond neatly to a single ICD-10 chapter (thus the need for a new chapter). The other 4 chapters were:
Chapter 1 Certain infectious or parasitic diseases: some diseases that used to be classified based on body location were now grouped under
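The round trip test and the chapter drift index defined in the Methods can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names, the dictionary representation of the two one-category maps, and the toy codes and chapter labels are all assumptions. In particular, the article does not say how the main corresponding chapter is determined, so it is passed in as an input here, and an ICD-11 chapter with no entry is treated as having no main counterpart.

```python
from collections import Counter


def round_trip_pairs(forward: dict[str, str],
                     backward: dict[str, str]) -> list[tuple[str, str]]:
    """Return (ICD-10, ICD-11) pairs that survive a round trip.

    forward:  one-category ICD-10 -> ICD-11 map
    backward: one-category ICD-11 -> ICD-10 map
    A pair is kept when the backward map returns the starting ICD-10 code.
    """
    return sorted(
        (icd10, icd11)
        for icd10, icd11 in forward.items()
        if backward.get(icd11) == icd10
    )


def chapter_drift_index(rows, main_chapter: dict[str, str]) -> dict[str, float]:
    """Percentage of codes in each ICD-11 chapter that drifted in.

    rows:         (icd10_chapter, icd11_chapter), one per mapped ICD-10 code
    main_chapter: ICD-11 chapter -> its main corresponding ICD-10 chapter
                  (assumed to be supplied by the analyst)
    """
    total, drifted = Counter(), Counter()
    for ch10, ch11 in rows:
        total[ch11] += 1
        if main_chapter.get(ch11) != ch10:
            drifted[ch11] += 1
    return {ch11: 100.0 * drifted[ch11] / total[ch11] for ch11 in total}


# Toy data with made-up code labels, not real ICD codes.
toy_forward = {"a": "x", "b": "x", "c": "y"}
toy_backward = {"x": "a", "y": "c"}
pairs = round_trip_pairs(toy_forward, toy_backward)  # "b" fails the round trip

toy_rows = [("I", "1"), ("I", "1"), ("XI", "1"), ("V", "7")]
cdi = chapter_drift_index(toy_rows, {"1": "I"})
```

In the toy data, codes `a` and `b` both map forward to `x`, but `x` maps back only to `a`, so `b` is not counted as equivalent. For the drift index, one of the three codes landing in chapter `1` comes from a chapter other than `I`, which gives a CDI of about 33% for that chapter.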

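The lexical matching step described in the Methods (normalize each ICD-11 name, look it up in a normalized string index to find concepts, then follow the concepts to ICD-10-CM codes) can be sketched as follows. This is not LuiNorm: `crude_normalize` is a deliberately simplified stand-in that only lowercases, strips punctuation, and sorts words, whereas the real tool also handles inflection and other variation. The concept identifier `C-TOY-1` and the in-memory dictionaries are hypothetical and stand in for the UMLS files.

```python
import re
from collections import defaultdict


def crude_normalize(name: str) -> str:
    """Simplified stand-in for LuiNorm: lowercase, drop punctuation, sort words."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(sorted(words))


def lexical_matches(icd11_names: dict[str, str],
                    concept_strings: dict[str, list[str]],
                    concept_to_icd10cm: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Pair ICD-11 codes with ICD-10-CM codes that share a concept.

    icd11_names:        ICD-11 code -> code name
    concept_strings:    concept ID -> English strings for that concept
    concept_to_icd10cm: concept ID -> ICD-10-CM codes linked to it
    """
    # Build the normalized-string index (the role MRXNS_ENG plays in the text).
    index = defaultdict(set)
    for concept, strings in concept_strings.items():
        for s in strings:
            index[crude_normalize(s)].add(concept)

    pairs = set()
    for code11, name in icd11_names.items():
        for concept in index.get(crude_normalize(name), ()):
            for code10cm in concept_to_icd10cm.get(concept, ()):
                pairs.add((code11, code10cm))
    return sorted(pairs)


# Toy example using the Huntington disease codes mentioned in the Background.
matches = lexical_matches(
    {"8A01.10": "Huntington disease"},
    {"C-TOY-1": ["Disease, Huntington"]},
    {"C-TOY-1": ["G10"]},
)
```

Because word order and punctuation are discarded, "Huntington disease" and "Disease, Huntington" normalize to the same string and the two codes are paired. As in the article, pairs found this way are only possibly equivalent and would still need review.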