Inter-Coder Agreement Analysis
ATLAS.ti 8 Windows
ATLAS.ti 8 Windows - Inter-Coder Agreement Analysis

Copyright 2020 by ATLAS.ti Scientific Software Development GmbH, Berlin. All rights reserved.
Manual Version: 669.20200507. Updated for program version: 8.4
Author: Dr. Susanne Friese (QuaRC)
Programming/Design/Layout: Dr. Thomas G. Ringmayr - hypertexxt.com

Copying or duplicating this manual or any part thereof is a violation of applicable law. No part of this manual may be reproduced or transmitted in any form or by any means, electronic or mechanical, including, but not limited to, photocopying, without written permission from ATLAS.ti GmbH.

Trademarks: ATLAS.ti is a registered trademark of ATLAS.ti Scientific Software Development GmbH. Adobe Acrobat is a trademark of Adobe Systems Incorporated; Microsoft, Windows, Excel, and other Microsoft products referenced herein are either trademarks of Microsoft Corporation in the United States and/or in other countries. Google Earth is a trademark of Google, Inc. All other product names and any registered and unregistered trademarks mentioned in this document are used for identification purposes only and remain the exclusive property of their respective owners.
Contents

- Inter-coder Agreement (ICA)
- Why It Matters
- At What Phase in Your Project Should ICA Analysis Be Performed?
- Reliability and Validity
- Requirements for Coding
- Semantic Domains
- Rules for Applying Codes
- Instructions for Coders (Summary Form)
- How Agreement/Disagreement is Measured
- Methods for Testing ICA
- Common Mistakes
- Sample Size
- Acceptable Levels of Reliability
- How to Set Up a Project for ICA Analysis
- Project Management for the Principal Investigator (PI)
- Project Management for Coders
- Merging Projects for ICA Analysis
- Running the ICA Analysis
- Calculating an ICA Coefficient
- Exporting Results
- References
Inter-coder Agreement (ICA)

Why It Matters

The purpose of collecting and analyzing data is that researchers find answers to the research questions that motivated the study in the first place. Thus, the data are the trusted ground for any reasoning and discussion of the results. Therefore, the researchers should be confident that their data have been generated taking precaution against distortions and biases, intentional or accidental, and that they mean the same thing to anyone who uses them. Reliability grounds this confidence empirically (Krippendorff, 2004).

Richards (2009) wrote: "But being reliable (to use the adjective) beats being unreliable. If a category is used in different ways, you will be unable to rely on it to bring you all the relevant data. Hence, you may wish to ensure that you yourself are reliably interpreting a code the same way across time, or that you can rely on your colleagues to use it in the same way" (p. 108).

There are two ways to rationalize reliability. One is rooted in measurement theory, which is less relevant for the type of data that ATLAS.ti users have. The second is an interpretivist conception of reliability. When collecting any type of interview data or observations, the phenomena of interest usually disappear right after they have been recorded or observed. Therefore, the analyst's ability to examine the phenomena relies heavily on a consensual reading and use of the data that represent the phenomena of interest. Researchers need to presume that their data can be trusted to mean the same to all of their users. This means "that the reading of textual data as well as of the research results is replicable elsewhere, that researchers demonstrably agree on what they are talking about. Here, then, reliability is the degree in which members of a designated community agree on the readings, interpretations, responses to, or uses of given texts or data. [...]
Researchers need to demonstrate the trustworthiness of their data by measuring their reliability" (Krippendorff, 2004, p. 212).

Testing the reliability of the data is a first step. Only after establishing that the reliability is sufficiently high does it make sense to proceed with the analysis of the data. If there is considerable doubt about what the data mean, it will be difficult to justify the further analysis and also the results of this analysis.

ATLAS.ti's inter-coder agreement tool lets you assess how well multiple coders agree when coding a given body of data. In developing the tool we worked closely with Prof. Klaus Krippendorff, one of the leading experts in this field, author of the book Content Analysis: An Introduction to Its Methodology, and the originator of Krippendorff's alpha coefficient for measuring inter-coder agreement.

The need for such a tool as an integrated element in ATLAS.ti has long been evident, and it has been frequently requested by users. By its nature, however, it could not and cannot be a magic "just click a button and hope for the best" kind of tool. If you randomly click on any of the choices that ATLAS.ti offers to calculate an inter-coder agreement coefficient, ATLAS.ti will calculate something. Whether the number you receive is meaningful and useful depends on how you have set up your project and your coding.

This means that testing for inter-coder agreement requires at least a minimal willingness to delve into some of the basic theoretical foundations of what inter-coder agreement is, what it does and can do, and also what it cannot do. In this manual, we provide some of the basics, but this cannot be a replacement for reading the literature and coming to understand the underlying assumptions and requirements for running an inter-coder agreement analysis.

Please keep in mind that the inter-coder agreement tool crosses the qualitative-quantitative divide.
Establishing inter-coder agreement has its origin in quantitative content analysis (see for instance Krippendorff, 2019; Schreier, 2012). If you want to apply it and want to adhere to scientific standards, you must follow some rules that are much stricter than those for qualitative coding.

If you want to develop a code system as a team, yes, you can start coding independently and then see what you get. But this approach can only be an initial brainstorming at best. It cannot be used for testing inter-coder agreement.

At What Phase in Your Project Should ICA Analysis Be Performed?

A good time to have your coding checked by other coders is when you have built a stable code system and all codes are defined. This is usually somewhere in the middle of the coding process. Once a satisfactory ICA coefficient is achieved, the principal investigator has the assurance that his or her codes can be understood and applied by others and can continue to work with the code system.

Reliability and Validity

Whereas reliability offers the certainty that research findings can be reproduced and that no or only limited external "noise" has contaminated the data or the results, validity assures that the assertions made by the research reflect the reality it claims to represent. Validity concerns truth(s).

Reliability relates to validity in the following ways:

The more unreliable the data, the less likely it is that researchers can draw valid conclusions from the data. In terms of coding data, this means that researchers need to identify valid accounts in the data to a degree better than by chance. If the agreement of two or more coders is not better than the agreement by chance, then reliability is low and you cannot infer that a common understanding of the data exists. Thus: Unreliability limits the chance of validity.
On the other hand, reliability does not necessarily guarantee validity. Two coders who share the same world view and have the same prejudices may well agree on what they see, but could objectively be wrong. Also, if two researchers have a unique perspective based on their academic discipline but their reading is not shared by many people outside their own scholarly community, the reliability might be high but the outcome of the research has little chance of being substantiated by evidence of the reality that is inferred. As Krippendorff (2004) states: "Even perfectly dependable mechanical instruments, such as computers, can be wrong – reliably." (p. 213).

A third aspect is the dilemma between high reliability and validity. Interesting interpretations might not be reproducible, or interesting data may not occur often enough to establish reliability. Highly reliable data might be boring and oversimplified in order to establish a high reliability in the first place.

Requirements for Coding

Sources of unreliable data are intra-coder inconsistencies and inter-coder disagreements. To detect these, the coding process needs to be replicated. Replicability can be assured when several independently working coders (at least two) agree:
- on the use of the written coding instructions,
- by highlighting the same textual segments to which the coding instructions apply,
- by coding them using the same codes, or by identifying the same semantic domains that describe them and coding them using the same codes for each semantic domain.

Semantic Domains

A semantic domain is defined as a set of distinct concepts that share common meanings. You can also think of it as a category with sub codes.
Examples of semantic domains are:

EMOTIONS
- emotions: joy
- emotions: excitement
- emotions: surprise
- emotions: sadness
- emotions: anger
- emotions: fear

STRATEGIES
- strategies: realize they are natural
- strategies: have a Plan B
- strategies: adjust expectations
- strategies: laugh it off
- strategies: get help

ACTOR
- actor: partner
- actor: mother
- actor: father
- actor: child
- actor: sibling
- actor: neighbour
- actor: colleague

Each semantic domain embraces mutually exclusive concepts, each indicated by a code. If you code a data segment with 'emotions: surprise', you cannot also code it 'emotions: fear'. If both are mentioned in close proximity, you need to create two quotations and code them separately. You can, however, apply codes from different semantic domains to one quotation. You find more information on multi-valued coding in the section "Rules for Applying Codes".
Figure 1: How to correct a violation of mutual exclusiveness

Semantic Domains are Context Dependent

At times it might be obvious from the domain name what the context is. The sub code 'supporting each other', for example, belongs to the context 'benefits of friendship'; it is not about work colleagues supporting each other. If the context is still unclear and could be interpreted in different ways, you need to make it unambiguous in the code definition.

Semantic Domains need to be Conceptually Independent

Conceptual independence means:
- a sub code from one domain is specific to that domain only. It does not occur in any other domain.
- thus, each code occurs only once in the code system.

Therefore, as semantic domains are logically or conceptually independent from each other, it is possible to apply codes from different semantic domains to the same or overlapping quotations. For instance, in a section of your data, you may find something on the benefits of friendship and some emotional aspects that are mentioned. As these codes come from different semantic domains (BENEFITS and EMOTIONS), they can both be applied.

Developing a Code System with Semantic Domains

Semantic domains can be developed deductively or inductively. In most studies applying a qualitative data analysis approach, development is likely to be inductive. This means that you develop the codes step by step while you read the data. For example, in an interview study about friendship, you may have coded some data segments 'caring for each other', 'supporting each other' and 'improve health and longevity'. Then you realize that these codes can be summarized on a higher level as 'BENEFITS OF FRIENDSHIP'. Thus, you set up a semantic domain BENEFITS with the sub codes:
- benefits: caring for each other
- benefits: supporting each other
- benefits: improve health and longevity

You continue coding and come across other segments that you code 'learning from each other'.
As this also fits the domain BENEFITS, you add it to the domain by naming it:
- benefits: learning from each other

And so on. This way you can build semantic domains step by step inductively while developing the code system.

Once the code system is ready and the code definitions are written in the code comment fields, you can prepare the project for inter-coder agreement testing. At this stage, you can no longer expand a semantic domain. See "How to Set Up a Project for ICA Analysis".

When developing a code system, the aim is to cover the variability in the data so that no aspect that is relevant for the research question is left out. This is referred to as exhaustiveness. On the domain level, this means that all main topics are covered. On the sub code level, this means that the codes of a semantic domain cover all aspects of the domain and the data can be sorted into the available codes without forcing them. An easy way out is to include a catch-all 'miscellaneous' code for each domain, to which coders can assign all data that they think does not fit anywhere else. However, keep in mind that such catch-all codes will contribute little to answering the research questions.

Rules for Applying Codes

- Codes from one domain need to be applied in a mutually exclusive manner.
- Codes from multiple semantic domains can be applied to the same or overlapping data segments.

Mutual exclusiveness: You can only apply one of the sub codes of a semantic domain to a quotation or to overlapping quotations. Using the same code colour for all codes of a semantic domain will help you to detect possible errors.

If you find that you have coded a quotation with two codes from the same semantic domain, you can fix it by splitting the quotation. This means that you change the length of the original quotation and create one new quotation, so that you can apply the two codes to two distinct quotations. See Figure 1: How to correct a violation of mutual exclusiveness.
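The mutual exclusiveness rule above can be checked mechanically once coded quotations are available outside of ATLAS.ti, for example from an export. The following is a minimal sketch under stated assumptions, not part of ATLAS.ti: the dictionary layout (quotation id mapped to a list of code names) and the function names `domain_of` and `exclusiveness_violations` are hypothetical, and the sketch assumes that every code follows the 'domain: sub code' naming pattern used in this manual. It only inspects codes attached to the same quotation; overlapping quotations are not handled.

```python
from collections import defaultdict


def domain_of(code_name):
    """Return the semantic-domain prefix of a code such as 'emotions: fear'."""
    return code_name.split(":", 1)[0].strip().lower()


def exclusiveness_violations(quotations):
    """Find quotations that carry more than one code from the same domain.

    `quotations` is a hypothetical mapping: quotation id -> list of code names.
    Returns a mapping: quotation id -> {domain: sorted list of clashing codes}.
    """
    violations = {}
    for quotation_id, codes in quotations.items():
        by_domain = defaultdict(set)
        for code in codes:
            by_domain[domain_of(code)].add(code)
        # A domain violates the rule if two or more of its codes were applied.
        clashes = {d: sorted(c) for d, c in by_domain.items() if len(c) > 1}
        if clashes:
            violations[quotation_id] = clashes
    return violations


# Illustrative data only: q1 breaks the rule, q2 is valid multi-valued coding.
quotations = {
    "q1": ["emotions: surprise", "emotions: fear"],
    "q2": ["emotions: anger", "actor: mother", "strategies: adjust expectations"],
}
print(exclusiveness_violations(quotations))
# {'q1': {'emotions': ['emotions: fear', 'emotions: surprise']}}
```

Every quotation reported by such a check would need to be split into two quotations, as described above, before a cu-alpha coefficient can be calculated.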
If codes within a semantic domain are not applied in a mutually exclusive manner, the cu-alpha coefficient cannot be calculated.

Multi-Valued Coding: This means that you can apply multiple codes of different semantic domains to the same quotation. If, for instance, a respondent talks about anger in dealing with her mother and mentions that she had to adjust expectations, this can be coded with codes from the three semantic domains EMOTIONS, ACTOR and STRATEGIES.

Figure 2: Multi-valued coding

Coding with codes from multiple semantic domains allows you to see how the various semantic domains are related. This is useful for analyses other than inter-coder agreement, for example the relation between opinions on recent events, the people involved and the type of consequences. For this you can use the code co-occurrence tools. You find information on the Code Co-Occurrence Tools in the full manual.

Instructions for Coders (Summary Form)

An important requirement for inter-coder agreement analysis is the independence of the coders. Thus, the person who has developed the code system cannot be one of the coders whose coding goes into the ICA analysis. In addition to the principal investigator who develops the code system, two or more persons are needed who apply the codes.

The coders will receive a project bundle file from the principal investigator (see "How to Set Up a Project for ICA Analysis"). It is recommended to provide the coders with the following instructions:
- When importing the project that you receive from the principal investigator, add your name or initials to the project name.
- After opening the project, double check your user name (see "User Accounts").
- Apply all codes of a semantic domain in a mutually exclusive manner.
- If fitting, you can apply codes from multiple semantic domains to the same or overlapping quotations.
- Do not make any changes to the codes: do not alter the code definition, and do not change the code name or the code color.
- If you do not understand a code definition, or find a code label not fitting, create a memo and name it 'Comments from coder name'. Write all of your comments, issues you find and ideas you have into this memo.
- Do not consult with other coders. It is important that your coding remains unbiased and that the data are coded by all coders independently.
- Once you are done coding, create a project bundle file and send it back to the principal investigator.

How Agreement/Disagreement is Measured

The coefficients that are available measure the extent of agreement or disagreement of different coders. Based on this measure one can infer reliability, but the coefficients do not measure reliability directly. Therefore, what is measured is inter-coder agreement and not inter-coder reliability.
Agreement is what we measure; reliability is what we infer from it.

Figure 3: Textual continuum

Suppose two coders are given a text and code it following the provided instructions. After they are done, they give you the following record of their coding:
- There is agreement on the first pair of segments in terms of length and location (agreement: 10 units).
- On the second pair they agree in length but not in location. There is agreement on an intersection of the data (agreement: 12 units, disagreement: 2 + 2 units).
- In the third pair, coder A finds a segment relevant that is not recognized by coder B (disagreement: 4 units).
- The largest disagreement is observed in the last pair of segments. Coder A takes a narrower view than coder B (agreement: 4 units, disagreement: 6 units).

In terms of your ATLAS.ti coding, you need to think of your document as a textual continuum. Each character of your text is a unit of analysis for ICA, starting at character 1 and ending, for instance, at character 17500. For audio and video documents, the unit of analysis is a second. Images currently cannot be used in an ICA analysis.

It is not the quotation itself that goes into the analysis, but the characters or seconds that have been coded. In other words, if each coder creates quotations, it is not a total disagreement if they have not highlighted the exact same segment. The parts of the quotations that overlap go into the analysis as agreement; the other parts as disagreement.

Another option is to work with pre-defined quotations, so that coders only need to apply the codes. For more information see "How to Set Up a Project for ICA Analysis".

You can use the inter-coder agreement tool for text, audio and video data.

Methods for Testing ICA

ATLAS.ti currently offers three methods to test inter-coder agreement:
- Simple percent agreement
- Holsti Index
- Two of Krippendorff's family of alpha coefficients

All methods can be used for two or more coders.
For scientific reporting, we recommend the Krippendorff alpha coefficients.

Percent Agreement

Percent agreement is the simplest measure of inter-coder agreement. It is calculated as the number of times a set of ratings is the same, divided by the total number of units of observation that are rated, multiplied by 100.

The benefits of percent agreement are that it is simple to calculate and that it can be used with any type of measurement scale. Let's take a look at the following example: There are ten segments of text, and two coders only needed to decide whether a code applies or does not apply:
Segments   1   2   3   4   5   6   7   8   9   10
Coder 1    1   1   0   0   0   0   0   0   0   0
Coder 2    0   1   1   0   0   1   0   1   0   0

\[ PA = \frac{\text{number of agreements}}{\text{total number of segments}} = \frac{6}{10} = 0.6 = 60\% \]

Coder 1 and 2 agree 6 out of 10 times, so percent agreement (PA) is 60%. One could argue this is quite good. This calculation, however, does not account for chance agreement between ratings. If the two coders were not to read the data and just randomly coded the 10 segments, we would expect them to agree a certain percentage of the time by chance alone. The question is: How much higher is the 60% agreement than the agreement that would occur by chance? Below, only the result is presented when chance agreement is taken into account. If you are interested in the calculation, take a look at Krippendorff (2004, p. 224-226). The agreement that is expected by mere chance is

\[ \frac{9.6 + 1.6}{20} = 56\% \]

The 60% agreement thus is not impressive at all. Statistically speaking, the performance of the two coders is equivalent to having reliably coded only 1 of the 10 segments and having arbitrarily assigned 0s and 1s to the other 9 segments.

Holsti Index

Holsti's method (1969) is a variation of percent agreement, as percent agreement cannot be used if the coders have not all coded the same data segments. When coders were allowed to create their own quotations and did not code pre-defined quotations, the Holsti index needs to be used. Please note that the Holsti index also does not take chance agreement into account.

The formula for the Holsti Index is:

\[ PA_{\text{Holsti}} = \frac{2A}{N_1 + N_2} \]

PA (Holsti) represents the percentage of agreement between two coders, A is the number of the two coders' consensus decisions, and N1 and N2 are the numbers of decisions the coders have made respectively.

Percent agreement and the Holsti Index are equal when all coders code the same units of the sample.

Cohen's Kappa

ATLAS.ti does not offer a calculation for Cohen's kappa because of its severe limitations.
Cohen's kappa is a modification of Scott's pi and, according to Krippendorff (2019) and Zwick (1988), a rather unfortunate one because of a conceptual flaw in its calculation. Unlike the more familiar contingency matrices, which tabulate N pairs of values and maintain reference to the two coders, coincidence matrices tabulate the n pairable values used in coding, regardless of who contributed them, in effect treating coders as interchangeable. Cohen's kappa, by contrast, defines expected agreement in terms of contingencies, as the agreement that would be expected if coders were statistically independent of each other. Cohen's conception of chance fails to include disagreements between coders' individual predilections for particular categories, punishes coders who agree on their use of categories, and rewards those who do not agree with higher kappa-values. This is the cause of other noted oddities of kappa. The statistical independence of coders is only marginally related to the statistical independence of the units coded and the values assigned to them. Cohen's kappa, by ignoring crucial disagreements, can become deceptively large when the reliability of coding data is to be assessed.

In addition, both Cohen's kappa and Scott's pi assume an infinite sample size. Krippendorff's alpha coefficient, in comparison, is sensitive to different sample sizes and can also be used on small samples.
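The figures from the ten-segment example earlier in this section (60% observed agreement, 56% expected by chance) can be reproduced with a few lines of Python. This is an illustrative sketch and not the calculation that ATLAS.ti performs internally: the function names are hypothetical, and the chance-agreement formula is the standard small-sample form for nominal data with two coders and no missing values, assumed here to be the one behind the numbers cited from Krippendorff (2004).

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units on which both coders assigned the same value."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same units")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)


def holsti_index(consensus_decisions, n1, n2):
    """Holsti index: 2A / (N1 + N2)."""
    return 2 * consensus_decisions / (n1 + n2)


def expected_agreement(coder_a, coder_b):
    """Agreement expected by chance, pooling the values of both coders."""
    counts = Counter(coder_a) + Counter(coder_b)  # coders are interchangeable
    n = sum(counts.values())                      # number of pairable values
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def nominal_alpha(coder_a, coder_b):
    """Chance-corrected agreement: (observed - expected) / (1 - expected)."""
    observed = percent_agreement(coder_a, coder_b)
    expected = expected_agreement(coder_a, coder_b)
    if expected == 1:
        raise ValueError("no variation in the data; alpha is undefined")
    return (observed - expected) / (1 - expected)


coder_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
coder_2 = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]

print(round(percent_agreement(coder_1, coder_2), 2))   # 0.6
print(round(holsti_index(6, 10, 10), 2))               # 0.6
print(round(expected_agreement(coder_1, coder_2), 2))  # 0.56
print(round(nominal_alpha(coder_1, coder_2), 3))       # 0.095
```

The pooled counts are 14 zeros and 6 ones among 20 values, which gives (14 x 13 + 6 x 5) / (20 x 19), or about 0.56. The resulting alpha of roughly 0.1 corresponds to the statement that the coders reliably coded only about 1 of the 10 segments. Because both coders rated the same ten segments, the Holsti index equals percent agreement here.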
Krippendorff's Family of Alpha Coefficients

The family of alpha coefficients offers various measurements that allow you to carry out calculations at different levels:

Figure 4: Krippendorff's alpha family – from the general to the specific

c-alpha binary

At the most general level, you can measure whether different coders identify the same sections in the data as relevant for the topics of interest represented by codes.

You can, but do not need to, use semantic domains at this level. It is also possible to enter a single code per domain. You get a value for alpha binary for each code or semantic domain in the analysis, and a summary value for all items in the analysis.

All text units are taken into account for this analysis, coded and uncoded matter.

If you work with pre-defined quotations, the binary coefficient will be 1 for a semantic domain if only codes of the same semantic domain have been applied, regardless of which codes within the domain.

The summary alpha binary is always 1 when working with pre-defined quotations, as all coded segments are the same.

cu-alpha

Another option is to test whether different coders were able to distinguish between the codes of a semantic domain. For example, suppose you have a semantic domain called 'type of emotions' with the sub codes:
- emotions::contentment
- emotions::excitement
- emotions::embarrassment
- emotions::relief

The coefficient gives you an indication of whether the coders were able to reliably distinguish between, for instance, 'contentment' and 'excitement', or between 'embarrassment' and 'relief'. The cu-alpha will give you a value for the overall performance of the semantic domain. It will, however, not tell you which of the sub codes might be problematic. You need to look at the quotations and check where the confusion is.

The cu-alpha coefficient can only be calculated if the codes of a semantic domain have been applied in a mutually exclusive manner.
This means only one of the sub codes per domain is applied to a given quotation. See "Requirements for Coding".
Cu-alpha

Cu-alpha is the summary coefficient for all cu-alphas. It takes into account that you can apply codes from multiple semantic domains to the same or overlapping quotations. See the information on multi-valued coding in "Requirements for Coding". Thus, Cu-alpha is not just the average of all cu-alphas.

If codes of a semantic domain A have been applied to data segments that are coded with codes of a semantic domain B, this does not affect the cu-alpha coefficient for either domain A or B, but it affects the overall Cu-alpha coefficient. You can interpret the Cu-alpha coefficient as indicating the extent to which coders agree on the presence or absence of the semantic domains in the analysis. Formulated as a question: Could coders reliably identify that data segments belong to a specific semantic domain, or did the various coders apply codes from various other semantic domains?

In the calculation of both the cu-alpha and the Cu-alpha coefficient, only coded data segments are included in the analysis.

c(s)u-alpha

This coefficient also belongs to the family of alpha coefficients, but it is not yet implemented. Once implemented, it will allow you to drill down a level deeper and check for each semantic domain which code within the domain performs well or not so well. It indicates the agreement on coding within a semantic domain.

For example, if you have a domain 'type of emotions' with the sub codes:
- emotions::contentment
- emotions::excitement
- emotions::embarrassment
- emotions::relief

Coders may n