Procedures for Performing Systematic Reviews

Barbara Kitchenham
e-mail: [email protected]

Technical Report
Software Engineering Group
Department of Computer Science
Keele University
Keele, Staffs ST5 5BG, UK
Keele University Technical Report TR/SE-0401
ISSN: 1353-7776

and

Empirical Software Engineering
National ICT Australia Ltd.
Bay 15 Locomotive Workshop
Australian Technology Park
Garden Street, Eversleigh
NSW 1430, Australia
NICTA Technical Report 0400011T.1

July, 2004
Kitchenham, 2004

0. Document Control Section

0.1 Contents

0. Document Control Section
  0.1 Contents
  0.2 Document Version Control
  0.3 Executive Summary
1. Introduction
2. Systematic Reviews
  2.1 Reasons for Performing Systematic Reviews
  2.2 The Importance of Systematic Reviews
  2.3 Advantages and disadvantages
  2.4 Features of Systematic Reviews
3. The Review Process
4. Planning
  4.1 The need for a systematic review
  4.2 Development of a Review Protocol
    4.2.1 The Research Question
      Question Types
      Question Structure
        Population
        Intervention
        Outcomes
        Experimental designs
    4.2.2 Protocol Review
5. Conducting the review
  5.1 Identification of Research
    5.1.1 Generating a search strategy
    5.1.2 Publication Bias
    5.1.3 Bibliography Management and Document Retrieval
    5.1.4 Documenting the Search
  5.2 Study Selection
    5.2.1 Study selection criteria
    5.2.2 Study selection process
    5.2.3 Reliability of inclusion decisions
  5.3 Study Quality Assessment
    5.3.1 Quality Thresholds
    5.3.2 Development of Quality Instruments
    5.3.3 Using the Quality Instrument
    5.3.4 Limitations of Quality Assessment
  5.4 Data Extraction
    5.4.1 Design of Data Extraction Forms
    5.4.2 Contents of Data Collection Forms
    5.4.3 Data extraction procedures
    5.4.4 Multiple publications of the same data
    5.4.5 Unpublished data, missing data and data requiring manipulation
  5.5 Data Synthesis
    5.5.1 Descriptive synthesis
    5.5.2 Quantitative Synthesis
    5.5.3 Presentation of Quantitative Results
    5.5.4 Sensitivity analysis
    5.5.5 Publication bias
6. Reporting the review
  6.1 Structure for systematic review
  6.2 Peer Review
7. Final remarks
8. References
Appendix 1 Steps in a systematic review

0.2 Document Version Control

Version   Date            Changes from previous version
1         1 April 2004    None
1.0       29 June 2004    Correction of typos
                          Additional discussion of problems of assessing evidence
                          Section 7 “Final Remarks” added.

0.3 Executive Summary

The objective of this report is to propose a guideline for systematic reviews appropriate for software engineering researchers, including PhD students. A systematic review is a means of evaluating and interpreting all available research relevant to a particular research question, topic area, or phenomenon of interest. Systematic reviews aim to present a fair evaluation of a research topic by using a trustworthy, rigorous, and auditable methodology.

The guideline presented in this report was derived from three existing guidelines used by medical researchers. The guideline has been adapted to reflect the specific problems of software engineering research.

The guideline covers three phases of a systematic review: planning the review, conducting the review and reporting the review. It is at a relatively high level. It does not consider the impact of question type on the review procedures, nor does it specify in detail mechanisms needed to undertake meta-analysis.

1. Introduction

This document presents a general guideline for undertaking systematic reviews. The goal of this document is to introduce the concept of rigorous reviews of current empirical evidence to the software engineering community. It is aimed at software engineering researchers including PhD students. It does not cover details of meta-analysis (a statistical procedure for synthesising quantitative results from different studies), nor does it discuss the implications that different types of systematic review questions have on systematic review procedures.

The document is based on a review of three existing guidelines for systematic reviews:
1. The Cochrane Reviewer’s Handbook [4].
2. Guidelines prepared by the Australian National Health and Medical Research Council [1] and [2].
3. CRD Guidelines for those carrying out or commissioning reviews [12].
In particular the structure of this document owes much to the CRD Guidelines.

All these guidelines are intended to aid medical researchers. This document attempts to adapt the medical guidelines to the needs of software engineering researchers. It discusses a number of issues where software engineering research differs from medical research.
In particular, software engineering research has relatively little empirical research compared with the large quantities of research available on medical issues, and research methods used by software engineers are not as rigorous as those used by medical researchers.

The structure of the report is as follows:
1. Section 2 provides an introduction to systematic reviews as a significant research method.
2. Section 3 specifies the stages in a systematic review.
3. Section 4 discusses the planning stages of a systematic review.
4. Section 5 discusses the stages involved in conducting a systematic review.
5. Section 6 discusses reporting a systematic review.

2. Systematic Reviews

A systematic literature review is a means of identifying, evaluating and interpreting all available research relevant to a particular research question, or topic area, or phenomenon of interest. Individual studies contributing to a systematic review are called primary studies; a systematic review is a form of secondary study.

2.1 Reasons for Performing Systematic Reviews

There are many reasons for undertaking a systematic review. The most common reasons are:
• To summarise the existing evidence concerning a treatment or technology, e.g. to summarise the empirical evidence of the benefits and limitations of a specific agile method.

• To identify any gaps in current research in order to suggest areas for further investigation.
• To provide a framework/background in order to appropriately position new research activities.
However, systematic reviews can also be undertaken to examine the extent to which empirical evidence supports/contradicts theoretical hypotheses, or even to assist the generation of new hypotheses (see for example [10]).

2.2 The Importance of Systematic Reviews

Most research starts with a literature review of some sort. However, unless a literature review is thorough and fair, it is of little scientific value. This is the main rationale for undertaking systematic reviews. A systematic review synthesises existing work in a manner that is fair and seen to be fair. For example, systematic reviews must be undertaken in accordance with a predefined search strategy. The search strategy must allow the completeness of the search to be assessed. In particular, researchers performing a systematic review must make every effort to identify and report research that does not support their preferred research hypothesis as well as identifying and reporting research that supports it.

2.3 Advantages and disadvantages

Systematic reviews require considerably more effort than traditional reviews. Their major advantage is that they provide information about the effects of some phenomenon across a wide range of settings and empirical methods. If studies give consistent results, systematic reviews provide evidence that the phenomenon is robust and transferable. If the studies give inconsistent results, sources of variation can be studied.

A second advantage, in the case of quantitative studies, is that it is possible to combine data using meta-analytic techniques. This increases the likelihood of detecting real effects that individual smaller studies are unable to detect.
However, increased power can also be a disadvantage, since it is possible to detect small biases as well as true effects.

2.4 Features of Systematic Reviews

Some of the features that differentiate a systematic review from a conventional literature review are:
• Systematic reviews start by defining a review protocol that specifies the research question being addressed and the methods that will be used to perform the review.
• Systematic reviews are based on a defined search strategy that aims to detect as much of the relevant literature as possible.
• Systematic reviews document their search strategy so that readers can assess its rigour and completeness.
• Systematic reviews require explicit inclusion and exclusion criteria to assess each potential primary study.
• Systematic reviews specify the information to be obtained from each primary study including quality criteria by which to evaluate each primary study.
• A systematic review is a prerequisite for quantitative meta-analysis.
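The power argument in Section 2.3 can be illustrated with a small sketch. It assumes a fixed-effect, inverse-variance pooling model, which this report does not prescribe; the effect sizes and standard errors are hypothetical, and the function name pool_fixed_effect is chosen for illustration only.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates.

    Returns the pooled effect and its standard error.
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical data: three small studies of the same effect.
effects = [0.30, 0.25, 0.35]
std_errors = [0.20, 0.20, 0.20]

pooled, pooled_se = pool_fixed_effect(effects, std_errors)

# z-scores of the individual studies and of the pooled estimate.
individual_z = [e / se for e, se in zip(effects, std_errors)]
pooled_z = pooled / pooled_se
print(individual_z, pooled_z)
```

With these made-up numbers no single study reaches the conventional 1.96 threshold, but the pooled estimate does, because the pooled standard error is smaller than that of any one study. The same gain in precision applies to a small bias shared by the studies, which is the disadvantage noted in Section 2.3.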

3. The Review Process

A systematic review involves several discrete activities. Existing guidelines for systematic reviews have different suggestions about the number and order of activities (see Appendix 1). This document summarises the stages in a systematic review into three main phases: Planning the Review, Conducting the Review, Reporting the Review.

The stages associated with planning the review are:
1. Identification of the need for a review
2. Development of a review protocol.

The stages associated with conducting the review are:
1. Identification of research
2. Selection of primary studies
3. Study quality assessment
4. Data extraction & monitoring
5. Data synthesis.

Reporting the review is a single stage phase.

Each phase is discussed in detail in the following sections. Other activities identified in the guidelines discussed in Appendix 1 are outside the scope of this document.

The stages listed above may appear to be sequential, but it is important to recognise that many of the stages involve iteration. In particular, many activities are initiated during the protocol development stage, and refined when the review proper takes place. For example:
• The selection of primary studies is governed by inclusion and exclusion criteria. These criteria are initially specified when the protocol is defined but may be refined after quality criteria are defined.
• Data extraction forms initially prepared during construction of the protocol will be amended when quality criteria are agreed.
• Data synthesis methods defined in the protocol may be amended once data has been collected.

The systematic reviews road map prepared by the Systematic Reviews Group at Berkeley demonstrates the iterative nature of the systematic review process very clearly [15].

4. Planning

4.1 The need for a systematic review

The need for a systematic review arises from the requirement of researchers to summarise all existing information about some phenomenon in a thorough and unbiased manner.
This may be in order to draw more general conclusions about some phenomenon than is possible from individual studies, or as a prelude to further research activities.

Prior to undertaking a systematic review, researchers should ensure that a systematic review is necessary. In particular, researchers should identify and review any existing systematic reviews of the phenomenon of interest against appropriate evaluation criteria. CRD [12] suggests the following checklist:
• What are the review’s objectives?
• What sources were searched to identify primary studies? Were there any restrictions?
• What were the inclusion/exclusion criteria and how were they applied?
• What criteria were used to assess the quality of primary studies and how were they applied?
• How were the data extracted from the primary studies?
• How were the data synthesised? How were differences between studies investigated? How were the data combined? Was it reasonable to combine the studies? Do the conclusions flow from the evidence?

From a more general viewpoint, Greenhalgh [9] suggests the following questions:
• Can you find an important clinical question, which the review addressed? (Clearly, in software engineering, this should be adapted to refer to an important software engineering question.)
• Was a thorough search done of the appropriate databases and were other potentially important sources explored?
• Was methodological quality assessed and the trials weighted accordingly?
• How sensitive are the results to the way that the review has been done?
• Have numerical results been interpreted with common sense and due regard to the broader aspects of the problem?

4.2 Development of a Review Protocol

A review protocol specifies the methods that will be used to undertake a specific systematic review. A pre-defined protocol is necessary to reduce the possibility of researcher bias. For example, without a protocol, it is possible that the selection of individual studies or the analysis may be driven by researcher expectations.
In medicine, review protocols are usually submitted to peer review.

The components of a protocol include all the elements of the review plus some additional planning information:
• Background. The rationale for the survey.
• The research questions that the review is intended to answer.
• The strategy that will be used to search for primary studies, including search terms and resources to be searched. Resources include databases, specific journals, and conference proceedings. An initial scoping study can help determine an appropriate strategy.
• Study selection criteria and procedures. Study selection criteria determine criteria for including in, or excluding a study from, the systematic review. It is usually helpful to pilot the selection criteria on a subset of primary studies. The protocol should describe how the criteria will be applied, e.g. how many assessors will evaluate each prospective primary study, and how disagreements among assessors will be resolved.

• Study quality assessment checklists and procedures. The researchers should develop quality checklists to assess the individual studies. The purpose of the quality assessment will guide the development of checklists.
• Data extraction strategy. This should define how the information required from each primary study would be obtained. If the data require manipulation or assumptions and inferences to be made, the protocol should specify an appropriate validation process.
• Synthesis of the extracted data. This should define the synthesis strategy. This should clarify whether or not a formal meta-analysis is intended and if so what techniques will be used.
• Project timetable. This should define the review plan.

4.2.1 The Research Question

4.2.1.1 Question Types

The most important activity during protocol development is to formulate the research question. The Australian NHMRC Guidelines [1] identify six types of health care questions that can be addressed by systematic reviews:
1. Assessing the effect of intervention.
2. Assessing the frequency or rate of a condition or disease.
3. Determining the performance of a diagnostic test.
4. Identifying aetiology and risk factors.
5. Identifying whether a condition can be predicted.
6. Assessing the economic value of an intervention or procedure.

In software engineering, it is not clear what the equivalent of a diagnostic test would be, but the other questions can be adapted to software engineering issues as follows:
• Assessing the effect of a software engineering technology.
• Assessing the frequency or rate of a project development factor such as the adoption of a technology, or the frequency or rate of project success or failure.
• Identifying cost and risk factors associated with a technology.
• Identifying the impact of technologies on reliability, performance and cost models.
• Cost benefit analysis of software technologies.

Medical guidelines often provide different guidelines and procedures for different types of question.
This document does not go to this level of detail.

The critical issue in any systematic review is to ask the right question. In this context, the right question is usually one that:
• Is meaningful and important to practitioners as well as researchers. For example, researchers might be interested in whether a specific analysis technique leads to a significantly more accurate estimate of remaining defects after design inspections. However, a practitioner might want to know whether adopting a specific analysis technique to predict remaining defects is more effective than expert opinion at identifying design documents that require re-inspection.
• Will lead either to changes in current software engineering practice or to increased confidence in the value of current practice. For example, researchers and practitioners would like to know under what conditions a project can safely adopt agile technologies and under what conditions it should not.
• Identifies discrepancies between commonly held beliefs and reality.

Nonetheless, there are systematic reviews that ask questions that are primarily of interest to researchers. Such reviews ask questions that identify and/or scope future research activities. For example, a systematic review in a PhD thesis should identify the existing basis for the research student’s work and make it clear where the proposed research fits into the current body of knowledge.

Question Structure

Medical guidelines recommend considering a question from three viewpoints:
• The population, i.e. the people affected by the intervention.
• The interventions, usually a comparison between two or more alternative treatments.
• The outcomes, i.e. the clinical and economic factors that will be used to compare the interventions.
In addition, study designs appropriate to answering the review questions may be identified.

Population

In software engineering experiments, the populations might be any of the following:
• A specific software engineering role, e.g. testers, managers.
• A type of software engineer, e.g. a novice or experienced engineer.
• An application area, e.g. IT systems, command and control systems.
A question may refer to very specific population groups, e.g. novice testers, or experienced software architects working on IT systems. In medicine the populations are defined in order to reduce the number of prospective primary studies. In software engineering far fewer primary studies are undertaken; thus, we may need to avoid any restriction on the population until we come to consider the practical implications of the systematic review.

Intervention

Interventions will be software technologies that address specific issues, for example, technologies to perform specific tasks such as requirements specification, system testing, or software cost estimation.
Outcomes

Outcomes should relate to factors of importance to practitioners such as improved reliability, reduced production costs, and reduced time to market. All relevant outcomes should be specified. For example, in some cases we require interventions that improve some aspect of software production without affecting another, e.g. improved reliability with no increase in cost.

A particular problem for software engineering experiments is the use of surrogate measures, for example, defects found during system testing as a surrogate for quality, or coupling measures for design quality. Studies that use surrogate measures may be misleading and conclusions based on such studies may be less robust.

Experimental designs

In medical studies, researchers may be able to restrict systematic reviews to primary studies of one particular type. For example, Cochrane reviews are usually restricted to randomised controlled trials (RCTs). In other circumstances, the nature of the question and the central issue being addressed may suggest that certain study designs are more appropriate than others. However, this approach can only be taken in a discipline where the amount of available research is a major problem. In software engineering, the paucity of primary studies is more likely to be the problem for systematic reviews and we are more likely to need protocols for aggregating information from studies of widely different types. A starting point for such aggregation is the ranking of primary studies of different types; this is discussed in a later section.

4.2.2 Protocol Review

The protocol is a critical element of any systematic review. Researchers must agree a procedure for reviewing the protocol. If appropriate funding is available, a group of independent experts should be asked to review the protocol. The same experts can later be asked to review the final report.

PhD students should present their protocol to their supervisors for review and criticism.

5. Conducting the review

Once the protocol has been agreed, the review proper can start. This involves:
1. Identification of research
2. Selection of studies
3. Study quality assessment
4. Data extraction and monitoring progress
5. Data synthesis
Each of these stages will be discussed in this section. Although some stages must proceed sequentially, some stages can be undertaken simultaneously.

5.1 Identification of Research

The aim of a systematic review is to find as many primary studies relating to the research question as possible using an unbiased search strategy. For example, it is necessary to avoid language bias.
The rigour of the search process is one factor that distinguishes systematic reviews from traditional reviews.

5.1.1 Generating a search strategy

It is necessary to determine and follow a search strategy. This should be developed in consultation with librarians. Search strategies are usually iterative and benefit from:
• Preliminary searches aimed at both identifying existing systematic reviews and assessing the volume of potentially relevant studies.

• Trial searches using various combinations of search terms derived from the research question
• Reviews of research results
• Consultations with experts in the field

A general approach is to break down the question into individual facets, i.e. population, intervention, outcomes, study designs. Then draw up a list of synonyms, abbreviations, and alternative spellings. Other terms can be obtained by considering subject headings used in journals and databases. Sophisticated search strings can then be constructed using Boolean ANDs and ORs.

Initial searches for primary studies can be undertaken using electronic databases but this is not sufficient. Other sources of evidence must also be searched (sometimes manually) including:
• Reference lists from relevant primary studies and review articles
• Journals (including company journals such as the IBM Journal of Research and Development), grey literature (i.e. technical reports, work in progress) and conference proceedings
• Research registers
• The Internet.
It is also important to identify specific researchers to approach directly for advice on appropriate source material.

Medical researchers have developed pre-packaged search strategies. Software engineering researchers need to develop and publish such strategies including identification of relevant electronic databases.

5.1.2 Publication Bias

Publication bias refers to the problem that positive results are more likely to be published than negative results. The concept of positive or negative results sometimes depends on the viewpoint of the researcher. (For example, evidence that full mastectomies were not always required for breast cancer was actually an extremely positive result for breast cancer sufferers).
However, publication bias remains a problem particularly for formal experiments, where failure to reject the null hypothesis is considered less interesting than an experiment that is able to reject the null hypothesis.

Publication bias can lead to systematic bias in systematic reviews unless special efforts are made to address this problem. Many of the standard search strategies identified above are used to address this issue including:
• Scanning the grey literature
• Scanning conference proceedings
• Contacting experts and researchers working in the area and asking them if they know of any unpublished results.
In addition, statistical analysis techniques can be used to identify the potential significance of publication bias (see Section 5.5.5).
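The facet-based approach of Section 5.1.1 (one group of synonyms per facet, combined with Boolean operators) might be sketched as follows. The facet terms are invented for illustration, build_search_string is a hypothetical helper, and each real bibliographic database has its own query syntax, so the resulting string would need adapting before use.

```python
def build_search_string(facets):
    """Combine synonyms with OR inside each facet and join facets with AND.

    `facets` maps a facet name (e.g. "population") to a list of synonyms,
    abbreviations and alternative spellings for that facet.
    """
    groups = []
    for terms in facets.values():
        # Quote multi-word terms so they are treated as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical facets for a question about pair programming.
facets = {
    "population": ["novice tester", "student"],
    "intervention": ["pair programming", "pair-programming"],
    "outcomes": ["defects", "reliability"],
}

query = build_search_string(facets)
print(query)
```

Keeping the facets in a data structure rather than a hand-written string makes it easy to record the exact strategy used for each database and to rerun it when terms are added during trial searches.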

5.1.3 Bibliography Management and Document Retrieval

Bibliographic packages such as Reference Manager or Endnote are very useful to manage the large number of references that can be obtained from a thorough literature search.

Once reference lists have been finalised the full articles of potentially useful studies will need to be obtained. A logging system is needed to make sure all relevant studies are obtained.

5.1.4 Documenting the Search

The process of performing a systematic review must be transparent and replicable:
• The review must be documented in sufficient detail for readers to be able to assess the thoroughness of the search.
• The search should be documented as it occurs and changes noted and justified.
• The unfiltered search results should be saved and retained for possible reanalysis.
Procedures for documenting the search process are given in Table 1.

Table 1 Search process documentation

Data Source                    Documentation
Electronic database            Name of database
                               Search strategy for each database
                               Date of search
                               Years covered by search
Journal Hand Searches          Name of journal
                               Years searched
                               Any issues not searched
Conference proceedings         Title of proceedings
                               Name of conference (if different)
                               Title translation (if necessary)
                               Journal name (if published as part of a journal)
Efforts to identify            Research groups and researchers contacted (Names and contact details)
unpublished studies            Research web sites searched (Date and URL)
Other sources                  Date Searched/Contacted
                               URL
                               Any specific conditions pertaining to the search

5.2 Study Selection

Once the potentially relevant primary studies have been obtained, they need to be assessed for their actual relevance.

5.2.1 Study selection criteria

Study selection criteria are intended to identify those primary studies that provide direct evidence about the research question. In order to reduce the likelihood of bias, selection criteria should be decided during the protocol definition.

Inclusion and exclusion criteria should be based

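The documentation items listed in Table 1 for an electronic database search could be kept as structured records, so that the search log is saved as the search occurs. The following is only a sketch: the class name, field names, the change_note field and the database name "ExampleDB" are all hypothetical and not part of the guideline.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class DatabaseSearchRecord:
    """One search-log entry for an electronic database (see Table 1)."""
    database: str           # name of database
    search_strategy: str    # exact search string used for this database
    date_of_search: date    # date the search was run
    years_covered: tuple    # (first_year, last_year) covered by the search
    change_note: str = ""   # justification if the strategy was changed


def append_record(log, record):
    """Append a record to the log and return it as a plain dict for saving."""
    log.append(record)
    return asdict(record)


search_log = []
row = append_record(
    search_log,
    DatabaseSearchRecord(
        database="ExampleDB",  # hypothetical database name
        search_strategy='("pair programming") AND (defects OR reliability)',
        date_of_search=date(2004, 7, 1),
        years_covered=(1990, 2004),
    ),
)
print(row)
```

Similar records could be defined for the other data sources in Table 1 (journal hand searches, conference proceedings, unpublished studies), each with the fields listed for that source.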