Can Patient Record Summarization Support Quality Metric Abstraction?

Rimma Pivovarov, PhD1, Yael Judith Coppleson, MPH1, Sharon Lipsky Gorman, MS2, David K. Vawdrey, PhD1,2, Noémie Elhadad, PhD2

1Value Institute, NewYork-Presbyterian Hospital, New York, NY; 2Department of Biomedical Informatics, Columbia University, New York, NY

Abstract

We present a pre/post intervention study, where HARVEST, a general-purpose patient record summarization tool, was introduced to ten data abstraction specialists. The specialists are responsible for reviewing hundreds of patient charts each month and reporting disease-specific quality metrics to a variety of online registries and databases. We qualitatively and quantitatively investigated whether HARVEST improved the process of quality metric abstraction. Study instruments included pre/post questionnaires and log analyses of the specialists' actions in the electronic health record (EHR). The specialists reported favorable impressions of HARVEST and suggested that it was most useful when abstracting metrics from patients with long hospitalizations and for metrics that were not consistently captured in a structured manner in the EHR. A statistically significant reduction in time spent per chart before and after use of HARVEST was observed for 50% of the specialists, and 90% of the specialists continue to use HARVEST after the study period.

Introduction

In 1999, the Institute of Medicine released To Err is Human,1 which elucidated shortcomings in healthcare and ignited the movement to improve quality and patient safety. Many initiatives aimed at improving healthcare quality were developed, several of which involved the creation of large-scale data registries that are populated using both claims data and manual data abstraction2.

Indicators used to assess quality of care are often buried within patient records. To accurately abstract these quality indicators, specially trained nurses manually comb through patient records to locate relevant information.
Our 2,600-bed institution employs 35 full-time data abstraction specialists dedicated to reporting quality metrics for 30 databases covering 13 disease states and processes of care. Measures include CMS and Joint Commission databases for Core Measures, as well as disease-specific registries such as transplant, sepsis, and stroke. The goals of the databases vary, from national benchmarking, to inclusion in value-based purchasing, to peer comparisons for facilitating quality improvement. Participation in some databases is voluntary, such as the Society of Thoracic Surgeons data registry, while participation in others is a regulatory requirement, such as the UNOS transplant database.

Each data abstractor is responsible for extracting a particular set of data using a combination of structured and unstructured clinical data, including laboratory test information, comorbidities and complications identified through review of clinical documentation, and changes in clinical status throughout a patient's hospital stay. The complexity of each database and the ease of access to information are highly variable. Some registries populate data through electronic feeds of structured documentation, while others require complete manual abstraction for everything from demographics to time of symptom onset. Although every data abstractor is responsible for reporting information from patients' heterogeneous and voluminous records, the information they seek is very different, and the tools they use are different as well.

While there are many benefits to participating in quality improvement registries and databases, the burden of manual abstraction can be excessive.
In fact, it is estimated that in physician practices, physicians and staff spend on average 15.1 hours per week and approximately 15.4 billion dollars annually dealing with the reporting of quality measures7. Despite the large time burden, it is likely that the number of reported quality metrics will continue to grow8.

There has been some recent success in developing natural language processing (NLP) approaches for quality metric abstraction. However, these approaches are evaluated in a retrospective fashion, outside the workflow of clinical data abstraction experts, and have largely been disease- or metric-specific. Some tools have been developed mainly for internal comparison purposes3 and others optimized to report quality metrics to specific databases: Yetisgen et al. developed a system for automated extraction of comorbidities, operation type, and operation indication for surgical care reporting4, and Raju et al. developed an automated method for the specific extraction of adenoma mentions5.

Beyond the barriers of moving from retrospective studies to interventions, and of deploying NLP-based systems in a hospital setting, there exist barriers specific to the work of data abstractors (DAs): the metrics, diseases, and types of information needed vary drastically from one database to another. Furthermore, most of the DA work consists of abstracting (rather than extracting specific facts from the record) based on unstructured guidelines (although there has been work on automatically structuring quality metrics)6. The disease-specific work has yielded promising results and shown that some disease-specific metrics can be extracted in a fully automated fashion. Still, to date there are few available disease-specific quality metric abstraction tools, as they are difficult to create and optimize, especially because the gathering of quality metrics is very heterogeneous. We hypothesize that a more holistic and broad approach could provide much-needed support to a variety of DAs at once.

There has been some research demonstrating the potential benefit of holistic clinical summarization tools for physicians at the point of care9,10; however, there has been much less work examining the information needs and the utility of a holistic summarization tool for the purpose of quality abstraction. Our institution already provides EHR users access to HARVEST11, a real-time, patient-level summarization and visualization system. HARVEST was originally developed to aggregate relevant patient information for busy clinicians who often do not have the time to read or even skim through all available previous clinical notes. In this study, we explore the value of HARVEST in a different scenario: as a support tool for DAs in their abstraction work. HARVEST's interface consists of three sections: a Timeline, a Problem Cloud, and a Note Panel (Figure 1). The Timeline is interactive and provides a high-level display of a patient's inpatient, outpatient, and ED visits.
The Problem Cloud displays all of the patient's documented problems (as extracted from parsing of all the patient's notes), with the most salient problems for the patient during the selected time range appearing larger and on top. HARVEST is a general-purpose summarization system: it extracts all problems relevant to a patient and operates on all patients in the institution. As such, it differs from the dedicated NLP approaches to quality abstraction described above in that it does not aim to do the work of the DAs, but rather to support them in their search for information in the vast amount of documentation available for a given patient.

Figure 1. Screenshot of HARVEST for a de-identified patient record. The Timeline focuses on a single admission, and the user has selected "vomiting" in the Problem Cloud. The Note Panel shows the notes for that visit which mention the problem. The user has selected a specific note (on the right) and all the mentions of that problem are highlighted.

Our overall research question for this study is whether a patient record summarization tool, such as HARVEST, supports the needs of DAs during their abstraction work. Toward that goal, we designed a pre/post intervention study aimed at DAs with a heterogeneous range of metrics and abstraction workflows. We investigated the DAs' subjective satisfaction with the tool as well as the impact on workflow based on their usage logs of the EHR.

Methods

Data Collection

After obtaining appropriate institutional review approval, we presented HARVEST in November 2015 to Data Abstraction specialists at NYP as part of one of their "Lunch and Learn" meetings. The presentation included a 10-minute demonstration of HARVEST and of where to access it in the EHR. DAs were recruited to participate in our study and consented one by one.

DA study participation consisted of four parts: (1) subjects were asked to answer a pre-study questionnaire; (2) for 4 weeks, the subjects were asked to use HARVEST during their normal course of abstraction, whenever they felt it might be useful to their abstraction and for any given patient; (3) after the study period, the subjects were asked to respond to a post-study questionnaire; and (4) for all subjects, their EHR usage logs were collected for three months before the beginning of the study and two months after the end of the study period (i.e., three months after the beginning of the study).

The pre-study questionnaire consisted of 20 questions and captured basic information about the DAs, including how long they had been doing abstraction work, which database they abstract for, how many data elements they need to abstract for a given patient chart, and a description of their typical workflow when abstracting a chart (i.e., which parts of the EHR they visit and in which order).
The pre-study questionnaire also asked for expectations of where HARVEST might be most useful, and where it might fit into the DA's specific workflow.

During the 4-week study period, we sent weekly reminders to the subjects that they could use HARVEST whenever they felt it would be useful to them.

The post-study questionnaire consisted of 33 questions. The questions were inspired by the Technology Acceptance Model (TAM)12 and the System Usability Scale (SUS)13. Many of the SUS questions were included in the questionnaire to measure ease of use and intention to use. Other questions were included to capture the DAs' overall perception of HARVEST, for what purposes they used HARVEST, its perceived usefulness for their abstraction work, whether they had identified any unintended consequences of using HARVEST, and whether the subjects planned to continue using HARVEST in their daily work.

The questionnaires were distributed and results were collected using Qualtrics. The EHR usage logs were obtained for all systems in the NYP EHR ecosystem (which include HARVEST usage logs). The logs contained information about which function within the EHR was accessed and at what time. Examples of action types include document review, laboratory test results overview, and visit summary review. Usage log analysis enabled the computation of metrics such as the time spent abstracting a given patient chart and how many patients were reviewed with or without access to HARVEST. Basic workflow information can also be derived from the usage logs as frequent sequences of actions of the different users.

Metrics

We assessed the impact of HARVEST on DA workflow in three different ways: user perception, workflow changes, and user retention. All data processing and statistical analyses were done using a combination of Python and R.

User perception of HARVEST was measured through the pre- and post-study questionnaires.
We compared how the DAs predicted they would use HARVEST with how they actually used HARVEST during the study period.

For the workflow analysis, we present the DAs' self-reported workflows and measure workflow changes using the EHR log data. Workflow changes were measured in two different ways: how long DAs spend, and how many EHR action types DAs take, on individual patient abstractions before and after the introduction of HARVEST. The pre-HARVEST period was defined as the 3 months before the DA was introduced to HARVEST, and the post-HARVEST period was defined as the 3 months after the DA was introduced to HARVEST. A t-test with Bonferroni correction was used to measure whether there was a change in time spent per patient or a change in the number of EHR actions for each DA before and after the introduction of HARVEST. Of note, all patients abstracted in the post-HARVEST period (whether seen with HARVEST or not) were included in the EHR log analyses.
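The per-DA comparison above can be sketched as follows. This is a minimal illustration with synthetic timing data: the function name, sample sizes, and the choice of Welch's unequal-variance t-test are our assumptions for the sketch, not details reported in the study.

```python
import numpy as np
from scipy import stats

def compare_pre_post(pre_times, post_times, n_tests, alpha=0.05):
    """Two-sample t-test on per-chart abstraction times for one DA.

    The significance threshold is Bonferroni-corrected for the number
    of DAs tested (n_tests). Returns the p-value and whether the
    difference is significant at the corrected threshold."""
    t_stat, p_value = stats.ttest_ind(pre_times, post_times, equal_var=False)
    return p_value, p_value < alpha / n_tests

# Synthetic per-chart times (minutes) for one hypothetical DA.
rng = np.random.default_rng(seed=0)
pre = rng.normal(loc=60, scale=10, size=80)   # pre-HARVEST charts
post = rng.normal(loc=45, scale=10, size=80)  # post-HARVEST charts

p, significant = compare_pre_post(pre, post, n_tests=10)
```

With ten DAs each tested once, the Bonferroni-corrected threshold is 0.05 / 10 = 0.005, so only per-DA p-values below 0.005 would be reported as significant.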

To measure the most common workflows within HARVEST, we ran the SPADE algorithm14 for sequence mining (using the cSPADE package in R) on all HARVEST accesses during the study period.

Retention was measured using the EHR logs to see how many DAs returned to HARVEST after the study was completed, and how often they used it. The log-based retention was compared to how the DAs predicted they would use HARVEST in the future.

Results

Subjects and DA abstraction tasks

10 of the 35 DAs enrolled in the study. Subjects were nurses, and most had worked as DAs for less than 5 years (minimum experience was 1 year, maximum was 13 years). Each completed the pre-study questionnaire and, 4 weeks later, the post-study questionnaire. The median time to completion for the pre- and post-study questionnaires was 25 and 13 minutes, respectively.

Together, the enrolled subjects abstract quality metrics for 7 different databases, corresponding to 6 diseases of interest (cardiac metrics are reported to multiple databases). The databases varied in the number of data elements to abstract, as well as in the number of forms to fill out per patient: some have 6-10 different forms per patient while others have only 1. In addition, DAs may return to the same patient over time; for example, transplant DAs fill out one form for a patient pre-transplant and another for the same patient at 6 months post-transplant.

Table 1 summarizes the self-reported DA experiences and workflows, as captured by the pre-study questionnaire. A majority of the DAs abstract over 75 individual data elements for each patient. The data elements are found in at least 7 different EHR data types (e.g., physician and nurse notes, laboratory tests, flowsheets, medication orders, etc.).
To find each of the 75 data elements, the DAs routinely visit multiple clinical systems during their workflow, likely a consequence of the fragmented and legacy systems that house patient data in our large academic medical center. One commonality across all DAs is that they reported that most of their time is spent reading and abstracting from clinical notes.

Disease | Database | # DAs who abstract for disease enrolled in study | % of DAs who abstract for disease who enrolled in study | Average # of data elements abstracted per patient | # different EHR data types accessed during abstraction | # different systems accessed during abstraction | Where most of DA's time is spent
Bariatric | Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program | 1 | 100% | 100 | 7 | 3 | Notes
Cardiac | Society of Thoracic Surgeons; NY State DOH | 3 | 29% | 100 | 8 | 2-4 | Notes, Operative Data
Sepsis | NY State DOH | 2 | 57% | 75-100 | 7 | 2-4 | Notes, Vital Signs
Surgery | American College of Surgeons | 1 | 25% | 75-100 | 7 | 1 | Notes
Transplant | United Network for Organ Sharing | 1 | 50% | 100 | 7 | 2 | Notes
Stroke | Get With the Guidelines | 2 | 35% | 25-50 | 10 | 3-4 | Notes, Labs, Meds

Table 1. Participant-reported data from the pre-study questionnaires on what is abstracted, from where, and what the DA's general workflow is.

When asked further about their abstraction workflow, 9 out of 10 subjects reported that finding relevant information from clinical documentation was tedious, and only 30% of DAs agreed that patient chart review was efficient. There
was in fact a very wide range of perceived time spent per patient chart review across DAs, and in particular across databases. Stroke DAs reported spending on average between 20 and 45 minutes per patient; bariatric reported 35-50 minutes per patient; surgery reported 45 minutes per patient; transplant DAs reported between 20 and 90 minutes per patient; cardiac DAs reported 20-45 minutes per patient for one database and 30-120 minutes for another; and sepsis DAs reported 120-180 minutes per patient on average.

DA Satisfaction with HARVEST

In the post-study questionnaire, all subjects agreed that HARVEST was accurate and that all elements in the HARVEST interface were necessary and not redundant. All subjects felt confident using HARVEST and did not perceive any negative impact when using it for patient chart review. In addition, most subjects (80%) found the system easy to learn, easy to use, and not unnecessarily complex.

The majority of subjects thought that HARVEST positively impacted their abstraction process (60%), while the rest saw no impact on their abstraction process (40%). None saw a negative impact. Similarly, 60% of the subjects thought they would continue using HARVEST. When asked when HARVEST was useful and when it was not, subjects provided feedback as summarized in Table 2. Subjects found HARVEST most useful when patients had longer charts, and they liked the problem-oriented approach to navigating the record. Subjects who did not find HARVEST helpful to their abstraction process reported that the data they abstract is typically easy to find in the patient chart already.

HARVEST is most useful when patients have longer charts:
"It was helpful in all the cases I abstracted, but definitely helped me save time with longer charts, as we have many elements to abstract. It helped me narrow down timing of events such as onset of sepsis or comorbidities/complications during the patient's stay"
"Yes!
It helped me focus on areas that would have been cumbersome to abstract and review in a larger chart."
"Definitely! and made it easier to visualize so i knew where to focus"
"It helped me get through longer charts much more quickly."

No added value to HARVEST when the information needed is easy to access in the chart:
"Most of the patients I'm abstracting have very short visits so it's easy to read the chart through. Most of the data points are related to the current visit so it's easy to get all the information directly from the short charts. The only data points that Possibly helpful was the history. If this wasn't included in the H&P (which is rare) then I would try to use HARVEST for it."
"I think HARVEST would be most useful for utilization review and/or data abstraction for newer registries where documentation may not be as complete and structured."
"Although I think HARVEST is clever technology and has great potential to be useful, I will not continue to use it because there is already very specific templates and structured fields in the charts for my data abstraction."

HARVEST's problem-oriented navigation of the chart is helpful:
"Its easier to find the notes with correlating information"
"without HARVEST, I would've had to open and skim every clinician note to find those diagnoses"
"I use HARVEST when there is something in particular I have trouble finding."
"I was able to use it alongside my abstraction tool to help me answer questions more quickly. There are several elements from our abstraction that HARVEST helps us speed up the search process within the chart."
"questionable diagnoses of hepatitis and pneumonia that were confirmed by HARVEST because it was able to gather together all the source documents that contained those diagnoses"

Table 2. Common perceptions of HARVEST from subjects and quotes from the post-study questionnaire.

Some DAs found that HARVEST played a positive role in data verification (2 found it to be very helpful, and 2 sometimes used it for verification). 6 DAs found that HARVEST was able to help identify things that they would have missed, while others thought they would have found the information anyway, but perhaps it would have taken longer.

Workflow

While the DAs were able to explain their workflow (i.e., the series of actions in the EHR to carry out the abstraction for a given patient) in a detailed fashion in the pre-study questionnaire, the workflows were not as easily discernible when analyzing the EHR usage logs. Nevertheless, there were some clear patterns identified when comparing EHR usage logs from the 3 months pre-study and the 3 months after introduction to HARVEST.

Time spent on patient abstraction. When looking at the distribution of time spent per patient before and after HARVEST, and binning the distributions into quartiles, we found a statistically significant reduction in time spent on patient abstraction for 50% of the subjects: one subject had a reduction in overall time spent over all quartiles, resulting in an average 20 minutes gained per patient; two subjects saw a reduction in time spent on short abstractions (first quartile of the distribution); and two subjects saw a reduction in time spent on long abstractions (last quartile of the distribution). For illustrative purposes, Figure 2 displays examples of the distributions of time before and after the introduction of HARVEST. Additionally, 1 DA saw a statistically significant increase in time spent post-HARVEST.

These findings from the EHR usage logs were confirmed by the subjects' perceived impact of HARVEST on their workflow: 4 of the 5 DAs who had statistically significantly shorter times also reported that their workflow was shortened by HARVEST. The DA who had an increase in time spent on abstraction post-study did not report a perceived increase.

Figure 2.
Density plots of the time spent per patient pre- and post-HARVEST introduction. A t-test demonstrated that the sepsis abstractor had a significant reduction (p < 0.001) in total time spent. On average, the sepsis abstractor saved 20 minutes per patient. The bariatric DA had no statistically significant reduction in time spent on chart review.

EHR actions. There was a significant decrease in EHR actions related to accessing the list of visits and information related to the visits. In fact, the only EHR action that increased from pre- to post-HARVEST introduction was access to the HARVEST tab in the EHR. Table 3 displays some of the statistically significant differences in EHR actions and their usage pre- and post-HARVEST introduction. We hypothesize that the decrease indicates that subjects used the Timeline view of HARVEST, which visualizes all of the patients' visits from all settings, as an alternative to the traditional EHR visit list. EHR access to notes also decreased for 3 DAs. This finding was also confirmed by the
perceived impact of HARVEST as reported in the post-study questionnaire, where access to documentation through the problem-oriented view was seen as helpful to the abstraction process.

Table 3. All EHR actions with a statistically significant pre- and post-HARVEST difference in average actions per patient for at least two DAs: Review of Documentation, Check Orders, View Patient, Access Patient Demographics, and Visit Summary. Each of the rows displayed has a Bonferroni-corrected p-value < 0.001. Documentation review had a significant decrease for 3 DAs.

HARVEST workflow for patient chart abstraction. A deeper dive into the workflows within HARVEST found that, aside from the basic actions of loading HARVEST and setting a time period to view, the most frequently used parts of HARVEST are the selection of a problem in the Problem Cloud and the opening of a note. Across the entire study period and 10 DAs, there were 222 sequences where HARVEST was used. Table 4 shows the 11 possible actions in the HARVEST interface, and the usage frequency of each by the subjects during the 3-month post-HARVEST introduction period.

Action | % of HARVEST workflows that incorporate this action
Access HARVEST tab in EHR for a patient | 100%
Set Timeline to a specific date range | 100%
Select problem in Problem Cloud | 73%
Open Note | 72%
Expand Problem Cloud to include all problems | 68%
Zoom in on Timeline | 62%
Select a particular visit/admission in Timeline | 62%
De-select a problem in Problem Cloud | 15%
De-select a particular visit/admission in Timeline | 5%
Shrink Problem Cloud to most salient problems | 4%
Zoom out to full longitudinal view of patient chart | 0%

Table 4: The usage frequency of each of the 11 actions across all DAs in the study.

As a complement to these findings from the log analysis, the pre- and post-study questionnaires showed the following trends: 80% of the subjects expected the Timeline to be useful pre-study, but only 60% found it useful after the 1-month study period.
70% of the subjects reported finding the note access functionality in HARVEST useful, which is
also reflected in the 72% usage of the "Open Note" functionality based on the EHR logs. Similarly, 60% of the subjects found the Problem Cloud useful for their abstraction workflow in the post-study questionnaire, which is also reflected in the 68% usage of the "Expand Problem Cloud" function in HARVEST (Table 4).

Figure 3 shows the most common sequences of HARVEST actions, as derived from all subjects in the 3 months post-HARVEST introduction. Most commonly, DAs went straight to a note without clicking a term. Many times they would select and re-select a visit on the Timeline, or move the Timeline (Figure 3). Multiple "Load HARVEST for Patient" actions within one sequence occur if the DA accesses another tab and returns to HARVEST, or refreshes the HARVEST page.

Figure 3: Most frequent workflows in the HARVEST application during one session, identified using the SPADE algorithm. The width of each arrow represents the frequency with which that event sequence occurs.

HARVEST Usage and Post-Study Retention

Overall, 70% of the study participants used HARVEST for at least 10% of their abstractions. The 2 sepsis DAs used HARVEST on over 80% of their patient abstractions.

When the DAs were asked if they would continue to use HARVEST, 60% predicted that they would. Most commonly, these DAs found that HARVEST would continue to be a part of their work because it helps filter and search through notes in a fast and accurate way, it enables search for conditions and events that would otherwise be hard to find, especially during extended admissions, and it saves time.
For the 40% who predicted they would not use HARVEST in the future, either the problem-oriented view of the record presented by HARVEST did not align with the type of information they abstract in their tasks (for instance, one DA said "None of the problem cloud words are ones I would use."), or the data they needed was already easily accessible in the chart (see quotes in Table 2).

Interestingly, even though only 60% of subjects said they would continue using HARVEST, 90% of the study participants continued to use it in the post-study period of two months (see Figure 4). Even some of the subjects for whom there was no significant time gained in patient chart abstraction continued using the tool for a significant portion of their charts (e.g., the bariatric DA). The main exception to post-study retention was the cardiac DAs: 2 of the 3 abstractors predicted that they would continue to use HARVEST; however, according to the EHR logs, their usage of the tool dropped after study completion. It is not clear exactly why the cardiac abstractors used HARVEST less than expected, but it seems that the tool did not become well integrated into their workflow. In contrast, the sepsis team, which has one of the longest and most complex abstraction workflows (up to 180 minutes pre-HARVEST), saw such a change in workflow that they adopted the use of HARVEST as part of their protocol for chart abstraction.

Figure 4. The retention rates of the DAs' use of HARVEST (proportion of abstractions where HARVEST is used, during and after the study period, by disease). HARVEST usage stayed fairly steady after the study period for all metric types except cardiac, where the use was much reduced, and stroke, where the use more than doubled.

Discussion

The pre-study questionnaire reflected the complex and widely diverse activities involved in the process of chart abstraction across diseases of interest and quality databases.

Questionnaires and EHR usage log analyses provided complementary views of the impact of HARVEST on abstraction tasks across diseases. Our findings suggest that a problem-oriented patient summarizer coupled with a patient timeline can support the nurses and their information needs. HARVEST is most useful for DAs who are asked to locate data elements for patients with lengthy hospitalizations (and thus with large amounts of clinical documentation in their charts), as well as to locate data elements that are distributed across many parts of the record. However, the extent to which a summarizer such as HARVEST can be useful is variable and depends on the specific abstraction task, as well as on the extent to which the patient chart already documents the desired data elements in one place. In general, DAs had a mostly positive response to HARVEST and found it useful for data verification, searching within clinical notes, and identifying the timing of events. We also found that HARVEST was able to provide statistically significant time savings for some groups of DAs.
Although not all of the DAs had a statistically significant reduction in the time spent abstracting, and some reported that HARVEST did not surface the specific terms that were necessary for their abstraction, 90% continued to use HARVEST after the completion of the study.

In addition to the 10 staff who participated in the HARVEST study, the use of HARVEST continues to spread across the Division of Quality and Patient Safety at NYP. Sepsis abstractors have recently trained Trauma abstractors on how Sepsis uses HARVEST and where it fits best into their workflow. 3 DAs abstracting for trauma, cardiac, and core measures who were not enrolled in the study have been found to be consistently using HARVEST. Finally, 1 DA who was recently reassigned to a new database has switched from never using HARVEST to visiting HARVEST for 56% of their patient abstractions.

Regarding our initial question on the usefulness of a general-purpose summarizer for abstraction, our results indicate that HARVEST supports the needs of abstraction for up to 10 out of the 30 quality databases at NYP.

Limitations

Even though the DAs' workflows are well-defined and are withou
