Visualization, Assessment and Analytics in Data StructuresLearning ModulesMatthew McquaigueDavid BurlinsonKalpathi SubramanianThe University of North CarolinaCharlotte, North [email protected] University of North CarolinaCharlotte, North [email protected] University of North CarolinaCharlotte, North [email protected] SauleJamie PaytonThe University of North CarolinaCharlotte, North [email protected] UniversityPhiladelphia, [email protected] recent years, interactive textbooks have gained prominence in aneffort to overcome student reluctance to routinely read textbooks,complete assigned homeworks, and to better engage students tokeep up with lecture content. Interactive textbooks are more structured, contain smaller amounts of textual material, and integratemedia and assessment content. While these are an arguable improvement over traditional methods of teaching, issues of academicintegrity and engagement remain. In this work we demonstratepreliminary work on building interactive teaching modules for datastructures and algorithms courses with the following characteristics, (1) the modules are highly visual and interactive, (2) trainingand assessment are tightly integrated within the same module, withsufficient variability in the exercises to make it next to impossibleto violate academic integrity, (3) a data logging and analytic systemthat provides instantaneous student feedback and assessment, and(4) an interactive visual analytic system for the instructor to seestudents’ performance at the individual, sub-group or class level,allowing timely intervention and support for selected students.Our modules are designed to work within the infrastructure of theOpenDSA system, which will promote rapid dissemination to anexisting user base of CS educators. We demonstrate a prototypesystem using an example dataset.Instructors in lecture based courses in introductory computer science have an expectation that the material presented in class isreinforced via textbook reading, lab exercises, assigned homeworksor quizzes. Given the rapid increases in the CS population overthe past few years [39], these courses are large, and monitoringthe performance of all students and reaching out to at-risk students is a challenge. This has resulted in new ways of teachingsuch courses, broadly classified as active learning techniques, thatcan include any combination of lab-based instruction, flipped classroom settings, gamification, peer-learning, or use of multimediacontent [20, 23, 28, 33], all of which attempt to better engage students.Our focus in this work is towards building new interactive learning modules and analyzing student performance in data structuresand algorithms courses, which have exhibited significant drop ratesof 40-60% [3, 39]. These modules can be part of interactive textbooks [35, 38, 40] and help reinforce lecture content outside of class.We also present visual analytic tools that will make it easier for theinstructor to monitor and understand student performance duringthe course, thus allowing timely interventions of students whomight be falling behind, or at-risk students. Our learning modulesand visualization tools are extensions to OpenDSA [35] and theCanvas Learning Management System (LMS) [12] and have thefollowing characteristics:Interactive Modules. In contrast to the typical use of quizzes(short answers, multiple choice questions) in interactive textbooks,we propose highly interactive modules, in which students directlyinteract with the visually represented data structure or algorithms tocomplete the module. Students can practice an exercise any numberof times in the learning or training phase (to ensure master of thematerial); this is followed by an uninterrupted assessment phase.Autograding provides immediate feedback on student performanceto both the student and the instructor.Academic Integrity. The constructed modules are built withenough variability so that no two students will see the exact samedata structure or algorithm module simultaneously; this reducesacademic integrity violations, and eliminates them in controlledsituations, such as proctored exams, since neighboring students willreceive different but equivalent modules as part of their assessment.KEYWORDSinteractive textbook, student engagement, visualization, learninganalytics, data structures, algorithmsACM Reference Format:Matthew Mcquaigue, David Burlinson, Kalpathi Subramanian, Erik Saule,and Jamie Payton. 2017. Visualization, Assessment and Analytics in DataStructures Learning Modules. In Proceedings of ACM Conference, Baltimore,MD, USA, Feb. 2018 (SIGCSE ’18), 7 ssion to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected] ’18, Feb. 2018, Baltimore, MD, USA 2017 Association for Computing Machinery.ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . UCTION

SIGCSE ’18, Feb. 2018, Baltimore, MD, USA Matthew Mcquaigue, David Burlinson, Kalpathi Subramanian, Erik Saule, and Jamie PaytonStudent Performance: Instructor and Student Views. To analyze all of the data generated by the learning modules and regularassignments, we present visual analyses tools that communicatestudent performance on a daily or weekly basis to the instructor atmultiple levels: individual student, selected subgroups of students,or of the entire class. Instructors can focus on specific groups ofstudents (at-risk students, for instance), and ensure they get thesupport and attention via teaching assistants, tutors or direct interaction. Analytics is provided to help the instructor make sense of allthe grading data, thanks to a biclustering algorithm that identifieshighly correlated subsets of the grades. A limited version of thesevisualizations is also presented to the students.Contributions. We propose a design that focuses on learning modules of critical material in data structures and algorithms courses,using direct interaction and visualization to promote student learning. The integration of learning and assessment within a visualand interactive environment is novel. The variability in exercisesacross students completing the same modules preserves academicintegrity in most situations, while the autograding promotes scalability to large classes. The use of an open source textbook system(OpenDSA) with an existing base of users and integration with acommonly used Learning Management System (Canvas) will helpdissemination and ease of adoption. Visual dashboards of studentperformance integrated into the LMS will provide instructor newtools for timely intervention and student support. We have built aprototype of the different components of our design1 , that includesfive learning modules, and a visual exploratory tool to analyzestudent performance, powered by clustering algorithms.2RELATED WORKOur emphasis in this work spans the areas of student engagement,interactive exercises, and visualization, targeted at student performance for improved learning outcomes. Interactive textbooksincorporate some of these features in improving the student engagement, by emphasizing a more hands-on approach to learningand assessment. Two such systems that have gained prominencein recent years include zyBooks [17, 37, 40] and OpenDSA [19, 35].zyBooks is a commercial interactive textbook system that providesinteractive tools, animations and responsive questions as part of itstextbooks and has a large user base. Edgcomb et al. [17] discuss theiruse of zyBooks across multiple institutions for reading, studyingand homework assignments in introductory programming coursesand describe mechanisms to maintain academic integrity. OpenDSAis an open source interactive textbook system that uses the JSAVlibrary [24, 25] for all of its interactions and visualizations. It provides visualizations of all the basic data structures and interactioncapabilities. Our work is based on using the OpenDSA infrastructure to augment the interactive and visualization functionality, aswell as provide new visual analytic capabilities for both studentsand instructors.A large body of work has focused on visualization of algorithmsand data structures for improved student engagement [2, 10, 32].More recently, work by Burlinson et al. [11] combined the use of1 A limited version of the learning modules and visualizations can be found at 1: System Design. OpenDSA components are shown in blueand our contributions are in green.real-world data and data structure visualizations to improve student engagement. Bart et al. [5–7] have focused on curating a largenumber of interesting datasets for use in introductory courses incomputer science. The use of visual programming (e.g., Scratch andAlice) [16, 34] has shown promise for making the first programmingsteps easier and more engaging. In addition to providing a graphicalinterface for piecing together programs; these systems let studentsbuild graphically interesting programs and encourage them to explore, experiment, and play. Formal evaluations of Alice [31] haveshown increased performance and retention in the programingcourses and improved attitudes toward computing, especially forat-risk students.Researchers have also been looking into adding or exploitinglearning analytics capabilities in an effort to improve student outcomes. Carter et al. [13] analyzed the different transition sequencesand patterns that students go through when completing programming assignments and related that to student performance. Similarly, researchers have also looked at programming behavior bylooking at compilation and hint statistics, in order to understandtheir impact on student performance in exams [1, 18]. Identifyingat-risk students effectively, however, still remains a challenge. Khosravi et al. [26] applied clustering techniques to analyze patternsof summative, formative and behavioral data for a large flippedclass. Bringing sophisticated learning analytics to understandingand classifying student groups is a goal of our work.3 METHODS3.1 System DesignThe OpenDSA infrastructure [19, 35] supports robust data collection from a variety of learning artefacts, including visualizations, interactive exercises, code authoring, and traditional multiple choiceand response strategies, all of which accompany brief textual descriptions and instructions in a book instance on a Learning Management System (currently Canvas LMS is supported). Data relatedto exercise attempts, scores, and proficiencies for student work arestored as granular interaction logs as students progress throughthe visual and interactive components of a course.Fig. 1 shows a system design overview with existent OpenDSAcomponents on the left (in blue) and our contributions on the right(in green). Interactive exercises are created using the JavaScript

Visualization, Assessment and Analytics in Data Structures Learning ModulesAlgorithm Visual Library (JSAV) [24, 25], which provides buildingblocks for web-based data structure and algorithm visualizationsand an API for triggering built-in and custom logging events. Oncea textbook and its exercises have been compiled and the corresponding course is generated on an LMS using the OpenDSA framework,these events can be logged to the OpenDSA MySQL database.In addition to the new learning modules, we are in the process of adding three components to the OpenDSA framework tofacilitate flexible visualization of the collected interaction and exercise attempt data (See Fig. 1). (1) An ETL (Extract, Transform,Load) module will read recent student and course data from theOpenDSA database, compute statistical metrics and aggregationsto reshape the data, and send the resultant data to our secondarydata store. (2) The secondary database will cache the output of theETL module. This includes data related to student exercise progress,topic proficiency, and overviews of the progress of a class withinrecent chapters and modules. The database will be implementedwith MySQL. (3) A visualization module will let students examinetheir progress and proficiency and allow instructors to view relevant information about the performance of the students in theircourse. The visualization module will be implemented alongsideother OpenDSA components, using Ruby on the server side andweb technologies (HTML, CSS, JavaScript) on the client side. Thecharts in the respective visualization dashboards will be built usinga mix of Highcharts [22], a JavaScript visualization library, andD3 [15], a JavaScript library for manipulating and visualizing data.Each chart supports operations for reordering, filtering, and exploring the student data to provide timely insight into performance,and will judiciously utilize color schemes (from ColorBrewer [14])and visual metaphors wherever possible.3.2Interactive Module Creation/AssessmentOur approach to building learning modules as part of an interactivetextbook has the following design goals: Make the modules highly interactive and engaging, by usingvisual representations of the data structures that studentsdirectly interact with to master the underlying algorithms. Inaddition, modules should make students reflect on a givenalgorithm, as to what it does or what it might construct orproduce as a result. Integrate assessment as part of the interactive learning modules, as opposed to having traditional quiz or short answerstyle questions, after the exercise. Maintain academic integrity by using a combination of automatic generation of examples from each module instance(each user sees a different but equivalent example) and autograding of that instance (OpenDSA provides a large amountof flexibility for building data structures and algorithms thatcan be customized to the goals of the learning module). Support for assessment at finer scales by assessing multiple instances of an exercise that a student works through,to obtain a better estimate of the student’s mastery of thealgorithm or concept.Figs. 2 and 3 illustrate two example modules that are part of oursystem. The first module illustrates an exercise of inserting integerkeys into a binary search tree. The student is presented with aSIGCSE ’18, Feb. 2018, Baltimore, MD, USA(a)(b)(c)(d)Figure 2: Interactive exercise: Inserting elements into a binarysearch tree, (a) Initial view showing a stack of values (at the top) tobe inserted into the tree, (b) Inserting 23, after interactive identification of the path followed by the insertion algorithm, (c) Insertion of23 into its correct location, (d) Final tree after insertion of all values.The user will use mouse clicks to identify the path taken by the insertion algorithm and will repeat this exercise for each of the valuesin the stack, followed by pressing the grade button for immediateassessment.brief description of the algorithm and the required interaction;in this instance, user clicks on the involved elements (Fig. 2(b)),until the final insertion position is found (Fig. 2(c)). Fig. 2(d) showsthe final tree after all elements have been inserted into the tree.Three key features of the module are (1) the user can practice thisexercise until he/she is comfortable, (2) each instantiation of theexercise creates a new tree with a different set of keys, and (3) theuser can view a model solution of the module being completedstep-by-step to understand how the module works. After viewingthe module solution, the user must reset the module to be given adifferent set of keys. Once the user has mastered the exercise, andcompleted their final iteration of practice, the ‘grade’ button canbe pressed to get the final grade. Note that the since each tree andset of keys are generated randomly, no two students will get theexact same problem. This makes collusion between students moretime consuming since the solution of one instance will not be thesolution of an other one. We paid attention to generating similarinstances for all student (i.e., trees with good balance) to keep thedifficulty of the different instances of the problem comparable.Fig. 3 illustrates a second example. We are interested in trainingstudents to study an algorithm and illustrate its functionality or

SIGCSE ’18, Feb. 2018, Baltimore, MD, USA Matthew Mcquaigue, David Burlinson, Kalpathi Subramanian, Erik Saule, and Jamie Payton(a)(b)(c)Figure 3: From an Algorithm to a Data Structure. In this exercise,the user interactively creates the data structure constructed by thegiven algorithm. User can create nodes and place them in left orright subtree of a node. In this example a set of values are to beinserted into an initially empty binary search tree. Intentionally,this algorithm will place smaller key values on the right subtreeand larger values on the left subtree, (a) algorithm and values tobuild the tree, (b) after insertion of 4 values, (c) final tree.what outputs it might generate. In this instance, the algorithm constructs a binary search tree by inserting a set of elements, startingfrom an empty tree. However, in this instance (Fig. 3(a)), we havealtered the algorithm so that larger values will be inserted on theleft subtree and smaller values into the right subtree (this is still abinary search tree, albeit an unconventional one). Fig. 3(a) showsthe algorithm and Fig. 3(b), shows the partially constructed treeas more values get inserted into the tree, and Fig. 3(c) shows thefinal tree. For this exercise, the user is provided with interactivecapabilities to create a node, assign a node to its left or right child,all done with mouse interaction. Once the exercise is completed(all the nodes in the array (Fig. 3(a)) are processed), it is automatically graded. Similar to the earlier example, we can vary the givenalgorithm and create a number of variations on the original insertion algorithm (for instance, insert elements only on the left or theright side of the tree). The ability to have such variability has 2Figure 4: An example data set of student performance in a course.Top panel illustrates the average student performance (blue curve)across a set of assignments, quizzes and exams throughout the semester, with the darker regions representing inter-quartile (25-75%)performance. Bottom panel shows a matrix plot of the same data,sorted by their overall average grade: students on the Y axis and assignments on the X axis. The data has been anonymized and actualgrades randomized to only show the overall distribution.advantages, (1) preserves academic integrity (as students will seedifferent algorithms when working on the same module), and (2)forces students to understand and reflect on the given algorithm.In the current implementation, we have created modules for thebasic operations on binary search trees (insert, delete, find) as wellas multiple algorithms for constructing and operating on binarysearch trees. We have also begun implementing algorithms relatingto linked lists (insertion of elements).3.3Visual AnalyticsIntroductory courses in computer science are generally large, andtypically divided into a number of lab sections. Monitoring studentperformance is generally a challenge, and identifying at-risk students is an even bigger challenge. Introducing the assessed modulesprovide additional information to understand student behavior andperformance in the class. But at the finer grain, each exercise oractivity produces a grade which can create multiple grades perweek. Because the instructor of the class will be flooded with lowlevel information, we provide visualization capabilities to look atstudent performance at multiple scales, from individual student,

Visualization, Assessment and Analytics in Data Structures Learning ModulesFigure 5: The flood of data contained in a grade book can be madesense of by using analytics such biclustering algorithms. Here onebicluster extracted by the CPB algorithm is shown spanning 18 students and 4 grades. This bicluster highlights that the grades on threehomeworks and one midterm are subgroups of students, all the way to the entire class. Whilevisualizations have been used by many researchers to look at postmortem data, ranging from simple bar charts [29], and also found inmany LMSes (like Canvas) to more sophisticated charts like parallelcoordinates [18], and cluster visualizations [26].Fig. 4 illustrates two visualizations of student performance in acourse across a combination of assignments, projects and exams.The top plot illustrates average student performance with quartileperformance. The lower plot shows the data of each student in amatrix form: students on the Y axis and the assignments on the Xaxis. Our system supports sorting this visualization based on average student grades, reordering the columns to reflect comparisonacross categories (homeworks, projects, exams, etc).This flood of data can be hard to make sense of. We believe thatautomated analyses are necessary to help extend the visualizationin order to reveal patterns that are difficult to see otherwise. Assuch, we propose to make use of biclustering techniques, whichbecame very popular recently in the field of bio-informatics toanalyze the expression level of thousands of genes across hundredsof patients [27]. In our problem a bicluster is a sub-matrix of thematrix grade, composed of a subset of the students and a subsetof the grades. Algorithms for biclustering can search for differenttype of clusters, for instance OPSM [8] tries to identify high-value,low-value which could make sense when analyzing student grades.We selected in this work the CPB algorithm [9] which looks forbiclusters where rows are highly correlated and columns are highlycorrelated, according to Pearson Correlation Coefficient (PCC). TheSIGCSE ’18, Feb. 2018, Baltimore, MD, USAreasoning is that identifying a group of students who performedsimilarly across a group of activities can help understand why thestudents performed the way they did. For instance, if a biclusterslinks some students performance between an early recall-how-toprogram assignment and project with programming components,one could hypothesize that the students performance in the projectsare linked to their early programming skills. Note that a studentwho improved in programming during the semester would likelynot be part of such a cluster. If any part of the class was designed toimprove programming skills, one could hypothesize that the effortwas not successful for the students in that bicluster.Our visualization supports looking at particular biclusters returned by the algorithm. We computed biclusters using CPB on ourtest dataset looking for clusters of PCC greater than 0.95. Fig. 5shows both the matrix and the grade plots for a particular bicluster.It shows that for a group of 18 students in that class, their scorein 3 of the homeworks and the first midterm was correlated. Onceagain, it does not mean that these 18 students all performed well,or all performed poorly. It indicates there is a link between theirperformance in these 4 grades. In this particular case, the relationbetween homework 1 and 3 and the midterm is easily understoodas the midterm covered the topics of homework 1 and 3, all usingmathematical proofs and expressions. The biclusters do not indicatecausality, just correlation. But that is enough in many cases for theinstructor who knows the structure of the class to gain a betterunderstanding of grades of the class.Looking at a single bicluster can be helpful. But looking at multiple biclusters could be insightful as well. Visualizations can bedone by reordering the grade matrix to improve the locality of thebiclusters (which is done by BicAT [4] or BiCluster viewer [21]) orby representing the biclusters as a graph (which is done by ClusterMaker [30] or Furby [36]). One of the advantages of being able tosee multiple biclusters is that one can try to understand a student’sgrade by looking at the biclusters that this student participate in,or does not participate in.Hypothetically, one could see that the student’s good midtermgrade is explained by his or her good homework grade which highlights that the student understands the theoretical part of the classwell. Also, one could see that the student’s bad grade on the firstproject is correlated with bad grades in the early programming labs,showing that the student’s programming skills were poor. But alack of correlation with later projects could hint that the student’sprogramming skills improved during the semester, maybe after theinstructor suggested the student meets with the TA to strengthenhis or her programming skills.We believe that combining analytics and visualization can provide instructors tools to better understand the student population ofa class and their performance. And while we showed how biclustering could help, we believe more complex analysis and visualizationcould shed a light on effects that are hard to understand from rawgrades.4CONCLUSIONSIn this work, we have presented new learning modules that can beused as part of an interactive textbook in data structures and algorithms courses. Our modules are built to work with the OpenDSA

SIGCSE ’18, Feb. 2018, Baltimore, MD, USA Matthew Mcquaigue, David Burlinson, Kalpathi Subramanian, Erik Saule, and Jamie Paytonsystem, using JSAV library for generating the module visualizationsthat have been customized to provide a highly interactive, visualand engaging experience for students. Learning and assessmentare part of the same interaction, rather than a post-test, which iscurrent practice in interactive textbooks. We have used the JSAVlibrary to generate the modules with sufficient variability in themodule instances, to ensure academic integrity. We also illustrateexamples that demonstrate the ability to incorporate basic datastructure algorithms into these modules and the means to gradeautomatically, thus incorporating complex and foundational concepts in an interactive and more engaging environment. We alsoillustrated visual analyses of student performance data using biclustering algorithms, so as to make sense of the grades generatedby the learning modulesThe current implementation demonstrates a prototype of theparts of the system described in Fig. 1. Much work remains to bedone in connecting the various components and integrating thedata into the visualization modules, prior to routine use as partof the LMS. This will be followed by formal user studies in datastructures and algorithms courses. Finally, more work is neededin developing the visual analytics system to better understand theneeds of the instructors as well as identification of at-risk studentsfor timely intervention and support.5ACKNOWLEDGEMENTSThis work was supported by a grant from the National ScienceFoundation (DUE-1245841, DUE-1726809)REFERENCES[1] Amjad Altadmri and Neil CC Brown. 2015. 37 million compilations: Investigatingnovice programming mistakes in large-scale student data. In Proceedings of the46th ACM Technical Symposium on Computer Science Education. ACM, 522–527.[2] Ronald Baecker. 1998. Sorting out sorting: A case study of software visualization for teaching computer science. Software visualization: Programming as amultimedia experience 1 (1998), 369–381.[3] L. Barker and T. Camp. 2015. Booming enrollments - What is the impact?Computing Research News 27, 5 (may 2015). Computing Research Association.[4] S. Barkow, S. Bleuler, A. Prelic, P. Zimmermann, and E. Zitzler. 2006. BicAT: abiclustering analysis toolbox. Bioinformatics (Oxford England) 22, 10 (2006).[5] A.C. Bart, E. Tilevich, S. Hall, T. Allevato, and C.A. Shaffer. 2014. TransformingIntroductory Computer Science Projects via Real-time Web Data. In Proc. of the45th ACM Technical Symposium on Computer Science Education. 289–294.[6] Austin Cory Bart. 2016. CORGIS Datasets PRoject: The Collection of Really Great,Interesting, Situated Datasets. (2016).[7] Austin Cory Bart, Ryan Whitcomb, Dennis Kafura, Clifford A. Shaffer, and EliTilevich. 2017. Computing with CORGIS: Diverse, Real-world Datasets for Introductory Computing. ACM Inroads 8, 2 (March 2017), 66–72.[8] A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini. 2002. Discovering local structure ingene expression data: The order-preserving submatrix problem. In InternationalConference on Computational Biology. 49–57.[9] D. Bozdağ, J. Parvin, and Ü Çatalyürek. 2009. A Biclustering Method to DiscoverCo-regulated Genes Using Diverse Gene Expression Datasets. In Proc. of 1st Int’l.Conf. on Bioinformatics and Computational Biology (Lecture Notes in ComputerScience), Springer (Ed.), Vol. 5462. 151–163.[10] Marc H Brown and Robert Sedgewick. 1984. A system for algorithm animation.Vol. 18. ACM.[11] David Burlinson, Mihai Mehedint, Chris Grafer, Kalpathi Subramanian, Jamie Payton, Paula Goolkasian, Michael Youngblood, and Robert Kosara. 2016. BRIDGES:A System to Enable Creation of Engaging Data Structures Assignments withReal-World Data and Visualizations. In Proceedings of the 47th ACM TechnicalSymposium on Computing Science Education (SIGCSE ’16). ACM, New York, NY,USA, 18–23.[12] Canvas. Last accessed, Aug. 2017. (Last accessed, Aug. 2017).[13] Adam Scott Carter and Christopher David Hundhausen. 2017. Using Programming Process Data to Detect Differences in Students’ Patterns of 3][24][25][26][2

interactive exercises, and visualization, targeted at student per-formance for improved learning outcomes. Interactive textbooks . A large body of work has focused on visualization of algorithms and data