
IJCSNS International Journal of Computer Science and Network Security, VOL.16 No.2, February 2016

An Anatomy of Data Visualization

Abhishek Kaushik† and Sudhanshu Naithani††
Kiel University of Applied Sciences†, Kurukshetra University††

Summary
Because data is generated continuously throughout the world, the importance of data mining and visualization will keep increasing. Mining helps to extract significant insight from large volumes of data. That insight then has to be presented in a way that everyone can understand, and visualization is used for that purpose. The most common ways to visualize data are charts and tables. Visualization plays an important role in the decision-making process in industry. It makes better use of the human eye to assist the brain, so that datasets can be analyzed and visual presentations prepared. Visualization and data mining complement each other. In this paper we present an anatomy of the visualization process.

Key words:
Information Visualization, Scientific Visualization, Decision Making, Graph, Chart, Xmdv tool.

Fig. 1 General steps in the process of Visualization.

1. Introduction
In simple words, visualization is the process of forming a picture so that something becomes easy for other people to imagine and understand. Combined with visualization, data mining and human-computer interaction provide better results for visual data analysis. Initially, visualization was of two types: Information Visualization and Scientific Visualization. Scientific Visualization dealt with scientific data that has a spatial component, while Information Visualization dealt with abstract, non-spatial data [12]. At present, visualization faces problems such as mapping, dimensionality, and design trade-offs [13]. Visualization helps to understand patterns, trends and relationships between different components in a dataset.
In the words of David McCandless [32] (author, data journalist, and information designer): “By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.”

Figure 1 shows the general steps in the process of visualization. First, data is collected from all the available sources. Then a possible aggregate meaning is generated. After that, the data is analyzed, and a graphical interpretation of the analyzed data is produced. In the last step, the user interacts with the graphical interpretation.

Here are the basics for generating the best possible visualization [27] of any given data:
a) Try to understand the size and cardinality of the given data.
b) Determine the kind of information that is to be communicated.
c) Process the visual information according to the targeted audience.
d) Use the visual that portrays the given data in the best and easiest form for the audience.

Manuscript received February 5, 2016
Manuscript revised February 20, 2016

2. Classes of Data Visualization Techniques
The most common classes [2] of data visualization techniques are:
a) Describing Data
b) Viewing Relationship
c) Picturing Data (Icons, Glyphs, Color Coding)
d) Temporal Visualization
e) Spatial Visualization
f) Spatio-Temporal Visualization

Class (a) describes the dataset. Class (b) describes relationships between observations and between variables. Class (c) maps data items into easily recognizable shapes. Class (d) covers the visualization of temporal data, which changes over time; a line graph is most suitable in this case. Class (e) covers spatial datasets, which come from various domains that relate data to a certain landscape [2].

A map is used for this type of data. Class (f) has both spatial and temporal properties, as in the analysis of biomedical data.

3. State of the Art
Ming C. Hao [14] proposed a visual analysis of multi-attribute data using pixel matrix displays, which is especially useful for detailed information. Adam Perer [15] proposed that tight integration of statistical and visualization techniques could speed up insight development. Zhao Kaidi [4] presented a new algorithm for 4D data visualization. Martin Wattenberg [6] introduced the arc diagram method of visualization. Ming C. Hao [3] proposed two new techniques for visual analytics: cell-based visual time series and visual content query. Doantam Phan [9] presented a method for generating flow maps using hierarchical clustering. Mohammad Daradkeh [24, 25] designed a new InfoVis tool to support informed decision-making under uncertainty and risk through interactive visualization. Kristine Amari [28] presented techniques and tools for recovering and analyzing data from volatile memory. Jarkko Venna [30] introduced NeRV (neighbor retrieval visualizer), which produces an optimal visualization by minimizing the cost. Sandro Boccuzzo [16] addressed software comprehension through a combination of visualization and audio. Pak Chung Wong [17] presented a visualization of association rules for text mining. Svetlana Mansmann [20] proposed an explorative framework for OLAP data for analyzing data cubes of virtually arbitrary complexity. Ji Soo Yi [31] developed an InfoVis tool to improve the decision quality of nursing home choice.

4. Methods of Visualization (with examples)

4.1 Arc Diagram
An arc diagram is usually used to visualize complex data within a string, such as text, music or compiled code. The structure of a string usually contains repeated substrings, which is helpful from the point of view of visualization because these repetitions can be used as prediction units for the visualization process. For example, any given article will contain repeated words and phrases. Martin Wattenberg [6] described arc diagram visualization, which processes a string with a pattern matching algorithm to find repeated substrings and then represents them visually as translucent arcs. The most significant use of arc diagrams is in the field of music, where they reveal structure in musical compositions [6]. Other fields of use are web pages, compiled code, and nucleotide sequences from DNA. In future, other pattern matching algorithms can be used for arc diagrams to gain new insights into this method.

4.2 Flow Maps
As the name suggests, flow maps show the flow of a process, i.e. how a particular process is flowing. For example, when people migrate from one country to another, a flow map can show this very easily. Doantam Phan [9] presented a method to generate flow maps using hierarchical clustering, inspired by graph layout algorithms. In a hand-drawn flow map, the most common characteristics are intelligent distortion of positions, merging of edges that share a destination, and intelligent edge routing [9]. To achieve intelligent distortion, Phan used a layout adjustment algorithm. For merging edges and intelligent edge routing, hierarchical clustering is used. The system was implemented as an algorithm with the following steps:
a) Layout Adjustment
b) Primary Hierarchical Clustering
c) Rooted Hierarchical Clustering
d) Spatial Layout
e) Edge Routing
f) Multiple-Layer Issues (when there are multiple layers in the system)

4.3 Graph Analytics
Graph analytics is a common and interesting topic in visualization and analytics. The main aim of graph analytics research is to meet real-life challenges. It follows a technology-application pairing, i.e. success is measured by the application and not by algorithmic criteria [11]. In its applications it always turns lessons learned into lessons applied. The main applications (real-life challenges) of graph analytics are the following [11]:
a) Electric-Power-Grid Analytics
b) Social-Network and Citation Analytics
c) Text and Document Analytics
d) Knowledge Domain Analytics

4.4 Voronoi Treemaps
Treemaps are the best method for visualizing attributed hierarchical data. Ordinary treemaps are limited to rectangular shapes; Voronoi treemaps remove this limitation and enable visualization with arbitrary shapes [37]. Michael Balzer [37] presented treemaps based on a recursive subdivision of the display into arbitrary polygons, which eliminates the rectangle limitation. In this system the following steps are repeated at each level of the hierarchy:
a) A polygonal subdivision of the display area is created according to the top hierarchy level.
b) The output is a set of polygons representing the nodes of the top hierarchy level.
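The recursive subdivision described above can be illustrated with a minimal Python sketch. It is not the algorithm of [37]: it works on a raster of grid cells instead of true polygons, applies the plain nearest-site Voronoi rule, makes no attempt to match the area of a part to the value of its node, and picks seed sites in a simple deterministic way. The function names (pick_sites, voronoi_split, subdivide) and the dictionary-based hierarchy are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) position of one raster cell


def pick_sites(region: List[Cell], k: int) -> List[Cell]:
    """Pick k evenly spaced cells of the region as seed sites (illustrative choice)."""
    ordered = sorted(region)
    step = len(ordered) / k
    return [ordered[int(i * step + step / 2)] for i in range(k)]


def voronoi_split(region: List[Cell], k: int) -> List[List[Cell]]:
    """Split a region into k parts by assigning each cell to its nearest site."""
    sites = pick_sites(region, k)
    parts: List[List[Cell]] = [[] for _ in range(k)]
    for x, y in region:
        nearest = min(
            range(k),
            key=lambda i: (x - sites[i][0]) ** 2 + (y - sites[i][1]) ** 2,
        )
        parts[nearest].append((x, y))
    return parts


def subdivide(region: List[Cell], node: dict,
              layout: Dict[str, List[Cell]], path: str = "") -> None:
    """Record the node's area, subdivide it among the node's children,
    then repeat the same two steps inside every child."""
    path = f"{path}/{node['name']}"
    layout[path] = region
    children = node.get("children", [])
    if not children or not region:
        return
    for child, part in zip(children, voronoi_split(region, len(children))):
        subdivide(part, child, layout, path)


# Hypothetical two-level hierarchy laid out on a 12 x 12 raster.
hierarchy = {"name": "root", "children": [
    {"name": "A", "children": [{"name": "A1"}, {"name": "A2"}]},
    {"name": "B"},
    {"name": "C"},
]}
display = [(x, y) for x in range(12) for y in range(12)]
layout: Dict[str, List[Cell]] = {}
subdivide(display, hierarchy, layout)
for name, cells in layout.items():
    print(name, len(cells))
```

Each entry of layout holds the raster cells that belong to one node, so the parts of the children always add up to the area of their parent. A real Voronoi treemap would replace voronoi_split with a polygonal subdivision whose areas reflect the node values.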

4.5 Geometric Projection
This is a technique used for multidimensional multivariate visualization. It can map data onto the Cartesian plane as well as onto an arbitrary space [13]. The technique is good at detecting outliers and handling large datasets.

Scatterplot is used to show the joint variation of two data items on the x and y axes of Cartesian coordinates. It supports grouping. With three or more measures, a scatter plot matrix is produced: a series of scatter plots that display the possible pairings of the measures assigned to the visualization [27].

Parallel Coordinates is a technique in which attributes are represented by parallel vertical axes, linearly scaled within their data range [13]. Parallel coordinates are also used to study correlations among attributes by locating points of intersection [13]. Only limited space is available for each parallel axis. The technique has two main types: circular and hierarchical.

4.6 Pixel-Oriented Technique
This technique is also used to visualize multivariate data; each attribute value is represented by a colored pixel. In an n-dimensional dataset, n colored pixels are used to represent one data item.

Recursive Pattern can influence the arrangement of the data by using a generic recursive process. It is a query-independent technique.

Pixel Bar Chart does not aggregate data values but presents them directly. Pixel bar charts are derived from regular bar charts. Multi-pixel bar charts are used for high-dimensional data [13].

Fig. 2 Pixel Visualization of 10-Dimensional Data (a is attribute)

4.7 Hierarchical Display
These techniques are mainly concerned with hierarchical data: the data space is subdivided first, and the subspaces are then presented in a hierarchical way.

Dimensional Stacking, also known as general logic diagrams, is a technique that results from a modification of the hierarchical axis. It separates the data space into 2D stacked subspaces [13].

Treemap partitions the screen into several regions according to the value of an attribute by using hierarchical partitioning.

Fig. 3 Dimensional Stacking.

4.8 Iconography
Iconography maps a multidimensional data item to an icon, and these techniques are also known as icon-based techniques. The visual features vary depending on the data attribute values [13].

Chernoff Faces, the most popular iconography technique, can visualize data items only up to a limit. It maps data dimensions to the position and properties of facial features such as the mouth, eyes and nose [13].

Star Glyph is one of the many variants of the glyph family and is the most widely used. Here star glyphs are used to present data items. It is not suitable when the number of data items keeps increasing. It can also encode additional information when combined with other glyphs [13].

Shape Coding uses very small arrays of pixels to visualize data. One array represents one item of data.

Fig. 4 Array for Shape Coding (a1 to a16 are attributes)
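As a rough illustration of shape coding, the Python sketch below turns one data item with 16 attributes (a1 to a16, as in Fig. 4) into a 4 x 4 array of gray levels. The row-major placement of the attributes, the 0 to 255 gray scale, the min-max scaling and the function name shape_code are assumptions made for this example; the text above does not specify them.

```python
from typing import List, Sequence


def shape_code(item: Sequence[float],
               lows: Sequence[float],
               highs: Sequence[float],
               side: int = 4) -> List[List[int]]:
    """Map one data item with side*side attributes to a side x side array of
    gray levels (0 = attribute at its minimum, 255 = attribute at its maximum)."""
    if not (len(item) == len(lows) == len(highs) == side * side):
        raise ValueError("item, lows and highs must each have side*side values")
    levels = []
    for value, lo, hi in zip(item, lows, highs):
        # Min-max scale the attribute to [0, 1]; a constant attribute maps to 0.
        t = 0.0 if hi == lo else (value - lo) / (hi - lo)
        t = min(1.0, max(0.0, t))
        levels.append(round(255 * t))
    # Assumed layout: attributes a1..a16 fill the array row by row.
    return [levels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical data item whose 16 attributes all range from 0 to 15.
item = list(range(16))
for row in shape_code(item, [0] * 16, [15] * 16):
    print(row)
```

Placing many such small arrays side by side, one per data item, gives the kind of display that shape coding relies on: items with similar attribute values produce arrays that look alike.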

4.9 Chart and Graph
Charts and graphs are the most common, widely used and easily understandable ways to visualize information for an audience. Here are some of them:

Line Graphs, also called line charts, show the relationship of one variable to another. They are used to track trends and to compare items over the same period of time [27].

Fig. 5 Line Graph

Bar Charts compare the quantities of two or more groups. Values are shown by bars, which can be either vertical or horizontal. When there are a large number of bars and the bars are close together, it is not possible to detect differences between them. That is why different colors are used to represent the bars [27]. Bar charts work better when the bars have different ranges.

Fig. 6 Bar Chart (G1 to G4 are groups)

Pie Charts are a subject of discussion because their angles and areas cannot be easily interpreted by the eye. They are very useful when additional information (e.g. percentages) is provided [27], and are not ideal for developing dashboards for small screens.

Fig. 7 Pie Chart

5. Applications
Here are some of the applications of data visualization:

1. Business Decision Making Process: Visualization has applications in the business decision-making process. It enables top-level management to examine vast amounts of data, find current market trends, take decisions and make strategic changes if required. Common forms of visualization used in business decision making are basic charts, status indicators, scatter graphs, bubble charts, spark line charts, geographical maps, tree maps, Pareto charts etc. [29].

2. Other Areas Related to Decision Making Process:
a) Uncertainty Visualization: Uncertainty in the information is capable of influencing decision making. There are many techniques for uncertainty visualization.
b) Risk Visualization: Some problems involve risk in the decisions made to solve them. According to Lipkus & Hollands (1999), users might wish to extract the following information [25] regarding risk:
1) Risk magnitude (i.e., how large or small the risk is);
2) Relative risk (i.e., comparing the magnitude of two risks);
3) Cumulative risk (i.e., observing trends over time);
4) Uncertainty (e.g., estimating amount of uncertainty and variability or range of scores);
5) Interactions among risk factors.
Risk visualization mostly uses static diagrams.
c) Sensitivity Analysis Visualization: It uses graphs, charts, surfaces etc. Very few techniques can be applied to sensitivity analysis. The tornado diagram is a graphical approach that displays the outcomes of local sensitivity analysis [25].

3. Manipulate and Interact Directly with Data: Visualization enables users to interact with and manipulate data directly, unlike 1D tables and charts, which can only be viewed [38]. Real-time visualization helps to find the reasons for the low performance of an organization and to compare it with its rivals, so that the most helpful changes can then be made.

4. Foster a New Business Language: Visualization tells the whole story through data. A performance indicator does not reveal which business categories are growing or shrinking, or the reasons behind it [38]. Visualization, in contrast, shows performance category by category and enables the user to find the reasons by digging further into the data.

5. Identify and Act on Emerging Trends Faster: Companies gather a lot of data about their users through surveys, data mining and opinion analysis. Visualization is able to track [38] emerging trends and the new business opportunities related to those trends.

6. Xmdv Tool
Xmdv is one of the popular open source tools used for the visualization process. It mainly supports the 5 methods [39] listed below:
1. Scatterplots
2. Star Glyphs
3. Parallel Coordinates
4. Dimensional Stacking
5. Pixel-oriented Display

Application areas of the Xmdv tool include remote sensing, financial, geochemical, census, and simulation data [39].

Fig. 8 Snapshot of Xmdv Tool.

References
[1] Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten Gorg, Jorn Kohlhammer and Guy Melançon, “Visual Analytics: Definition, Process, and Challenges”.
[2] Ilknur Icke, “Visual Analytics: A Multifaceted Overview”.
[3] Ming C. Hao, Umeshwar Dayal and Daniel A. Keim, “Visual Analytics Techniques for Large Multi-Attribute Time Series Data”.
[4] Zhao Kaidi, “Data Visualization”.
[5] Jeffrey Heer, Michael Bostock and Vadim Ogievetsky, “A Tour through the Visualization Zoo”.
[6] Martin Wattenberg, “Arc Diagrams: Visualizing Structure in Strings”.
[7] Daniel A. Keim, Florian Mansmann, Daniela Oelke, and Hartmut Ziegler, “Visual Analytics: Combining Automated Discovery with Interactive Visualizations”.
[8] Daniel A. Keim and Hans-Peter Kriegel, “Visualization Techniques for Mining Large Databases: A Comparison”, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996.
[9] Doantam Phan, Ling Xiao, Ron Yeh, Pat Hanrahan and Terry Winograd, “Flow Map Layout”.
[10] Joerg Meyer, Jim Thomas, Stephan Diehl, Brian Fisher, Daniel Keim, David Laidlaw, Silvia Miksch, Klaus Mueller, William Ribarsky, Bernhard Preim and Anders Ynnerman, “From Visualization to Visually Enabled Reasoning”, Dagstuhl Seminar Nº 07291 on “Scientific Visualization”, July 15-20, 2007.
[11] Pak Chung Wong, Chaomei Chen, Carsten Gorg, Ben Shneiderman, John Stasko and Jim Thomas, “Graph Analytics—Lessons Learned and Challenges Ahead”.
[12] Melanie Tory and Torsten Moller, “Rethinking Visualization: A High-Level Taxonomy”.
[13] Winnie Wing-Yi Chan, “A Survey on Multivariate Data Visualization”.
[14] Ming C. Hao, Umeshwar Dayal, Daniel Keim, and Tobias Schreck, “A Visual Analysis of Multi-Attribute Data Using Pixel Matrix Displays”.
[15] Adam Perer and Ben Shneiderman, “Integrating Statistics and Visualization: Case Studies of Gaining Clarity During Exploratory Data Analysis”.

[16] Sandro Boccuzzo and Harald C. Gall, “Software Visualization with Audio Supported Cognitive Glyphs”.
[17] Pak Chung Wong, Paul Whitney and Jim Thomas, “Visualizing Association Rules for Text Mining”.
[18] Ping Zhang and Andrew B. Whinston, “Business Information Visualization for Decision-Making Support - A Research Strategy”, Proceedings of the First Americas Conference on Information Systems, August 25-27, 1995, Pittsburgh, Pennsylvania.
[19] Daniel A. Keim, Wolfgang Müller and Heidrun Schumann, “Visual Data Mining”, Eurographics 2002.
[20] Svetlana Mansmann, Florian Mansmann, Marc H. Scholl and Daniel A. Keim, “Hierarchy-Driven Visual Exploration of Multidimensional Data Cubes”.
[21] Michael D. Lee and Rachel E. Reilly, “An Empirical Evaluation of Chernoff Faces, Star Glyphs, and Spatial Visualizations for Binary Data”, Australasian Symposium on Information Visualization, Adelaide, 2003. Conferences in Research and Practice in Information Technology, Vol 24.
[22] Tuan Pham, Rob Hess, Crystal Ju, Eugene Zhang and Ronald Metoyer, “Visualization of Diversity in Large Multivariate Data Sets”, IEEE Transactions on ber/December 2010.
[23] Martin S. Feather, Steven L. Cornford, James D. Kiper and Tim Menzies, “Experiences Using Visualization Techniques to Present Requirements, Risks to Them, and Options for Risk Mitigation”.
[24] Mohammad Daradkeh, Clare Churcher and Alan McKinnon, “Supporting Informed Decision-Making Under Uncertainty and Risk through Interactive Visualization”, Proceedings of the Fourteenth Australasian User Interface Conference (AUIC2013), Adelaide, Australia.
[25] Mohammad Kamel Younis Daradkeh, “Information Visualization to Support Informed Decision-Making Under Uncertainty and Risk”.
[26] Stephen Few and Perceptual Edge, “Data Visualization Past, Present and Future”.
[27] Justin Choy, Varsha Chawla and Lisa Whitman, “Data Visualization Techniques from Basics to Big Data with SAS Visual Analytics”, SAS Global Forum 2012 and SAS Global Forum 2011.
[28] Kristine Amari, “Techniques and Tools for Recovering and Analyzing Data from Volatile Memory”, SANS Institute InfoSec Reading Room.
[29] Rebeckah Blewett, “The Importance of Data Visualization to Business Decision Making”, June 12, 2011.
[30] Jarkko Venna, Jaakko Peltonen, Kristian Nybo, Helena Aidos and Samuel Kaski, “Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization”.
[31] Ji Soo Yi, “Visualized Decision Making: Development and Application of Information Visualization Techniques to Improve Decision Quality of Nursing Home Choice”.
[32] White Paper on “Big Data Visualization: Turning Big Data into Big Insights”, Intel IT Center.
[33] Fernanda B. Viégas and Martin Wattenberg, “Artistic Data Visualization: Beyond Visual Analytics”.
[34] Wolfgang Müller and Heidrun Schumann, “Visualization Methods for Time Dependent Data - An Overview”, Proceedings of the 2003 Winter Simulation Conference.
[35] Jens Lüssem, Stephan Schneider and Holger Studt, “Data Visualization Techniques”, University of Applied Sciences Kiel, winter term 2013/14.
[36] Ming Hao, Umeshwar Dayal, Daniel Keim and Tobias Schreck, “Multi-Resolution Techniques for Visual Exploration of Large Time-Series Data”, Eurographics/IEEE-VGTC Symposium on Visualization (2007), pp. 1-8.
[37] Michael Balzer and Oliver Deussen, “Voronoi Treemaps”.
[38] singdata-visualization/
[39] http://davis.wpi.edu/xmdv/

Abhishek Kaushik is currently working at Siemens, Germany as a Master thesis student. He is in the final phase of completing his Master's degree in Information Technology at Kiel University of Applied Sciences. Before starting his Master's he received his Bachelor of Technology in Computer Science Engineering from Kurukshetra University in 2012.

Sudhanshu Naithani received his Bachelor of Technology in Computer Science Engineering from Kurukshetra University in 2015. He is currently working as a research assistant under Assistant Professor Ravinder Madan at Manav Bharti University, Solan.
