
Hierarchical Visualization of Network Intrusion Detection Data in the IP Address Space

Takayuki ITOH 1,2, Hiroki TAKAKURA 2, Atsushi SAWADA 2, Koji KOYAMADA 3

1) Department of Information Sciences, Faculty of Science, Ochanomizu University
2) Academic Center for Computing and Media Studies, Kyoto University
3) Center for the Promotion of Excellence in Higher Education, Kyoto University

[email protected], {takakura, sawada, koyamada}@media.kyoto-u.ac.jp

1. Introduction

Intrusion detection is an active area of research. Many Intrusion Detection System (IDS) products are available, and these systems generally detect network intrusions and record them in log files. To understand the performance and limitations of these systems, we have conducted a study on several IDS products that are deployed on open and large-scale computer networks. We have identified the following issues:

- Several IDS systems send e-mails to network administrators for each incident (i.e., an intrusion record). They often send enormous numbers of e-mails if the network is large-scale. Moreover, it is very difficult to understand the relevancy and statistics of aggregate incidents by only receiving the alerts for individual incidents.
- … complicated combinations of various incidents. Intelligent, intuitive, and real-time solutions are required for an overall understanding of the complicated behavior.
- Databases for storing IDS logs often grow huge, and therefore usability of the databases is often … Solutions that assist the user in querying for data are desirable to reduce query operations.
- GUIs of current IDS products visualize information very superficially. For example, the time sequence of numbers of incidents of the whole domain may be visualized as simple bar charts or polygonal charts. The user may need to perform many operations to explore detailed information, but in many cases administrators are too busy to take the time to operate the GUI.

Recently, various studies on intrusion analysis and secure network management have been reported [1,2]. In addition to those, visualization of incidents is very effective for intuitively and quickly understanding their distribution.

This paper presents a new technique to visualize the contents of huge IDS log files. The goals of the visualization technique are to make the available statistics from IDS systems understandable and to offer an interactive way of exploring detailed information. Another feature of the visualization technique is representing the distribution of incidents in IP-address spaces, revealing the relevancy of the distribution to the organizational structure of real society.

The technique first forms a four-level hierarchy of computers by grouping the computers according to their IP addresses byte-by-byte. It then visualizes the hierarchical data as bars and nested rectangles [3,4], where bars denote computers and rectangles denote groups of computers. It finally represents the statistics of incidents by mapping the number of incidents of each computer to the height of its bar. The technique can represent the distribution of incidents in large-scale computer networks consisting of several thousand computers.

The technique helps the user intuitively understand the distribution and trend of enormous numbers of incidents in IP-address spaces of computer networks. It also helps the discovery of relevant relationships between distributions of incidents and the organization of real society, because IP addresses are usually assigned according to the organization of real society.

Moreover, the technique can provide the capability to
explore detailed information about incidents for each computer, by representing computers as clickable icons.

This paper presents experimental results on visualizing enormous numbers of real incidents, and describes the trends observed in those results.

2. Related works

Many IDS products provide detection, warning, and analysis capabilities for incidents, but they have not completely solved the issues described in Section 1. Several recent works improve the issues.

On the other hand, it is important to minimize damage immediately, and information visualization is a valuable technology for this task. As described in the sidebar “Visualization for computer network and intrusion detection”, recent works for visualization of network intrusion include the following features:

- Visualization and detail-on-demand user interfaces showing time sequences of network traffic,
- Filtering of error detection or unimportant malicious accesses from visualization results,
- Visual data mining for discovery of suspicious traffic patterns from general log files, and
- Visualization of IP-address-oriented display spaces.

The technique presented in this paper can be categorized as “visualization of IP-address-oriented spaces”, where the difference of this technique over existing techniques is that it attempts to maximize the density of the information on display. This feature represents computers as small clickable icons, enabling a user interface that presents detail on demand for each computer.

Also, the technique is useful for discovering the behavior of incidents relevant to the distribution of computers and the organizational structure of real society. For example, it visualizes the following behaviors:

- One computer attacks many others simultaneously.
- Many computers attack one computer simultaneously.
- When a computer is attacked and a virus is placed on the computer, it then turns to attack other computers.
- One (or more) computers attack other computers in the same group or department.

With these features, the presented visualization technique complements existing techniques well.

3. Hierarchical data visualization

3.1 Rectangle packing for hierarchical data

The proposed technique applies a hierarchical data visualization technique presented in [3,4]. Figure 1 is an example of the visualization by this technique, which represents leaf-nodes as black square icons, and branch-nodes as rectangular borders enclosing the icons.

Figure 1. Example of hierarchical data visualization using a rectangle packing algorithm.

The visualization technique places thousands of leaf-nodes into one display space while satisfying the following conditions:

- It never overlaps leaf-nodes or branch-nodes with other nodes in a single hierarchy,
- It attempts to minimize the display area requirement, and
- It draws all leaf-nodes as equally shaped and sized icons.

This representation style is suitable for visualizing thousands of leaf-nodes of hierarchical data equally in one display space. We applied the technique to visualization of bioactive chemicals [4], distribution of jobs in parallel computing environments [5], and so on.

The technique first packs icons, and then encloses them in rectangular borders. Similarly, it packs a set of rectangles that belong to higher levels, and generates the larger rectangles that enclose them. Repeating the process from the lowest level toward the highest level, the technique places all of the data onto the layout area. The packing algorithm for icons and rectangles is the key technology for the visualization technique. Itoh et al. proposed a rectangle packing algorithm [3] for hierarchical data visualization, and an improved rectangle packing algorithm was later presented in [4]. Both algorithms place icons and rectangles one-by-one onto display spaces, choosing each position from multiple candidate positions.

As shown in Figure 2, the improved rectangle packing algorithm [4] applies grid-like subdivision of a display area using extension lines of edges of previously placed rectangles. The algorithm quickly generates multiple candidate positions for the rectangle currently being placed by referring to the grid-like subdivision. It generates at most four candidates at the corners of empty subspaces of the grid-like space, where the current rectangle can be placed without yielding any unnecessary gaps with previously placed rectangles. The algorithm then decides the position of the rectangle while it avoids overlapping the rectangle with previously placed ones, and attempts to minimize the area and aspect ratio of the whole grid-like space. If there is no adequate candidate position to place the rectangle, the algorithm additionally generates several candidate positions outside the grid-like space, and selects one of the candidates to place the rectangle.

Figure 2. Improved rectangle packing algorithm. (Upper) Previously-placed rectangles, and grid-like subdivision of a display space. (Center) Candidate positions for placing the current rectangle. (Lower) Placement of the current rectangle, and the update of grid-like subdivision.

3.2 Visualization in the IP address space

The presented technique groups the computers according to their IP addresses to form hierarchical data. It first groups them according to the first byte of the IP addresses. It again groups them according to the second byte of the IP addresses, and finally groups according to the third byte of the IP addresses. Consequently the technique forms four-level hierarchical data as shown in Figure 3(Left). The technique visualizes the structure of the computer network by representing the hierarchical data as shown in Figure 3(Right). Here, black icons in Figure 3(Right) represent computers, and the rectangular borders represent groups of computers.

We think that the technique is useful for the visualization of computer network spaces because:

- The technique visualizes large-scale hierarchical data containing thousands of leaf-nodes without overlapping, and therefore it can represent thousands of computers as clickable icons in one display space. The technique is
therefore useful as a GUI to directly explore detailed information about incidents of arbitrary computers in large-scale computer networks.
- The technique visualizes a hierarchy of computers according to their IP addresses. Therefore, it can briefly represent the correlation between incidents and groups of computers in real society, because IP addresses are often assigned according to the structure of a real organization.

Figure 3. (Left) Hierarchy of computers according to their IP addresses. (Right) Illustration of visualization results of the hierarchical data. (Node labels in the figure include 1.*.*.*, 1.2.*.*, 1.2.3.*, 1.2.4.*, 1.2.3.4, 1.2.3.5, 1.2.4.6, and 3.*.*.*.)

4. Implementation

4.1 Network intrusion detection data

The presented technique consumes the log files of a commercial IDS system (Cisco Secure IDS 4320 [6]). The system detects incidents based on signatures that predefine the typical patterns of malicious accesses. The technique inputs the following items from the description of the log files, as shown in Figure 4(1):

- IP address of a computer sending incidents.
- IP address of a computer receiving incidents.
- Date and time.
- Positive integer ID (signature ID) that denotes the specific signature.
- Security level (1, 2, 3, 4, and 5).

4.2 Visualization procedure

Consuming the log files, the presented technique visualizes the incidents in the following processing order:

RDB-like data structure:
Consuming the log file, the presented technique forms a data structure like a relational database (RDB), as shown in Figure 4(2). It constructs tables for time, signature IDs, security levels, senders’ IP addresses, and receivers’ IP addresses. The data structure accelerates the aggregation of incidents.

Construction of hierarchical data:
Simultaneously the technique lists the IP addresses of senders and receivers, and forms hierarchical data by referring to IP addresses byte-by-byte, as shown in Figure 4(3). Here all the computers described in the log file are registered in the hierarchical data.

Figure 4. … on technique.

Aggregation of incidents for each computer:
The technique then counts the total number of sending and receiving incidents for each computer. Here the user can specify conditions, such as signature IDs, security levels, and range of times, to filter out unimportant incidents. If a signature ID is specified, the technique counts the matching incidents by referring to the signature ID table. Similarly, it refers to the time or security level tables if the range of time or the security level is specified.

Representation:
The technique then visualizes the hierarchical data. Here it represents the numbers of sending and receiving incidents for
each computer, by mapping the numbers as heights of leaf-nodes. As shown in Figure 5, the technique represents the numbers of sending and receiving incidents by assigning different colors. Examples shown in Figures 8 to 10 represent the number of sent incidents as blue, and the number of received incidents as red.

Figure 5. Illustration of visualizing the numbers of incidents as heights of leaf-nodes.

Configuration of high-security (or low-security) incidents:
Generally an IDS does not always provide adequate warning of the security level of incidents, because the impact of incidents strongly depends on each computer network’s situation. The presented technique consumes the description of signature IDs and IP addresses of experienced high-security (or low-security) incidents, as shown in Figure 4(4). This capability allows a user to adjust the visualization results according to his or her preferences, for example:

- “Incidents which have specific signature IDs are always erroneous or ignorable in this network”,
- “Incidents which have specific signature IDs have damaged this network in the past”, and
- “Incidents which have specific IP addresses of senders have damaged this network in the past”.

Also, the capability allows configuring the following computers as high-security:

- “Computers that sent or received more than a constant number of incidents in a constant time”, and
- “Computers whose number of sending or receiving incidents drastically increases.”

The technique can control the level of detail of visualization by eliminating or assigning dark colors to leaf-nodes corresponding to low-security computers. Also, it can assign bright colors to leaf-nodes corresponding to computers sending or receiving pre-defined high-security incidents, to alert administrators of the return of known attacks. The example shown in Figure 10 represents computers sending or receiving high-security incidents in yellow.

4.3 GUI capability

We developed the GUI of the presented technique as a Java Applet. The features of the GUI are as follows.

Dialog window for configuring conditions for counting incidents:
The GUI pops up a dialog window for configuring conditions for counting incidents, including signature IDs, security levels, ranges of times, and IP addresses. Figure 6(Upper) shows an example of the dialog window. Given the conditions, the technique only counts incidents satisfying the conditions. The GUI enables more focused visualization, for example:

- “The network was damaged during 13:05 to 13:10, so I would like to visualize the distribution of incidents during that time,”
- “The network was damaged by the specific signatures, so I would like to visualize the distribution of the signatures,” or
- “This specific computer is often problematic, so I would like to visualize the distribution of incidents related to the computer by specifying its IP address.”

Dialog window for listing incidents for specific computer:
The GUI pops up a dialog window that displays the list of incidents for a specific computer that is the sender or the receiver. The dialog window pops up when a leaf-node is clicked, and then shows the list of incidents for the specific computer corresponding to the clicked leaf-node. Figure 6(Lower-left) shows an example of the dialog window.

Dialog for listing records of typical attacks:
Records of previous strong attacks may be good references for secure management of a computer network. The GUI pops up a dialog window for displaying the list of pre-defined previous strong attacks, for example:

- “A specific computer sent an enormous number of malicious attacks during 12:35 to 12:40 on Feb 21.”

Figure 6(Lower-right) shows an example of the dialog window, listing date and time, and IP addresses of senders. When the user selects one of the attacks from the list, the technique visualizes the distribution of corresponding incidents. The capability should be useful for administrators who want to share and analyze previous damages. If the display space allows a larger dialog window, additional information, such as IP addresses of receivers and signature IDs, is presented so that users can easily specify past attacks.

Segments:
The GUI can display segments connecting pairs of leaf-nodes corresponding to senders and receivers. It displays the segments for a specific computer when a user clicks a leaf-node corresponding to the computer. This capability helps users to explore the propagation of incidents. For example, many incidents from the same sender may concentrate on a small number of receivers, or may be distributed over large numbers of receivers. Figure 9 shows an example of the segments.

4.4 HTML-based reporting

We developed a component to generate JPEG-format image files of visualization results. We also developed a component to generate HTML-based reports using the image generation component. The implementation of the reporting component repeatedly generates image files while counting the incidents, finally generating HTML files as indices of the image files. Figure 7 shows an example of the Web page generated by this function. By sharing the HTML and image files, multiple administrators can easily exchange knowledge of incidents to remotely manage the network.

The report itself does not support the GUI capabilities described in Section 4.3. Administrators should use the Java-based GUI to explore the details of incidents if they find malicious traffic in the HTML report. Having the Java-based GUI pop up with the specified time span from the HTML report window may prove useful.

5. Experimental results

This section introduces the results of the presented technique. We implemented the technique with Java 1.4 and Microsoft Windows XP on an IBM ThinkPad T42 (CPU 1.8GHz, RAM 756MB). We developed the GUI using the Java Swing library, and the drawing component using the Java AWT library.

Figures 8 to 10 show the visualization results of an IDS log file used in a real network environment. Here, the numbers of sent incidents are represented in blue, and the numbers of received incidents are represented in red. In all the figures, viewpoints are right sides of the bars and …

Figure 8 shows the time sequence of visualization results using the log file recorded in 6 hours, containing 61822 lines and 3984 computers. In our measurement, the implementation took 120 seconds for reading the log file, 0.6 seconds for forming and visualizing the hierarchy of computers, and 7.1 seconds for recounting incidents during GUI operations. Figure 8(a) shows the amounts of incidents in 5 minutes, Figure 8(b) shows the result in the 5 minutes just after the time of Figure 8(a), Figure 8(c) shows the result in 5 minutes 2 hours after the time of Figure 8(b), and Figure 8(d) shows the result in 5 minutes 2 hours after the time of Figure 8(c).

Figure 8(a) shows several computers that sent incidents to several other computers. It might mean that the senders randomly searched for the targets of attacks. Figure 8(b) shows that a sender found a specific computer as the target of the attack, and concentrated its incidents on it. Figure 8(c) shows that the sender shown in Figure 8(b) had been disconnected, but several new senders attempted to attack several computers. One of the new receivers was in a different department from the continuously attacked receivers. Figure 8(d) shows that some of the senders and receivers shown in Figure 8(c) had been disconnected, but many computers in the same department received incidents in a
short time. The incidents might be a scan attack for the specific department.

Figure 9 shows the pairs of senders and receivers by yellow segments. When a user clicks a leaf-node on the display, the technique extracts incidents for which the computer corresponding to the clicked leaf-node is the sender or receiver, and represents the incidents as the yellow segments. The segments connect to the same tall red bar in the upper side of the figure, from multiple blue bars in small rectangles in the center of the figure. This example shows that many computers in the same department, denoted as a small rectangle, concentrated their incidents on the same computer.

Figure 10 shows an example in which the technique highlights leaf-nodes corresponding to the senders or receivers of high-security incidents on the real network. Here, administrators of the computer network used for these figures disconnected the senders or receivers of incidents 16 times in two months, because of pernicious attacks. We found in this enormous number of incidents that the signature IDs and IP addresses of the sender were identical in 5 of 16 attacks. We also found in another large group of incidents that the signature IDs and IP addresses of senders were identical in 3 of 16 attacks. These experiences mean that the same kinds of incidents are often repeated for specific attack purposes. Therefore, the presented technique can help alert the administrators by highlighting leaf-nodes corresponding to the senders or receivers of high-security incidents. Also, the technique can register previous damages, so that users can select the damages via the dialog shown in Figure 6(Lower-right).

Figure 11 shows closer-up views of the above visualization results, including receivers in Figure 8(a), and senders and a receiver in Figure 9.

6. Discussion

As described in Section 1, a feature of the presented technique is the representation of:

- statistics of incidents for thousands of computers,
- distribution of incidents on IP-address spaces, and
- relevancy of the distribution to the organizational structure of real society.

Figures 8 and 9 demonstrate that the technique visualizes interesting behavior about multiple computers in the same department. Figure 10 is useful for finding dangerous incidents from thousands of computers.

In addition to the above experiments, we think that the technique is useful for:

- observing the drastic change of incident patterns in IP-address-oriented spaces,
- observing if malicious computers attack an entire domain or only specific IP address blocks, or
- discovering that computers receive attacks from multiple computers, where most of the attacks are ignorable but a few others are serious.

On the other hand, the technique still has the following issues, which will be the focus of future work for the improvement of the technique.

Scalability:
Figures 8 to 10 show that the technique is feasible for visualization of incidents of 4000 computers, but it might be difficult for a user to comprehend the distribution of intrusions and explore detailed information if there are more computers. There are several ideas for this problem, but we have not implemented any of them.

1) A zooming interface can be applied to the problem. Here we can switch the representation between two modes: overview mode and clickable mode. The former mode just represents the nest of IP addresses and highlights interesting areas, and the latter mode zooms into the interesting areas so that the display space can represent computers as clickable bars.

2) Another idea is removal of computers whose numbers of sending/receiving incidents are zero, and packing the representations of remaining computers into a smaller display space. The idea has a problem of stability of display layout since the set of displayed computers changes over time, but the problem can be solved by applying the layout template presented in Section 5 of [3].

Occlusion:
The presented technique applies 3D representation for the
statistics of incidents, but this style yields occlusions among the metaphors of computers. One idea is applying multiple displays with independent viewpoints. We are currently discussing splitting the visualization across three displays per console. Another idea for minimizing the occlusions is applying the viewing optimization problem so that entropy of the visualization result is maximized. However, this approach may cause instability of viewing parameters. This is not only security-specific but also a general problem for 3D information visualization, and we think this area is ripe for future work for the enhancement of the presented hierarchical data visualization technique.

Visualization results may be crowded when the technique deals with long-term periods because numbers of incidents for each computer increase. In this case the number of occluded icons may also increase. We think that this problem will be improved by applying near-real-time observation, e.g., by counting numbers of incidents and refreshing the display per minute.

Hierarchy representation:
While currently we represent hierarchy as a single color of nested rectangular borders, it might be more effective to assign different colors to the borders according to the hierarchy’s depth when understanding the relationship between the distribution of intrusions and the hierarchy of an organization is critical.

Many-to-many traffic:
The presented technique represents links of traffic only for a specified computer, as shown in Figure 9. This capability is not enough for the understanding of many-to-many traffic, for example the result shown in Figure 8(a). It would be useful if the implementation automatically detected important links and represented them. Parameter Focusing (introduced as [4] in the Visualization for Computer Network and Intrusion Detection sidebar) may be a good reference to improve the technique for this issue.

7. Conclusion

This paper presents a technique for representing the statistics and trends of incidents in large-scale computer networks. We plan to prove the effectiveness of the technique by observing its use in real network management with users. Also, the following issues, as well as issues discussed in Section 6, will be the focus of future work based on this technique:

- Combination with intelligent techniques, such as data mining and knowledge management, to effectively discover and alert high-security incidents.
- Visualization of statistics of incidents in a larger time span, such as a week or a month. We think that the number of incidents becomes less important as the time span grows, so representation of incidents should be enhanced for long-term visualization.
- Visualization focusing on time-varying distribution of incidents. Combining our technique with time-oriented visualization techniques, such as Mie-log (introduced as [7] in the Visualization for Computer Network and Intrusion Detection sidebar), may effectively visualize the time-varying distribution of intrusions. Some kinds of trends or attack patterns can also be discovered by developing visualizations of the time-sequence of intrusions.

Acknowledgement

We appreciate the anonymous reviewers’ many fruitful comments on this paper.

References

[1] Y. D. Cai, D. Clutter, G. Pape, J. Han, M. Welge, L. Auvil, MAIDS: Mining Alarming Incidents from Data Streams, SIGMOD Conference 2004, pp. 919-920, 2004.
[2] S. J. Stolfo, W. Lee, P. K. Chan, W. Fan, E. Eskin, Data Mining-based Intrusion Detectors: An Overview of the Columbia IDS Project, SIGMOD Record, Vol. 30, No. 4, pp. 5-14, 2001.
[3] T. Itoh, Y. Yamaguchi, Y. Ikehata, Y. … zation and Computer Graphics, Vol. 10, No. 3, pp. 302-313, 2004.
[4] T. Itoh, F. Yamashita, Visualization of multi-dimensional data of bioactive chemicals using a hierarchical data … ic Symposium on Information Visualization (APVIS) 2006, to be …
[5] Y. Yamaguchi, T. Itoh, Visualization of Distributed Processes Using "Data Jewelry Box" Algorithm, CG International 2003, pp. 162-169, 2003.
[6] Cisco Secure IDS. … c/3/jp/product/security/ids/index.html

Figure 6. GUIs for visualization of IDS data. (Upper) Dialog window for configuring conditions for counting incidents (security level, range of time, IP address, signature). (Lower-left) Dialog window for listing incidents for a specific computer (date, time, signature ID, security level, sender, and receiver for a single incident). (Lower-right) Dialog for listing records of typical attacks (date, range of time, senders).
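The counting conditions configured in the dialog of Figure 6(Upper), and the aggregation step of Section 4.2, can be sketched as follows. This is a minimal illustration in Python, not the authors' Java implementation: the `Incident` record, its field names, and `count_incidents` are hypothetical, and a plain linear scan stands in for the RDB-like tables described in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Incident:
    """One log record holding the five items of Section 4.1 (field names are assumed)."""
    sender: str
    receiver: str
    time: datetime
    signature_id: int
    security_level: int  # 1 to 5


def count_incidents(
    incidents: Iterable[Incident],
    signature_ids: Optional[Set[int]] = None,
    security_levels: Optional[Set[int]] = None,
    time_range: Optional[Tuple[datetime, datetime]] = None,
) -> Tuple[Counter, Counter]:
    """Return (sent, received) counts per IP address.

    A condition left as None is not applied; time_range is half-open [start, end).
    """
    sent, received = Counter(), Counter()
    for inc in incidents:
        if signature_ids is not None and inc.signature_id not in signature_ids:
            continue
        if security_levels is not None and inc.security_level not in security_levels:
            continue
        if time_range is not None and not (time_range[0] <= inc.time < time_range[1]):
            continue
        sent[inc.sender] += 1
        received[inc.receiver] += 1
    return sent, received
```

In this sketch the two counters would drive the blue (sent) and red (received) bar heights; a request such as “visualize the incidents during 13:05 to 13:10” corresponds to passing only `time_range`.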
Figure 7. Report generated as HTML and image files.
Figure 8. (a) Multiple senders and receivers are observed.
Figure 8. (b) A sender started concentrating its attack on a single receiver.
Figure 8. (c) The sender has been disconnected; however, several other computers became new senders, and several other computers received the attacks. The new senders were in the same department as the previous sender, but the new receiver was in a different department from the continuously attacked receivers.
Figure 8. (d) Many receivers are observed in the same department. This pattern might be a scan attack for a specific department.
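The pattern in Figure 8(d) is visible because computers are grouped by IP address bytes (Section 3.2). A minimal Python sketch of that grouping is shown below, assuming dotted-quad IPv4 strings; `build_ip_hierarchy` and `receivers_per_group` are hypothetical helper names, and the nested dictionaries are only one possible representation of the four-level hierarchy.

```python
from typing import Dict, Iterable, List


def _numeric_key(addr: str):
    """Sort key that orders addresses by byte value rather than as text."""
    return tuple(int(part) for part in addr.split("."))


def build_ip_hierarchy(addresses: Iterable[str]) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Group addresses by first, second, and third byte; leaves are the computers."""
    tree: Dict[str, Dict[str, Dict[str, List[str]]]] = {}
    for addr in sorted(set(addresses), key=_numeric_key):
        b1, b2, b3, _ = addr.split(".")
        level1 = tree.setdefault(f"{b1}.*.*.*", {})
        level2 = level1.setdefault(f"{b1}.{b2}.*.*", {})
        level2.setdefault(f"{b1}.{b2}.{b3}.*", []).append(addr)
    return tree


def receivers_per_group(received_counts: Dict[str, int]) -> Dict[str, int]:
    """Count, per third-byte group, the computers that received at least one incident."""
    groups: Dict[str, int] = {}
    for addr, n in received_counts.items():
        if n > 0:
            prefix = ".".join(addr.split(".")[:3]) + ".*"
            groups[prefix] = groups.get(prefix, 0) + 1
    return groups
```

A third-byte group with an unusually large value from `receivers_per_group` would correspond to a small rectangle filled with red bars, as in Figure 8(d).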
Figure 9. Multiple computers in the same organization sent incidents to the same computer.
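The segments of Figure 9 are drawn for the clicked computer only (Section 4.3). The selection of sender and receiver pairs can be sketched in Python as below; `segments_for` is a hypothetical name, and each incident is reduced here to an assumed `(sender, receiver)` tuple.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def segments_for(clicked_ip: str, pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Return (sender, receiver, incident count) for every pair involving clicked_ip."""
    # Keep only incidents in which the clicked computer is the sender or the receiver.
    counts = Counter(pair for pair in pairs if clicked_ip in pair)
    return sorted((sender, receiver, n) for (sender, receiver), n in counts.items())
```

Each returned triple would map to one segment between two leaf-nodes; the count is extra information that the paper does not say is displayed.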
Figure 10. Yellow leaf-nodes represent the computers which sent or received incidents whose signature IDs and senders were the same as those of past dangerous incidents. This representation helps to reduce damage by drawing attention to recurrences of past incidents.
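The highlighting rule stated in this caption can be sketched as a lookup against registered past incidents. The Python below is an illustration under assumptions: `highlighted_computers` is a hypothetical name, incidents are reduced to `(sender, receiver, signature_id)` tuples, and past dangerous incidents are stored as `(signature_id, sender)` pairs.

```python
from typing import Iterable, Set, Tuple


def highlighted_computers(
    incidents: Iterable[Tuple[str, str, int]],
    past_dangerous: Set[Tuple[int, str]],
) -> Set[str]:
    """Return the IP addresses to draw in yellow.

    An incident matches when both its signature ID and its sender equal
    those of a registered past incident; both endpoints are highlighted.
    """
    yellow: Set[str] = set()
    for sender, receiver, signature_id in incidents:
        if (signature_id, sender) in past_dangerous:
            yellow.add(sender)
            yellow.add(receiver)
    return yellow
```

Matching on the pair rather than on the signature ID alone follows the observation in Section 5 that the signature IDs and sender addresses were identical across repeated attacks.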
Figure 11. Closer-up views of the visualization. (Left) Receivers in Figure 8(a). (Right) Senders and a receiver in Figure 9.
Visualization for Computer Network and Intrusion Detection
Visualization of Internet and computer network
[3] B. Cheswick, H. Burch, S. Branigan, Mapping and visualizing the …