Graphics Hardware (2008)
David Luebke and John D. Owens (Editors)

Tracy: A Debugger and System Analyzer for Cross-Platform Graphics Development

Sami Kyöstilä (1), Kari J. Kangas (2), Kari Pulli (2)
(1) Nokia, (2) Nokia Research Center

Abstract

We describe Tracy, an offline graphics debugging and system analysis toolkit for cross-platform system and application development in mobile graphics. Tracy operates by recording the graphics function calls and argument data of unmodified applications into a trace file for offline playback, debugging, and performance analysis. In addition, traces can be edited and converted into platform-independent C files. We pay special attention to real-time performance: our trace compression mechanism allows interactive use of applications even when tracing long, multi-thousand-frame traces on real mobile hardware. We describe the use of the toolkit through real-world use cases such as debugging a visual error or a performance problem in an application, analyzing application quality, and benchmarking a graphics engine.

Categories and Subject Descriptors (according to ACM CCS): I.3.4 [Computer Graphics]: Graphics Utilities, Software Support; D.2.5 [Software Engineering]: Testing and Debugging, Debugging Aids

© The Eurographics Association 2008.

S. Kyöstilä & K. Kangas & K. Pulli / Tracy: A Debugger and System Analyzer for Cross-Platform Graphics Development

1. Introduction

Mobile graphics is a quickly developing area of computer graphics. New 3D APIs such as OpenGL ES and M3G [PAM*07] bring the best features of desktop APIs in a more compact form to handheld devices. New 2D vector graphics APIs such as OpenVG and JSR 226 [Khr07, JCP06] are also available for user interfaces, animations, and presentation graphics. With the help of these APIs, handheld devices are adopting increasingly visual user interfaces, they have become viable gaming platforms, and navigation services and maps are growing in popularity.

Although tools exist for interactively debugging graphics applications on mature systems such as PCs or game consoles, many such tools cannot be used efficiently when the target system is in an immature development phase or when the development work is distributed among different platforms. In the handheld space the development environment is usually in constant flux and inherently cross-platform: a game developer may be developing a game while a graphics vendor develops the engine hardware and drivers and a handset vendor develops the system software and does the system integration. All this work often happens concurrently on different hardware environments and operating systems.

When the game does not work as expected, it is important to quickly pinpoint where the bug is. The source code for all the components may not be available to any of the parties, and even if it is available, digging into the source is a time-consuming and tedious task. Therefore, tools are needed that allow quick isolation of the bug to a minimal code sequence that can replicate the bug for any of the parties, preferably on the platform that they primarily work on.

In the handheld space, resources are also scarce, giving rise to various performance problems. Such problems may show up as an uneven frame rate or as unnecessary use of resources, which drains batteries sooner. Tools are needed to flag suspect graphics engine usage patterns and suggest better ones, and to provide a detailed view into the graphics workload that gives insight into where the bottlenecks are. The strict resource limitations also apply to the tools that run on the mobile hardware.

To address these issues we have created Tracy, a toolkit for tracing graphics applications to facilitate graphics engine development, application debugging, and application quality estimation in a multi-platform development environment. Our system is based on intercepting graphics commands, i.e., graphics function calls and the associated argument data, into a trace file, which is then analyzed with a dedicated tool in a workstation environment, free from the limitations of embedded hardware. In particular, Tracy offers several key advantages over current solutions:

• Tracy shows the value of traces in mobile graphics through real use cases.
• Tracing is optimized for low-performance, low-bandwidth environments. Traces are compressed in real time while tracing, which allows the creation of long traces of interactive applications on real mobile hardware.
• Tracy uses a data-driven design to support multiple platforms and APIs. We currently support several operating systems (Windows, Linux, and Symbian) and several APIs (OpenGL ES, OpenVG, EGL, and APIs built on top of them, such as M3G and JSR 226).
• Tracy allows debugging applications without requiring access to, or modifications of, the source code. It converts traces to platform-independent C source, which can easily be used on different systems (OS, HW). Running the compiled C source yields more accurate performance characteristics for profiling and benchmarking than interpreting trace files with a player.
• Tracy allows extracting subsequences, such as single frames, from longer trace files while maintaining matching rendering output. In this process, redundant graphics commands are culled through accurate state tracking, greatly reducing the resulting trace file size.

We begin with a discussion of related work (Section 2) and then describe the various components of the Tracy toolkit (Section 3). We present the key use cases in Section 4. We finish the paper with a discussion (Section 5) and conclusions including future work (Section 6).

2. Related work

The basic idea of tracing the graphics commands used by a graphics application was introduced by Dunwoody and Linton [DL90]. They transcoded the graphics commands into an intermediate, API-independent representation. Our approach is instead to save all graphics commands into an API-specific trace file without losing information, allowing the trace to match application behavior at the graphics engine level as closely as possible.

Several tools have been created for interactive graphical debugging on desktop PCs and game consoles. For example, PerfHUD [NVI07] from NVIDIA is a proprietary analysis tool for Direct3D on Windows. It shows many statistics from rendering pipeline stages and allows pausing an application and replaying the graphics commands for a frame. Another such tool is gDEBugger [gra07] from graphicREMEDY, which also allows visual debugging and strives to enable quick pinpointing of errors and performance issues in OpenGL and OpenGL ES applications. It also shows content statistics obtained from the graphics hardware. A system by Duca et al. [DNB*05] allows debugging of OpenGL programs by storing information about graphics commands in a relational database. In contrast to the above systems, we focus entirely on offline debugging, which is usually the most viable way to debug mobile graphics engines and applications while they are being developed, especially on immature systems. During debugging we work with trace files rather than live applications, as a trace file contains the graphics commands causing a graphics error in a more easily usable format than the original application. All complex data extraction and analysis is done as a post-process.

GLTrace and GLSim [Pro01] were among the first publicly available trace utilities, concentrating on tracing and analyzing OpenGL. The flexible data-driven design of our system supports several graphics APIs and provides real-time trace compression. We also highlight the use of trace utilities in various real-world use cases.

The Chromium system [HHN*02] captures and filters an OpenGL graphics command stream and passes it, for example, to a cluster of graphics workstations to parallelize and speed up the rendering. In addition, Chromium has been used to capture and modify the graphics command stream to apply stylized drawing techniques. Instead of implementing the tracer by hand as in Chromium, we use a data-driven design that lets us easily create a tracer for any C-based API.

A concept closely related to graphics command tracing is state tracking. Buck et al. [BHH00] describe how they track OpenGL state in a system used to render tiles of the frame buffer correctly on a graphics workstation cluster, and how they reduce communication with lazy updates of graphics state. We opted to build our own state tracking solution to easily support APIs other than OpenGL. Like Buck et al., we use a hierarchical representation of the graphics API state. A significant difference is that we retrieve the exact function call sequence used to set up a particular graphics API state, to guarantee that the meaning of the trace is not inadvertently modified through editing operations such as frame extraction. We also use state tracking to analyze the quality of graphics applications and to log unnecessary state changes. Finally, state tracking enables us to serialize vertex array and texture map data given through an unbounded array in OpenGL ES, and path coordinate data in OpenVG.

Microsoft PIX [Mic06] is another graphics debugger for Direct3D on Windows. It allows saving graphics commands into a trace file from which they can then be analyzed offline. In comparison to PIX, our system adds support for multiple graphics APIs and engines through a flexible API grammar definition and a statistics collection mechanism that is independent of the graphics engine used. We also support editing trace files to produce synthesized graphics content and converting trace files into other formats, such as platform-independent C source code.

Finally, workload characterization collects statistics on the graphics content to allow, for example, rendering time estimation [WW03, MC99] and characterization of typical graphics content features [CL97]. The workload characteristics gathered from trace files provide an important input for the design of graphics engine architectures [SLS04, RMG*06]. Our system provides access to commonly used 3D (OpenGL ES) graphics content statistics, while also defining similar content features for 2D (OpenVG) graphics.

3. Tracy Architecture

In this section we first describe the overall architecture of the Tracy toolkit, followed by the most relevant implementation details. For more details, see the M.Sc. thesis based on this work [Kyö08].

Figure 1: The tracer intercepts graphics commands going from the application to the graphics engine and saves them into a trace file. The trace file is analyzed offline in the trace analyzer for purposes such as debugging graphics engines or applications.

The main components of Tracy are shown in Figure 1. Tracy works by intercepting all graphics commands executed by an unmodified application using a tracer. The graphics commands are saved into a trace file, which can be replayed in a trace player or passed to a trace analyzer running in a workstation environment.
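To make the record-and-replay idea concrete, here is a minimal Python sketch. It assumes a trace is simply a list of (function name, argument tuple) records; the record layout and the names `replay`, `c_literal`, and `to_c_source` are hypothetical and are not Tracy's file format or API. The same records can be replayed against an API dispatch table, as a trace player would do, or emitted as C statements in the spirit of the trace-to-C conversion mentioned above.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical trace record: (function name, positional arguments).
Record = Tuple[str, Sequence[Any]]


def replay(trace: List[Record], api: Dict[str, Callable[..., Any]]) -> None:
    """Play back recorded calls, in order, against an API dispatch table."""
    for name, args in trace:
        api[name](*args)


def c_literal(value: Any) -> str:
    """Render one argument as a C expression (bools, ints, floats, enum names)."""
    if isinstance(value, bool):          # check before int: bool subclasses int
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return repr(value) + "f"         # e.g. 0.5 -> "0.5f" for GLfloat arguments
    if isinstance(value, str):
        return value                     # assumed to be an enumerant such as GL_LINEAR
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def to_c_source(trace: List[Record], function_name: str = "play_trace") -> str:
    """Emit the whole trace as the body of a single C function."""
    lines = [f"void {function_name}(void)", "{"]
    for name, args in trace:
        lines.append(f"    {name}({', '.join(c_literal(a) for a in args)});")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

This sketch only handles scalar arguments. A real converter would also have to emit the serialized array data (vertex arrays, textures) as static C arrays and pass pointers to them.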
The trace analyzer allows editing of trace files and extracting raw data, such as OpenGL ES textures, from the trace file. It can also extract content statistics from the trace file by running it in a trace player with an instrumented graphics engine. The trace analyzer provides a Python-based scripting interface, which makes it easy to implement tools for specific trace processing needs. The tracer, trace player, and trace analyzer are not hard-coded to use a specific graphics API; a data-driven design lets them be configured easily for different APIs.

3.1. Tracer

The purpose of the tracer is to capture an application's graphics commands into a trace file. Similar to related work such as Chromium [HHN*02], our tracer is implemented as a dynamic link library (DLL) that provides an interface identical to that of the system graphics engine. This allows tracing existing graphics applications without any source code modifications or recompilation.

3.1.1. API Structure Definition

We specify the grammar and the behavior of an API using an API structure definition. A code generator produces the tracer and the trace player from the structure definition. We favored this approach over hand-written tracer and trace player code, since it is less error-prone and makes it easy to support different APIs.

The most significant part of the API structure definition is the C header file defining the API functions, argument data types, and enumerants. We mark a subset of these functions as rendering, frame swapping, or API termination functions. The tracer and the trace analyzer use this information to choose the functions that contribute to content statistics, to segment trace files into frames, and to shut down the tracer when the application terminates the API.

An essential section of the API structure definition is the set of rules for calculating the sizes of array parameters in API functions.
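Such rules can be thought of as small functions that map a call's other arguments to an element or byte count. The Python sketch below encodes two of them under our own assumptions. The first is a pname-dependent element count for the params array of glLightfv, following the default-plus-overrides shape of Figure 2. The second is the byte count of glTexImage2D pixel data. The table and function names are hypothetical, only a few pixel formats are covered, and the row-padding computation assumes the OpenGL ES unpack rule: each row starts on a multiple of GL_UNPACK_ALIGNMENT, which defaults to 4.

```python
from typing import Dict, Tuple

# Hypothetical rule table: a default element count for glLightfv's params
# array, overridden for specific pname values (compare Figure 2).
LIGHTFV_DEFAULT_COUNT = 4   # e.g. GL_AMBIENT, GL_DIFFUSE, GL_SPECULAR, GL_POSITION
LIGHTFV_COUNT_BY_PNAME: Dict[str, int] = {
    "GL_SPOT_DIRECTION": 3,
    "GL_SPOT_EXPONENT": 1,
    "GL_SPOT_CUTOFF": 1,
    # The attenuation factors are also scalars in OpenGL ES 1.x.
    "GL_CONSTANT_ATTENUATION": 1,
    "GL_LINEAR_ATTENUATION": 1,
    "GL_QUADRATIC_ATTENUATION": 1,
}


def lightfv_param_count(pname: str) -> int:
    """Number of GLfloat values to serialize from glLightfv's params array."""
    return LIGHTFV_COUNT_BY_PNAME.get(pname, LIGHTFV_DEFAULT_COUNT)


# Bytes per pixel for a few OpenGL ES 1.x (format, type) combinations.
BYTES_PER_PIXEL: Dict[Tuple[str, str], int] = {
    ("GL_RGBA", "GL_UNSIGNED_BYTE"): 4,
    ("GL_RGB", "GL_UNSIGNED_BYTE"): 3,
    ("GL_LUMINANCE_ALPHA", "GL_UNSIGNED_BYTE"): 2,
    ("GL_LUMINANCE", "GL_UNSIGNED_BYTE"): 1,
    ("GL_ALPHA", "GL_UNSIGNED_BYTE"): 1,
    ("GL_RGB", "GL_UNSIGNED_SHORT_5_6_5"): 2,
    ("GL_RGBA", "GL_UNSIGNED_SHORT_4_4_4_4"): 2,
    ("GL_RGBA", "GL_UNSIGNED_SHORT_5_5_5_1"): 2,
}


def teximage2d_byte_count(width: int, height: int, fmt: str, type_: str,
                          unpack_alignment: int = 4) -> int:
    """Bytes the engine may read through glTexImage2D's pixels pointer.

    Every row except the last is padded to a multiple of the unpack
    alignment, so the last row contributes only its unpadded length.
    """
    if width == 0 or height == 0:
        return 0
    row = width * BYTES_PER_PIXEL[(fmt, type_)]
    stride = -(-row // unpack_alignment) * unpack_alignment   # round up
    return (height - 1) * stride + row
```

For example, a 3 x 3 GL_RGB/GL_UNSIGNED_BYTE image has 9-byte rows padded to a 12-byte stride, so the tracer would save 2 x 12 + 9 = 33 bytes.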
One example is the texture data passed to the glTexImage2D OpenGL ES function: the amount of data to be saved must be calculated from the texture resolution and format. The API structure definition provides a compact representation for specifying these array size equations. An example is shown in Figure 2. More complicated cases, such as deriving the number of path coordinates to save in the vgAppendPathData OpenVG function, are handled by writing custom serialization C code for the specific functions in the API structure definition. In practice, we found that hand-written serialization code was needed for only a few OpenGL ES and OpenVG functions.

    lue"state:metatype(class "array", size "4"):
    [
        size(condition "pname", value "GL_SPOT_DIRECTION", result "3")
        size(condition "pname", value "GL_SPOT_EXPONENT", result "1")
        size(condition "pname", value "GL_SPOT_CUTOFF", result "1")
    ]
    }}

Figure 2: API configuration directives for the glLightfv OpenGL ES function. The rules specify how the function parameters affect the API state and how they should be serialized to the trace file. Most notably, the pname parameter is used to determine the size of the params array.

Graphics APIs commonly define a mechanism for extending the API. Some extensions simply define new parameter values for existing functions in the original API. Tracing such extensions needs no special consideration unless the extension defines new parameter formats, in which case the API structure definition must be extended with the corresponding serialization rules or C code. Other extensions, however, define completely new functions. Both OpenGL ES and OpenVG use EGL to retrieve pointers to the extension functions. For the tracer to capture calls to these functions, it must intercept the function pointer queries and return a pointer to a corresponding tracer function. These extension functions are defined through the API structure definition.
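The interception of extension function queries can be modeled in a few lines of Python, with a dictionary of callables standing in for the DLL's exported symbols. In this sketch, `make_tracer` (a hypothetical name) wraps every exported function. Each wrapper appends a record to the trace, forwards the call, and advances a frame counter after any function marked as a frame swap. The sketch also replaces eglGetProcAddress with a version that returns a tracing wrapper around the real extension pointer. The fake API in the test is invented for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Trace = List[Tuple[int, str, tuple]]   # (frame index, function name, arguments)


def make_tracer(real_api: Dict[str, Callable[..., Any]],
                frame_swap_functions: Set[str],
                trace: Trace) -> Dict[str, Callable[..., Any]]:
    """Return an API table with the same names as real_api whose calls are recorded."""
    frame = 0

    def wrap(name: str, func: Callable[..., Any]) -> Callable[..., Any]:
        def traced(*args: Any) -> Any:
            nonlocal frame
            trace.append((frame, name, args))
            result = func(*args)        # forward to the real implementation
            if name in frame_swap_functions:
                frame += 1              # a swap ends the current frame
            return result
        return traced

    exported = {name: wrap(name, func) for name, func in real_api.items()}
    real_get_proc_address = real_api["eglGetProcAddress"]

    def traced_get_proc_address(proc_name: str) -> Optional[Callable[..., Any]]:
        trace.append((frame, "eglGetProcAddress", (proc_name,)))
        pointer = real_get_proc_address(proc_name)
        # Hand the application a tracing wrapper instead of the raw pointer,
        # so later calls through it are recorded as well.
        return None if pointer is None else wrap(proc_name, pointer)

    exported["eglGetProcAddress"] = traced_get_proc_address
    return exported
```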
This extension mechanism also works when EGL and the graphics APIs reside in different DLLs.

The API structure definition also includes a hierarchical state model and a mapping from function parameters into it. Our aim was to create a generic state modeling solution that is not limited to either OpenGL ES or OpenVG and is flexible enough to support foreseeable C-based graphics APIs such as OpenGL ES 2.0. Instead of explicitly specifying the complete API state, we put the emphasis on modeling the dependencies between the API functions. Our state modeling mechanism is based on a hierarchical data structure called a state tree. It is a directed acyclic graph in which vertices represent the elements of a state structure and edges represent the dependencies between them.

As an example, consider setting the filtering mode of a texture in OpenGL ES:

    glBindTexture(GL_TEXTURE_2D, 3);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

A corresponding state tree for this example is shown in Figure 3. There are two kinds of elements in the tree: types and values. A type may have a set of concrete state values, one of which is marked as current.

Figure 3: A state tree for storing the filtering mode of an OpenGL ES texture object, with the type nodes drawn as rounded rectangles and the value nodes as angled rectangles. The tree on the left only shows the specific elements used to store the filtering mode, while the tree on the right also shows some alternate options for traversal.

The dependencies between commands are modeled by mapping each parameter of a state-modifying API function to a type or a value in the state tree, using the format shown in Figure 2. For example, the param parameter of the glTexParameteri OpenGL ES function is mapped to the following state path:

    Root → Texture target → Texture name → Texture parameter

This state path approach can be used to describe the effects of nearly all commands in the OpenGL ES, OpenVG, and EGL APIs.
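Below is a minimal Python model of the state tree in Figure 3, built on simplifying assumptions of our own. Each value node groups its children by type name, each type node remembers which of its values is current, and a state path is a list of (type, value) pairs. The functions `gl_bind_texture` and `gl_tex_parameteri` are hypothetical stand-ins for the mappings the API structure definition would generate. Each one updates the tree and returns the state path its call touches.

```python
from typing import Any, Dict, List, Tuple

StatePath = List[Tuple[str, Any]]   # [(type name, concrete value), ...]


class ValueNode:
    """A concrete state value; its child elements are grouped by type name."""

    def __init__(self) -> None:
        self.types: Dict[str, "TypeNode"] = {}
        self.payload: Any = None    # leaf data, e.g. the filtering mode

    def child(self, type_name: str, value: Any) -> "ValueNode":
        type_node = self.types.setdefault(type_name, TypeNode())
        type_node.current = value   # the value being written becomes current
        return type_node.values.setdefault(value, ValueNode())


class TypeNode:
    """A state type holding a set of concrete values, one marked as current."""

    def __init__(self) -> None:
        self.values: Dict[Any, ValueNode] = {}
        self.current: Any = None


def lookup(root: ValueNode, path: StatePath) -> ValueNode:
    """Follow a state path without modifying the tree."""
    node = root
    for type_name, value in path:
        node = node.types[type_name].values[value]
    return node


def gl_bind_texture(root: ValueNode, target: str, name: int) -> StatePath:
    root.child("Texture target", target).child("Texture name", name)
    return [("Texture target", target), ("Texture name", name)]


def gl_tex_parameteri(root: ValueNode, target: str, pname: str, param: str) -> StatePath:
    target_node = root.child("Texture target", target)
    bound = target_node.types.get("Texture name")
    name = bound.current if bound is not None else 0   # texture 0 is bound by default
    leaf = target_node.child("Texture name", name).child("Texture parameter", pname)
    leaf.payload = param
    return [("Texture target", target), ("Texture name", name), ("Texture parameter", pname)]
```

The path returned for glTexParameteri includes the texture name that was current when the call was made. This is what later lets an analyzer tie the call to the glBindTexture that preceded it.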
The state effects of cumulative commands, such as the vgAppendPathData OpenVG function, are handled with custom code defined in the API structure definition.

The hierarchical state model is used for graphics API state tracking, which serves two main purposes. First, we want to enable the tracer to save, for example, the vertex array data in OpenGL ES. While the vertex data is defined by passing an array pointer to the graphics engine, the data actually used from the array is determined by the subsequent draw commands. State tracking allows us to determine which parts of the array each draw command actually uses, and thus which data to serialize into the trace file.

The second use of state tracking is to model the relative dependencies of the API functions and their parameters in the trace analyzer. This makes it possible to extract a set of frames from a longer trace and to perform in-depth analysis of the call trace.

3.1.2. Performance Considerations

Maintaining an acceptable level of performance while tracing applications depends greatly on the tracer's ability to write the data crossing the API boundary to the trace file at a sufficient rate. Our initial approach of using synchronous write operations yielded unacceptable performance in most cases. Implementing write buffering, in which the tracer gathered a large amount of data and wrote the whole buffer at once, brought performance to an acceptable level on average. However, the synchronous buffer flushing caused a long pause whenever the buffer became full. We finally implemented fully asynchronous write buffering, in which a dedicated worker thread collects data into a buffer array in a round-robin fashion and flushes the filled buffers into the output file. With this, the tracer can sustain sufficient write performance as long as the average data bandwidth does not exceed the capabilities of the storage device.

While the actual tracing performance depends greatly on the amount of graphics data submitted by the application, we found that a triple-buffered configuration with 512 kilobytes per buffer works well for medium to complex OpenGL ES applications.
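The following is a minimal Python sketch of asynchronous write buffering with a fixed pool of buffers, using the three-buffer, 512-kilobyte configuration from the text as defaults. `AsyncTraceWriter` is a hypothetical class, and Python threads stand in for the native worker thread. The division of work is also our assumption: here the calling thread fills buffers and the worker only writes them out and recycles them, which may differ from Tracy's exact split.

```python
import io
import queue
import threading
from typing import BinaryIO, Optional


class AsyncTraceWriter:
    """Write buffering with a fixed pool of buffers and a background writer.

    The calling thread copies data into the current buffer; when it is full,
    the buffer is queued for a worker thread that writes it to the stream and
    returns it to the free pool. The caller blocks only when every buffer is
    waiting to be written, i.e. when the storage device cannot keep up.
    """

    def __init__(self, stream: BinaryIO, buffer_count: int = 3,
                 buffer_size: int = 512 * 1024) -> None:
        self.stream = stream
        self.buffer_size = buffer_size
        self.free: "queue.Queue[bytearray]" = queue.Queue()
        self.full: "queue.Queue[Optional[bytearray]]" = queue.Queue()
        for _ in range(buffer_count):
            self.free.put(bytearray())
        self.current = self.free.get()
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def write(self, data: bytes) -> None:
        view = memoryview(data)
        while len(view) > 0:
            room = self.buffer_size - len(self.current)
            self.current += view[:room]
            view = view[room:]
            if len(self.current) == self.buffer_size:
                self.full.put(self.current)
                self.current = self.free.get()   # blocks if all buffers are busy

    def _drain(self) -> None:
        while True:
            buffer = self.full.get()
            if buffer is None:                   # sentinel queued by close()
                return
            self.stream.write(buffer)
            buffer.clear()
            self.free.put(buffer)                # recycle the buffer

    def close(self) -> None:
        if self.current:
            self.full.put(self.current)          # flush the partially filled buffer
        self.full.put(None)
        self.worker.join()
        self.stream.flush()


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = AsyncTraceWriter(sink, buffer_size=16)
    writer.write(b"glClear(GL_COLOR_BUFFER_BIT);")
    writer.close()
    print(sink.getvalue())
```

Because the full buffers pass through a FIFO queue and a single worker writes them, the output preserves the order of the calls. This is also why the synchronous mode described next is still needed for crashing applications: data sitting in buffers is lost if the process dies.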
Finally, to deal with crashing applications we still support fully synchronous writing, which, albeit slow, guarantees that each API call is serialized to the trace file at the time of its execution. The type of buffering and the buffer size can be defined in a run-time tracer configuration file.

In addition to improving write performance, we also compress the trace on the fly by detecting repeating data structures in function arguments. In our first trials, a two-minute OpenGL ES animation with roughly 30 000 rasterized triangles per frame generated a 250 megabyte trace file, which was considered too much for most embedded systems. Furthermore, because of the large amount of data being written to the trace file, the animation slowed to less than one frame per second. However, animated graphics often exhibit a high level of frame coherence, and we found that a very high percentage of the trace file data consisted of repeated instances of identical array data. In OpenGL ES, for example, the most significant source of this duplication is vertex and index arrays; textures are commonly specified only once.

To reduce the trace file size, we first tried to determine whether an array had already been stored into the trace file by calculating a message digest value for the array contents and comparing it to the previous value. Unfortunately, a simple CRC32 message digest algorithm was prone to collisions, in which the same digest value was assigned to different array data. This led to situations where modifications to arrays were not caught and written to the file, resulting, for example, in visual artifacts during subsequent trace playback. On the other hand, a more complex MD5 algorithm was computationally too intensive.
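The two strategies can be contrasted in a short sketch. `DigestArrayFilter` models the CRC32 check described above, using Python's `zlib.crc32`. `TwoEncounterTracker` models the copy-based compromise described in the following paragraphs, in which a copy is kept only for arrays encountered at least twice. Both classes are hypothetical. Like the text, they key arrays by memory address, and each returns whether the array contents must be serialized. The `memory_budget` parameter is our stand-in for "if available memory permits".

```python
import zlib
from typing import Dict, Set


class DigestArrayFilter:
    """First attempt: compare a CRC32 of the contents seen at each address."""

    def __init__(self) -> None:
        self.last_digest: Dict[int, int] = {}

    def needs_write(self, address: int, data: bytes) -> bool:
        digest = zlib.crc32(data)
        if self.last_digest.get(address) == digest:
            return False   # a CRC32 collision would wrongly skip a changed array here
        self.last_digest[address] = digest
        return True


class TwoEncounterTracker:
    """Compromise: keep an exact copy only of arrays seen at least twice."""

    def __init__(self, memory_budget: int) -> None:
        self.seen: Set[int] = set()
        self.copies: Dict[int, bytes] = {}
        self.memory_budget = memory_budget
        self.memory_used = 0

    def needs_write(self, address: int, data: bytes) -> bool:
        if address not in self.seen:
            self.seen.add(address)        # first encounter: write and remember the address
            return True
        stored = self.copies.get(address)
        if stored is None:
            # Second encounter (or no copy could be kept so far): write again
            # and keep a copy if it fits in the memory budget.
            if self.memory_used + len(data) <= self.memory_budget:
                self.copies[address] = bytes(data)
                self.memory_used += len(data)
            return True
        if stored == data:
            return False                  # unchanged since it was last written
        self.memory_used += len(data) - len(stored)
        self.copies[address] = bytes(data)   # changed: write and refresh the copy
        return True
```

The exact comparison in `TwoEncounterTracker` cannot miss a modification, at the cost of memory roughly equal to the tracked array data, which matches the memory usage the text reports for the array tracker.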
A more complete array tracking algorithm would make internal copies of each encountered array in order to later check whether the array had been modified, although at the expense of increased memory consumption.

We implemented a compromise in which we only track changes to arrays that have been encountered at least twice. During the first encounter, an array is stored into the trace file and marked as seen based on its memory address. During the second encounter, the array is again stored in the trace file, but a copy is also kept in RAM if available memory permits. After this point, the copy is used to check whether or not the array contents have changed. In practice, replicating the same array at most twice in the trace file yields good compression with acceptable processing overhead. Using this approach, the 250 megabyte OpenGL ES trace file was reduced to less than 10 megabytes, and the frame rate of the traced animation running on a modern smartphone improved from less than one frame per second to more than 10 frames per second, compared to 25 frames per second without tracing. The array tracker commonly uses roughly as much memory as the vertex, index, and path data used by the application. Texture and image data is not tracked in this manner, since applications usually do not reissue duplicate textures and images.

3.2. Trace Analyzer

The trace analyzer is used to examine and process the trace files. It provides a Python-based scripting interface with support for accessing and manipulating trace data such as individual graphics commands, vertex data, OpenGL ES textures, and OpenVG images. The interface also provides extensive trace manipulation primitives, such as extracting and joining trace subsequences. It is also possible to inspect and modify the API state at the granularity of individual graphics commands, and to access content statistics extracted from the trace file by running it in a trace player with an instrumented graphics engine.
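Some high-level statistics, such as the call histogram and frame duration (Table 1), can be computed directly from the recorded calls without an instrumented engine. The sketch below is plain Python rather than Tracy's scripting interface, whose functions are not documented here. It assumes each call is available as a (time stamp in milliseconds, function name) pair and that the frame-swap functions are known, as they are marked in the API structure definition. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Call = Tuple[float, str]   # (time stamp in milliseconds, function name)


def call_histogram(calls: Iterable[Call]) -> Dict[str, int]:
    """Number of calls per API function."""
    return dict(Counter(name for _, name in calls))


def frame_durations(calls: Iterable[Call], swap_functions: Set[str]) -> List[float]:
    """Time between consecutive frame-swap calls, in milliseconds."""
    swaps = [t for t, name in calls if name in swap_functions]
    return [later - earlier for earlier, later in zip(swaps, swaps[1:])]


def average_fps(durations: List[float]) -> float:
    """Frames per second implied by the mean frame duration."""
    return 1000.0 * len(durations) / sum(durations)
```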
For report generation and for producing diagrams, the scripting interface uses HTML, matplotlib, and the Python Imaging Library.

The scripting interface makes it easy to implement tools for various trace processing needs. One example is the performance checklist, an automated expert system that looks for known performance deficiencies in a trace file. Another tool converts a trace file, or a part of it, into equivalent ANSI C source code. Finally, the content statistics can be condensed into summary reports that provide a quick overview of the graphics content.

3.2.1. Frame Extraction

Trace files commonly encompass all the graphics commands made by an application while it is running, and thus typically contain tens of thousands of graphics commands and megabytes of data. Although long traces produce more reliable statistics, only a small part of this data is often needed when tracking down a bug or preparing a benchmark, which makes the bulk of the trace file largely superfluous. To make it easier to focus on a particular part of a long trace file, the trace analyzer can extract a subsequence of commands from a trace to form a new, smaller trace.

However, simply extracting the selected graphics commands is often not enough, since the objective usually is to preserve the original rendering output of those commands. For instance, the application might have loaded a number of textures during its initialization phase, and that texture data must also be resident when the extracted set of graphics commands is played back.

Extracting a single frame along with the associated state setup sequence is illustrated in Figure 4.

Figure 4: A single frame is extracted from a trace file. The trace analyzer uses state tracking to determine which preceding graphics commands are needed for the extracted frame to be identical to the same frame in the full trace. These commands are assembled into a state setup sequence.
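The following Python sketch shows one way to assemble such a state setup sequence, under simplifying assumptions of our own. Every command is recorded with its frame index and at most one state path (a tuple). A later command with the same path overrides an earlier one, as with repeated glClearColor calls. A kept command also needs, for each proper prefix of its path, the most recent earlier command that wrote exactly that prefix, such as the glBindTexture in effect when a glTexParameteri was issued. This illustrates the prefix rule described in the next paragraph. It is not Tracy's implementation, and `extract_frame` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (frame index, call text, state path or None for calls that change no state)
Command = Tuple[int, str, Optional[tuple]]


def extract_frame(trace: List[Command], frame: int) -> List[Command]:
    """Return a state setup sequence followed by the commands of `frame`."""
    before = [c for c in trace if c[0] < frame]
    body = [c for c in trace if c[0] == frame]

    # Later writes to the same state path override earlier ones,
    # so only the last writer of each path is kept.
    last_writer = {}
    for index, (_, _, path) in enumerate(before):
        if path is not None:
            last_writer[path] = index
    keep = set(last_writer.values())

    # A kept command also needs, for every proper prefix of its path, the
    # latest earlier command that wrote exactly that prefix.
    pending = list(keep)
    while pending:
        index = pending.pop()
        path = before[index][2]
        for length in range(1, len(path)):
            for earlier in range(index - 1, -1, -1):
                if before[earlier][2] == path[:length]:
                    if earlier not in keep:
                        keep.add(earlier)
                        pending.append(earlier)
                    break

    setup = [before[i] for i in sorted(keep)]   # preserve the original order
    return setup + body
```

In the test below, glBindTexture(GL_TEXTURE_2D, 3) appears twice in the extracted sequence. The first copy makes the texture parameter apply to texture 3, and the second restores the binding that was in effect at the start of the extracted frame. The first glClearColor is dropped because a later call overrides it.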
When a sequence of events comprising a frame is extracted from a trace file, the analyzer calculates the effective state at the start of the frame, and the graphics commands that were used to prepare that state are prepended to the frame. This ensures that the rendering output of the extracted frame will be correct. Note that the graphics commands that influence the state may appear anywhere in the preceding trace section.

The trace subsequence extraction algorithm builds on the state modeling system described in Section 3.1.1. The algorithm is based on the observation that a graphics command is a prerequisite for a second graphics command if the state path associated with the first command is a prefix of any of the paths associated with the second command. Based on this, the algorithm can distinguish between commands that are needed for setting up a required state and those that are made redundant by other commands. For example, an application might have set the current clear color multiple times before the extracted command sequence. As each color setting command completely overrides the previous one, only the last one is needed to reproduce the effective state.

3.2.2. Content Statistics

The trace analyzer provides both high-level and in-depth content statistics from a trace file. High-level statistics, such as the number of rendered frames per second, are calculated directly from the graphics commands in the trace file, while in-depth statistics require an instrumented engine. The detailed content statistics for both OpenGL ES and OpenVG provided by our implementation are listed in Table 1.

Table 1: Content statistics provided by the trace analyzer and the instrumented engines.

  Common statistics
    General: Time stamp, duration, call histogram, array data traffic, frame duration, EGL configuration attributes
    Buffer snapshots: Color buffer, depth buffer, stencil buffer
  OpenGL ES
    API calls: Matrix operations, render calls, texture uploads
    Primitives: Submitted, degenerate, frustum culled, backface culled, clipped, discarded, rasterized
    Vertices: Submitted, transformed, viewport transformed, lit, cache accesses, cache hits
    Rasterization: Fragment count, texture fetches, average triangle size, discarded fragments, estimated overdraw
  OpenVG
    Matrix operations, render calls, image uploads, property reads/writes
    Creations, attribute reads/writes
    Segment count, coordinate count, tessellated polygon edges, accepted polygon edges
    Fragment count, estimated overdraw

3.2.3. Performance Checklist

The performance checklist is an automated, API-specific expert system in the vein of Dr. PIX in PIX [Mic06]. It verifies a trace file against a set of predefined conditions that test for known performance deficiencies and other unwanted call patterns, and it automatically provides a rough quality estimate of the traced application. Some of the checklist items apply to all graphics engines, while others are specific to the characteristics of a certain implementation. For example, the tests included in the OpenGL ES checklist are listed in Table 2. Should any of these checks fail, the analyzer p

Table 2: OpenGL ES performance checklist items.

  Mipmap usage: Mipmap filtering reduces memory accesses and improves image quality; bilinear filtering is a cheap way to improve image quality on hardware engines.
  Synchronous functions: Functions that cause the CPU to wait for the GPU may have a dramatic negative effect on performance.
  Depth buffer clearing: Failing to clear the depth buffer may have a significant performance penalty on some architectures.
  Vertex buffer object usage: Using vertex buffer objects reduces memory bus bandwidth utilization on some architectures.
  Renderer version string differentiation: Tests whether the OpenGL ES renderer version and extension strings are examined for the presence of extensions providing better performance. For a software renderer, the complexity of the graphics content should be scaled down.
  Existing texture data modification: Modifying existing texture data is an expensive operation on most renderers.
  Loading texture data during frame rendering: Texture data should generally be preloaded during the startup phase, and loaded during runtime only if needed.
  Texture data compression: When supported by the hardware, texture data compression decreases memory usage and improves rendering performance.
  Triangle strip geometry: Using triangle strips reduces the need to process the same vertices more than once and improves rendering performance for complex meshes.
  Multisample usage: On hardware engines, multisampling improves image quality at only a small performance cost.
