Debugging with TotalView and DDT
Le Yan
User Services
HPC @ LSU
4/4/2012

Three Steps of Code Development
- Debugging
  - Make sure the code runs and yields correct results
- Profiling
  - Analyze the code to identify performance bottlenecks
- Optimization
  - Make the code run faster and/or consume fewer resources

Debugging Essentials
- Reproducibility
  - Find the scenario where the error is reproducible
- Reduction
  - Reduce the problem to its essence
- Deduction
  - Form hypotheses on what the problem might be
- Experimentation
  - Filter out invalid hypotheses
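
The reduction step can sometimes be automated. The sketch below is a hypothetical helper, not a feature of TotalView or DDT. It assumes the failure can be checked by a user-supplied predicate (here called `fails`) and greedily drops input elements for as long as the failure still reproduces.

```python
def reduce_failing_input(items, fails):
    """Greedily drop elements while the failure still reproduces.

    `fails` is a hypothetical predicate supplied by the user: it returns
    True when the program misbehaves on the given input.
    """
    if not fails(items):
        raise ValueError("the starting input must reproduce the failure")
    current = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                # The failure survives without element i, so drop it
                # and rescan the shorter input from the start.
                current = candidate
                changed = True
                break
    return current


# Stand-in for a real bug: the "program" fails whenever the input
# contains both a negative value and a zero.
def fails(data):
    return any(x < 0 for x in data) and 0 in data


minimal = reduce_failing_input([5, 0, 3, -2, 7, 0, 9], fails)
print(minimal)
```

The reduced input is small enough to step through by hand in a debugger, which is the point of the reduction step.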

Debugging Methods
- Write/print/printf
- Compiler flags
  - Array bound check, floating point exception etc.
- Debuggers
  - Command line: gdb
  - Graphical: TotalView, DDT, Valgrind, Eclipse

Validation Is Very Important
- Debuggers can tell you where the program crashes and help you gain a better understanding of the context, but
- They cannot detect a correctness problem
- So it is always a good idea to have test cases with known solutions against which you can validate your program
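
A minimal sketch of a test case with a known solution, written in Python for brevity. The routine `trapezoid` and the chosen tolerance are illustrative assumptions; the same pattern applies to any numerical kernel for which an analytic answer is available.

```python
import math


def trapezoid(f, a, b, n):
    """Integrate f over [a, b] with the composite trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_against_known_solution():
    # Known solution: the integral of sin(x) over [0, pi] is exactly 2.
    approx = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert abs(approx - 2.0) < 1e-5, approx


test_against_known_solution()
print("validation test passed")
```

A test like this catches the case a debugger cannot: a program that runs to completion but produces the wrong number.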

TotalView & DDT
- Powerful debuggers
  - Can be used to debug both serial and parallel programs
  - Support multiple languages
    - Both support CUDA
  - Supported on most architectures/platforms
  - Graphical user interface
    - TotalView also has a command line interface
  - Numerous other features
    - Array visualization
    - Memory debugging

Availability
- TotalView
  - 8.8.0 on Queen Bee (totalview-8.8.0)
  - 8.3.0 on Queen Bee, Tezpur, Philip and Eric (totalview-
- DDT
  - 2.6 on all LONI and LSU HPC Linux clusters (ddt2.6)

Preparing for a Debugging Session
- Compile the program with debugging turned on and optimization turned off (-O0 -g)
- Add softenv keys and resoft
- Make sure X Windows works
- Submit an interactive job session

Working with Debuggers
- One can start debugging by
  - Starting the debugger with the executable
  - Debugging a core dump
  - Attaching to a running (or hanging) process
- Common debugging operations
  - Setting up action points
  - Controlling the execution
  - Examining the value of variables

Launching a Debugging Session
- Serial program
  - TotalView
      totalview <executable> -a <program options>
  - DDT
      ddt -start <executable> <program options>
- Parallel program
  - TotalView
      mpirun_rsh -tv -np <num procs> <host list> <executable> <program options>
      mpirun_rsh -tv -np <num procs> -hostfile <path to hostfile> <executable> <program options>
  - DDT
      ddt -start -np <num procs> <executable> <program options>

TotalView GUI – Root Window
- Always appears when TotalView is started
- Provides an overview of all processes and threads

TotalView GUI – Root Window

Status code   Description
(blank)       Exited
B             At breakpoint
E             Error
H             Held
K             In kernel
M             Mixed
R             Running
T             Stopped
W             At watchpoint

TotalView GUI – Process Window
- Appears when TotalView is started
- For parallel programs each process/thread may have its own process window

TotalView GUI – Process Window
- Stack trace pane
  - Call stack of routines
- Stack frame pane
  - Local variables, registers and function parameters
- Source pane
  - Source code
- Action points, processes, threads pane
  - List of action points
  - List of processes
  - List of threads

TotalView GUI – Variable Window
- Can be opened by double-clicking on a variable name
  - Called "dive" in TotalView terminology
- Displays detailed information about a variable
- One can also edit the data here

DDT GUI
[Figure: DDT main window, with callouts for the group/process/thread controls, the stack frame, and the evaluation panel]

Other Ways of Starting a Debugging Session
- Open a core file
  - Need to select an executable
  - Can only browse variables and evaluate expressions since there is no active process
- Attach to one or more running (or hanging) processes

TotalView: Controlling Execution
- Commonly used commands
  - Go: start/resume execution
  - Halt: stop execution
  - Kill: terminate debugging session
  - Restart: restart a running program
  - Next: run to next source line WITHOUT stepping into another function or subroutine
  - Step: run to next source line, stepping into a function or subroutine if one is called
  - Out: run to the completion of a function or subroutine

DDT: Controlling Execution
- Similar commands to TotalView
- A few more commands to move up and down the stack frame
  - The "align stack frames" command is useful to bring paused processes to the same place in the program

Action Points
- Break points stop the execution when reached
  - Can be conditional
- Barrier points synchronize a set of processes or threads
- Evaluation points cause a code segment to be executed when reached
- Watch points allow the programmer to monitor a location in memory
  - Can stop execution or evaluate an expression when its value changes

TotalView: Break Points
- How to set
  - Left click on the line number
  - Right click on a line -> "Set breakpoint"
- Will appear in the action point list

TotalView: Evaluation Points
- How to set
  - "Tools" -> "Evaluate"
- Execute a small segment of code at a specified location
  - Useful when testing on-the-fly fixes

TotalView: Watch Points
- Monitor a memory location and stop execution when it is overwritten
- How to set
  - Right click on a variable -> "Create watchpoint"
- Can be conditional
  - Example: only watch this memory location after a certain number of iterations
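
To make the idea of a conditional watch point concrete, here is a rough analogy in plain Python. It is not how TotalView implements watch points: `WatchedCell` is a hypothetical class that records every write that changes the stored value, but only when a user-supplied condition holds (here, once the loop has reached its fourth iteration).

```python
class WatchedCell:
    """Holds one value and records writes, like a watched memory location."""

    def __init__(self, value, condition=lambda old, new: True):
        self._value = value
        self._condition = condition
        self.hits = []  # (old, new) pairs where a debugger would have stopped

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new
        # Trigger only when the value really changes and the condition holds.
        if new != old and self._condition(old, new):
            self.hits.append((old, new))


iteration = 0
# Condition: ignore writes until the iteration counter reaches 3.
cell = WatchedCell(0.0, condition=lambda old, new: iteration >= 3)
for iteration in range(5):
    cell.value = iteration * 1.5
print(cell.hits)
```

Only the writes made in the last two iterations are recorded; the earlier ones are filtered out by the condition, which is what keeps a conditional watch point from stopping on every harmless update.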

DDT: Breakpoints
- How to set
  - Double click on a line
  - Right click on a line -> "Add breakpoint"
- Will appear in the breakpoint list

DDT: Evaluation and Watch Points
- How to set
  - Right click on a variable -> "Add to Evaluations" or "Add to Watches"
- DDT does not provide as many options for evaluation and watch points

TotalView: Diving On An Object
- "Diving" means "showing more details on an object"
- One can dive on
  - Variables
  - Processes/threads
  - Subroutines
- Use "undive" to go back

TotalView: Viewing/Editing Data
- View values and types of variables
  - By hovering the mouse over the variable
  - In the stack frame
  - In the variable window
- Edit variable value and type
  - In the stack frame
  - In the variable window

TotalView: Handling Arrays (1)
- Slicing
  - Display an array subsection by editing the slice field in the variable window
  - Form: [lower bound:upper bound:stride]
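
The sketch below shows which elements a slice selects. It rests on two assumptions: the three fields are lower bound, upper bound and stride in that order, and the upper bound is included in the selection. Check the TotalView documentation for your version before relying on either. `slice_indices` is a hypothetical helper, not a TotalView function.

```python
def slice_indices(lower, upper, stride=1):
    """Indices selected by lower:upper:stride, with the upper bound included."""
    if stride <= 0:
        raise ValueError("this sketch only handles positive strides")
    return list(range(lower, upper + 1, stride))


data = [x * x for x in range(10)]
picked = slice_indices(2, 8, 3)
print(picked)                      # indices 2, 5, 8
print([data[i] for i in picked])   # values 4, 25, 64
```

Note the difference from Python's own slices, where the upper bound is excluded.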

TotalView: Handling Arrays (2)
- Filtering
  - Display an array subsection by applying a filter (filter field in the variable window)
  - Available filter options
    - Arithmetic comparison to a constant
    - Comparison to NaNs and Infs
    - Conditions can be combined by using logic operators
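
The effect of the three kinds of filter can be sketched in a few lines of Python. This illustrates the semantics only; it does not use TotalView's filter syntax, and `filter_array` is a hypothetical helper.

```python
import math


def filter_array(values, predicate):
    """Return (index, value) pairs for the elements that pass the filter."""
    return [(i, v) for i, v in enumerate(values) if predicate(v)]


values = [0.5, float("nan"), -3.0, float("inf"), 12.0, -0.1]

# Arithmetic comparison to a constant.
print(filter_array(values, lambda v: v > 1.0))
# Comparison to NaNs and Infs.
print(filter_array(values, lambda v: math.isnan(v) or math.isinf(v)))
# Two conditions combined with a logical operator.
print(filter_array(values, lambda v: v < 0.0 and v > -1.0))
```

Filtering for NaNs and Infs is often the quickest way to find where a numerical computation first went wrong in a large array.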

TotalView: Handling Arrays (3)
- Visualization
- Statistics

DDT: Handling Arrays

Bugs in Parallel Programs
- Parallel programs are prone to the usual bugs found in sequential programs, plus
  - Erroneous use of language features
  - Mismatched parameters, missing mandatory calls etc.
  - Defective space decomposition
  - Incorrect/improper synchronization
  - Hidden serialization

Debugging Parallel Programs
- Everything discussed so far for TotalView still works (well, almost)
  - Exception: stepping over a communication call while the other processes are stopped or being held
- Additional features
  - Scope of control commands
    - Group/Process/Thread
  - Displaying message queues (MPI programs)

Scope of Control Commands
- For serial programs
  - Not an issue because there is only one execution stream
- For parallel programs, we need to decide the scope to which a control command applies
  - The process window always focuses on one process/thread
  - Need to set the appropriate scope when
    - Giving control commands
    - Setting action points
  - Switch between processes/threads
    - "p+/p-" and "t+/t-" buttons
    - Through the root window
    - Through the process/thread tab

Process/Thread Groups
- Group (control): all processes and threads
- Group (workers): all threads that are executing user code
- Rank X: current process and its threads
- Process (workers): user threads in the current process
- Thread X.Y: current thread
- User defined group
  - "Group" -> "Custom Groups", or
  - Create in call graph

Displaying Message Queues
- Detect
  - Deadlocks
  - Load balancing issues
- To access
  - "Tools" -> "Message Queue Graph"
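
A deadlock shows up in a message queue graph as a cycle: a set of ranks each blocked on the next. The sketch below illustrates that reasoning in Python under a simplifying assumption, namely that each blocked rank waits on exactly one other rank. The `waiting_on` mapping is a hypothetical summary of pending receives, not a format produced by either debugger.

```python
def find_deadlock(waiting_on):
    """Return one cycle of ranks that wait on each other, or None.

    `waiting_on` maps a rank to the rank it is blocked on; ranks that
    are not blocked are simply absent from the mapping.
    """
    for start in waiting_on:
        path = []
        seen = set()
        rank = start
        # Follow the chain of "is waiting on" edges from this rank.
        while rank in waiting_on and rank not in seen:
            seen.add(rank)
            path.append(rank)
            rank = waiting_on[rank]
        if rank in seen:
            # The chain came back to a rank already visited: a cycle.
            return path[path.index(rank):]
    return None


# Ranks 0, 1 and 2 wait on each other in a ring; rank 3 waits on rank 0.
print(find_deadlock({0: 1, 1: 2, 2: 0, 3: 0}))
# Rank 0 waits on 1, rank 1 waits on 2, and rank 2 is not blocked.
print(find_deadlock({0: 1, 1: 2}))
```

The first call reports the ring of ranks 0, 1 and 2; the second reports no deadlock because the chain ends at a rank that can still make progress.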

TotalView: Displaying Call Graph
- Quick view of program state
  - Nodes are functions
  - Edges are calls
  - Look for outliers
- To access
  - "Tools" -> "Call Graph"

DDT: Parallel Stack View
- Shows a tree of functions merged from every process in a group of processes
- Can create process groups based on their location
- Very helpful when dealing with a large number of processes
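
The merging idea behind a parallel stack view can be sketched as a prefix tree of call stacks. This is an illustration in Python, not DDT's implementation; the function names and the `stacks` snapshot are made up for the example.

```python
def merge_stacks(stacks):
    """Merge per-process call stacks (outermost frame first) into one tree.

    Each node is a dict: {"ranks": [...], "children": {function: node}}.
    """
    root = {"ranks": [], "children": {}}
    for rank, frames in sorted(stacks.items()):
        node = root
        node["ranks"].append(rank)
        for function in frames:
            node = node["children"].setdefault(
                function, {"ranks": [], "children": {}}
            )
            node["ranks"].append(rank)
    return root


def render(node, depth=0, lines=None):
    """Return the tree as indented text lines with the ranks at each node."""
    if lines is None:
        lines = []
    for function, child in node["children"].items():
        lines.append("  " * depth + f"{function} {child['ranks']}")
        render(child, depth + 1, lines)
    return lines


# Hypothetical snapshot: rank 3 is the outlier, stuck in a different call.
stacks = {
    0: ["main", "solve", "MPI_Allreduce"],
    1: ["main", "solve", "MPI_Allreduce"],
    2: ["main", "solve", "MPI_Allreduce"],
    3: ["main", "solve", "exchange_halo", "MPI_Recv"],
}
tree_lines = render(merge_stacks(stacks))
print("\n".join(tree_lines))
```

With thousands of processes the tree stays about as small as this one, because identical stacks collapse into a single branch and only the outliers add new ones.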

Not Covered
- Memory debugging
  - Leak detection
  - Heap status
  - Memory usage
  - Memory comparison
- Command line interface
- Command line options
