.TH ATOMTOOLS 5 local
.SH NAME
atomtools - ATOM tools
.SH TOOLS
.nf
\fBiprof\fR - instruction profiling tool
\fBliprof\fR - instruction profiling tool at basic block level
\fBpipe\fR - pipeline stall tool
\fBlpipe\fR - pipeline stall tool at basic block level
\fBsyscall\fR - system call summary tool
\fBmemsys\fR - memory system bandwidth tool
\fBlmemsys\fR - low level memory system bandwidth tool
\fBio\fR - input/output summary tool
\fBio2\fR - input/output summary by file name
\fBunalign\fR - unaligned access tool
\fBgprof\fR - gprof profiling tool
\fB3rd\fR - memory checker and leak finder (a la Purify)
\fBpixie\fR - superset of the pixie(1) command
\fBheapcheck\fR - leak detection tool
.fi
.SH DESCRIPTION
ATOM can be used to build a variety of performance tools.
The sample tools listed above can be used directly, or as templates for building new tools.
.PP
For example, to apply the iprof tool to the alvinn application, add a new compile step to the makefile target that creates the executable.
The changes to the cc command line are underlined.
.DS
cc -Wl,-r -non_shared -o alvinn.rr $(OBJS) -lm
   ------------------          ---
atom alvinn.rr -tool iprof -o alvinn.iprof
.DE
The alvinn.iprof executable is run with exactly the same arguments as the original application.
After the program has run, the results are placed in iprof.out.
.PP
Because the program has been instrumented, the new application runs slower than the original.
The slowdown varies from tool to tool, depending on the work required to compute the output.
.SH IPROF
The \fBiprof\fR tool computes instruction profile information.
The output is placed in the file iprof.out.
.DS
 Procedure      Calls  Instructions  Percentage
 __start            1           149      11.004
 main               1            19       1.403
 printf             1            40       2.954
 malloc             1           130       9.601
 morecore           1            39       2.880
 _doprnt            1           307      22.674
 NLchrlen          12            24       1.773
 ...              ...           ...         ...
 Total                         1354
.DE
In this example the NLchrlen procedure was called 12 times but executed only 24 instructions in total.
This was 1.773 percent of all the instructions executed in this application.
.SH LIPROF
The \fBliprof\fR tool computes instruction profile information for each basic block in the program.
A basic block is a section of code where instructions are guaranteed to execute sequentially.
.DS
 ----------------------------------------------------
 main: compress.c
 ----------------------------------------------------
 Line  PC            Instructions  Count  Total
  423  0x120002190             18      1     18
  437  0x1200021d8              6      1      6
  438  0x1200021f0              4      1      4
  438  0x120002200              6     13     78
  439  0x120002218              2      1      2
  446  0x120002220              3      1      3
  ...  ...                    ...    ...
.DE
The main procedure is located in the file compress.c.
The fourth basic block in main starts at line 438, has 6 instructions, and was executed 13 times for a total of 78 instructions.
.SH SYSCALL
The \fBsyscall\fR tool summarizes application program calls to the OSF/1 system call interface.
.DS
 System Call    Calls     Time  context
 read              14    4.3ms        1
 write              7   10.4ms        7
 obreak             4    0.2ms        0
 lseek              1    0.0ms        0
 ioctl              2    0.1ms        0
 getpagesize        1    0.0ms        0
 sigaction          3    0.2ms        0
 Total                  15.1ms       16
.DE
In this example, the write system call was executed 7 times.
These calls took 10.4 milliseconds.
In each of the 7 writes, the system made at least one context switch.
Multiple context switches during a single call are counted only once.
Over the course of the program, system calls accounted for 15.1 milliseconds.
The context switch total includes both context switches detected during system calls and those detected between system calls.
Both offer only lower bounds on the number of context switches that occurred during the program's execution.
.PP
The tool attempts to compute the wall clock time based on a machine-dependent parameter file.
If the machine type is not listed in the parameter file, an error message is printed, and wall clock time is computed from the cycle time of the DEC 3000 Model 500.
.SH GPROF
The \fBgprof\fR tool prints execution time profiles for application programs.
The profile is based upon the procedure call graph and lets the user answer questions of the form "How much time is spent in the procedure printf() and all its descendants?"
The output file name is formed by concatenating the application name with ".gout".
.DS
                  callTime  called/total      parents
 %time    self  descendents  called+self     *name
                  callTime  called/total      children
 --------------------------------------------------------------------
            64          865      1/1          __start
  63.9      19          846      1+0         *main
            46          620      1/1          printf
            17          226      1/1          malloc
 --------------------------------------------------------------------
.DE
This example shows the output for the procedure main(), designated by the asterisk.
The main() procedure had one caller: the procedure __start().
The execution of main() itself took 19 cycles.
The execution of main()'s descendants took 846 cycles.
In total the program spent 63.9% of its time in main() and main()'s descendants.
main() made two procedure calls, one to printf() and one to malloc().
printf() and the descendants of printf() executed 620 cycles on behalf of main().
malloc() and the descendants of malloc() executed 226 cycles on behalf of main().
.TP
Execution Time Estimates
Estimates of user execution time are based upon the 21064's pipeline model.
The estimates assume that there are no cache misses, no page faults, no TLB misses, and no branch mispredictions.
Interactions between basic blocks are neglected.
System calls must be treated specially because the kernel isn't instrumented.
The execution time of each system call is measured using the cycle counter.
Note that the resulting system execution times will be misleading if context switches occur.
The user can exclude either the user time or the system time from the execution profile report.
To exclude the user time, set the environment variable GPROF to NOUSR before running atom.
To exclude the system time, set the environment variable GPROF to NOSYS.
.TP
Multiple processes
When a program calls fork(), an additional output file is created for the new child process.
The child's output file reports only the execution time used by the child process following the fork.
The parent's output file reports the execution time of the parent process both before and after the fork.
The child's output file name is made unique by appending the child's process identifier.
If a process calls exec() and the exec() succeeds, then all execution time statistics from the creation of the process up to the exec() are lost.
This occurs because the exec() overwrites the address space.
However, if the new program being exec'd is also instrumented, then the execution time of the process following the exec() is reported in that new program's output file.
If a program calls exec(), then the environment variable ATOMUNIQUE should be set before running the program.
This forces programs to append their process identifiers to their output file names, thereby preventing multiple instances of the same program from overwriting each other's output files.
.TP
Mutually Recursive Procedures
The biggest difference between this program and the original version of gprof is the information reported by the parent and child entries when a procedure and its child or parent are mutually recursive.
When procedure a() calls procedure b(), the profiler notes the time of the call.
Then, when procedure b() returns, the difference between the return and call times is charged to the call.
In the absence of recursion, this quantity represents the time spent in b() and all of b()'s descendants on behalf of a() as a result of the call.
However, if a() and b() are mutually recursive, then multiple calls from a() to b() may be active at the same time.
In that case the call time represents the duration of the first of those calls.
Another quirk of this program is that a call to a procedure that is already active (again, because the caller and callee are mutually recursive) isn't charged.
This anomaly allows a more efficient implementation.
.TP
Algorithm
Although the output format of this program was copied from the original gprof, the algorithms are significantly different.
Two improvements result.
Unlike the original gprof, the amount of time spent by a child procedure on behalf of its parent is measured rather than estimated.
Unlike profilers based on Pixie, both the source and destination of indirect calls can be reported.
.IP
The new profiler dynamically constructs the procedure call graph during the execution of the program.
This allows the profiler to handle indirect calls, which would otherwise be undetectable from a static analysis of the program.
Nodes in the graph represent procedures, and arcs between nodes represent procedure calls.
During the execution of the program the profiler maintains a model of the procedure call stack.
When a procedure is called, the profiler pushes information identifying the called procedure onto its stack.
When a procedure returns, the profiler pops the top entry off its simulated stack.
At the beginning of each call, if an arc from the caller to the callee doesn't already exist, it is added to the procedure call graph.
.IP
Execution times are accumulated both for procedures and calls.
Each stack entry includes the time when it was pushed on the stack.
When an entry is popped off the stack, the difference between the pop time and the push time is added to the total time of the called procedure and to the total time of the call.
The total time of a procedure represents the time spent in that procedure and all its descendants.
The total time of a call represents the time spent in the callee and its descendants as a result of the call.
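.PP
The stack model, arc creation, and total-time accumulation described so far can be sketched as follows.
This is only an illustration in Python, not the tool's implementation: the class and method names are hypothetical, timestamps are passed in by hand instead of being read from a cycle counter, and self time and the recursion test are left out.
The timestamps in the usage lines are invented so that the totals match the main() entry of the sample gprof output.
.DS
```python
from collections import defaultdict


class CallGraphProfiler:
    """Toy model of the simulated call stack (all names are hypothetical)."""

    def __init__(self):
        self.stack = []                      # entries: (procedure, time pushed)
        self.arcs = set()                    # (caller, callee) arcs seen so far
        self.proc_total = defaultdict(int)   # procedure -> time incl. descendants
        self.call_total = defaultdict(int)   # (caller, callee) -> time of the call

    def call(self, callee, now):
        """Record a procedure call at time `now`."""
        if self.stack:
            caller = self.stack[-1][0]
            self.arcs.add((caller, callee))  # no effect if the arc already exists
        self.stack.append((callee, now))

    def ret(self, now):
        """Record a return at time `now`; returns the elapsed time of the call."""
        callee, pushed = self.stack.pop()
        elapsed = now - pushed
        self.proc_total[callee] += elapsed
        if self.stack:                       # charge the arc from the caller
            self.call_total[(self.stack[-1][0], callee)] += elapsed
        return elapsed


# Invented timeline: main runs from 0 to 865, calling printf and malloc.
p = CallGraphProfiler()
p.call("main", 0)
p.call("printf", 10)
p.ret(630)
p.call("malloc", 635)
p.ret(861)
p.ret(865)
print(p.proc_total["main"],
      p.call_total[("main", "printf")],
      p.call_total[("main", "malloc")])      # prints: 865 620 226
```
.DE
Because this sketch omits the recursion test, it would count time twice for a procedure that appears more than once on the stack.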
.IP
In addition, just before every procedure call or return, the amount of time since the last procedure call or return is added to the self time of the procedure at the top of the stack.
The self time represents the time spent executing instructions in that procedure (excluding descendants).
.IP
The algorithm performs an additional test to avoid double counting times when there is recursion.
When a procedure returns, the total time of the called procedure and of the call are incremented only if there are no other stack entries for the called procedure on the stack.
That is, if recursion causes multiple calls to a procedure to be outstanding simultaneously, the profiler times only the first call.
.SH INPUT/OUTPUT
The \fBio\fR tool summarizes the application input and output.
.DS
 -----------------------------------------------------------------------
                         Input/Output Summary
 -----------------------------------------------------------------------
 File Desc   Opens   Reads   Writes   Seeks     Bytes
         3       1       4        0       1     32768
         4       1       0        8       0     24824
 -----------------------------------------------------------------------
                             Read Summary
 -----------------------------------------------------------------------
  Size   Count   Bytes   Time(ms)       Rate
  8192       4   32768       44.1   743329.6
 -----------------------------------------------------------------------
                        Read Address Alignment
 -----------------------------------------------------------------------
  Size    Quad    Long    Word    Byte
  8192       2       0       2       0
 -----------------------------------------------------------------------
                             Write Summary
 -----------------------------------------------------------------------
  Size   Count   Bytes   Time(ms)       Rate
   140       2     280        0.6   445803.1
   499       1     499        4.1   121608.5
   ...     ...     ...        ...        ...
 -----------------------------------------------------------------------
                        Write Address Alignment
 -----------------------------------------------------------------------
  Size    Quad    Long    Word    Byte
   140       2       0       0       0
   ...     ...     ...     ...     ...
.DE
In this example file descriptor 4 was opened only once.
In the 8 calls to write, 24824 bytes were written.
There were two 140-byte writes that completed in a total of 0.6 ms, a rate of 445,803 bytes/second.
The address argument for the 140-byte writes was quadword aligned.
.PP
Byte rates make use of a machine-dependent file that contains the processor cycle time.
If the machine type is not found, an error message is printed and the times are computed from the cycle time of the DEC 3000 Model 500.
.PP
The \fBio2\fR tool summarizes the application input and output on a per-file-descriptor basis, with an entry for each open file.
Each entry includes the number of reads, writes, seeks, and fcntls, the bytes read and written, and the file system performance in bytes per second.
.SH PIPE
The \fBpipe\fR tool computes instruction scheduling statistics.
.DS
 Procedure   Instructions  Dual Packed      Cycles   CPI  Pipe Stall   CPct   SPct
 __start              143          100         180  1.26          80   0.00   0.00
 main                  12            9          10  0.83           1   0.00   0.00
 nasa7_               756          600        1153  1.53         553   0.00   0.00
 btrix_         567418800    435975600   970553400  1.71   534577800  14.35   7.90
 ...                  ...          ...         ...   ...         ...    ...    ...
 ---------------------------------------------------------------------------------
 totals        5359010342   3875780692  6763075180  1.26  2887294488         42.69
.DE
In this example the btrix_ procedure executed 567,418,800 instructions.
The compiler scheduled these instructions in an order that can execute in no fewer than 970,553,400 cycles, for an average of 1.71 cycles per instruction.
An extremely aggressive schedule that may not be obtainable in practice could execute these instructions in 435,975,600 cycles.
Reorganizing the source code, or rewriting the application in assembly language, can often recover some of the potential 534,577,800 pipeline stalls induced by the current instruction schedule.
This difference does not indicate poor compilers.
Compilers are often forced into making very conservative assumptions that inhibit optimal code scheduling.
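.PP
The btrix_ figures quoted above are consistent with two simple relationships: CPI is cycles divided by instructions, and the pipeline stall count is the difference between the scheduled cycles and the dual-packed cycles.
The short Python check below is only a sketch of that arithmetic.
The relationships are inferred from the sample output rather than taken from the tool's source, and the function name is hypothetical.
.DS
```python
def pipe_stats(instructions, dual_packed, cycles):
    """Return (CPI rounded to two places, pipeline stalls) for one report row.

    Assumes CPI = cycles / instructions and stalls = cycles - dual_packed,
    which matches the sample rows but is not confirmed by the tool's source.
    """
    cpi = round(cycles / instructions, 2)
    stalls = cycles - dual_packed
    return cpi, stalls


# btrix_ row of the sample report
print(pipe_stats(567418800, 435975600, 970553400))   # prints: (1.71, 534577800)
```
.DE
The same function reproduces the __start row: 143 instructions, 100 dual-packed cycles, and 180 cycles give a CPI of 1.26 and 80 stalls.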
.PP
The btrix_ procedure accounted for 14.35 percent of all the cycles executed in this benchmark.
If these pipeline stalls were eliminated, the application would execute in 7.9 percent fewer cycles.
.PP
The number of cycles that this tool estimates is a lower bound.
This tool does not take into account stalls related to branch prediction or the memory system, or stalls caused by instruction dependencies between basic blocks.
.SH LPIPE
The \fBlpipe\fR tool provides similar information, except at the basic block level.
This tool has two output files.
The first is lpipe.static.
.DS
 --------------------------------------------------------------
 btrix_ : Kbtrix2902.f
 --------------------------------------------------------------
 Line  PC           Inst  Pack  Cycl   CPI
    3  0x12000c010    22    19    21  0.95
   64  0x12000c068     8     7     7  0.88  <------+
   65  0x12000c088     2     1     1  0.50         |
   65  0x12000c090    16    16    16  1.00         |
   65  0x12000c0d0    44    42    44  1.00  <----+ |
   91  0x12000c180    75    63    92  1.23  <>-+ | |
   64  0x12000c2ac     4     4     4  1.00  ->---+ |
  156  0x12000c2bc     2     2     2  1.00         |
  ...  ...             .     .     .   ...         |
  342  0x12000ca38    40    26    26  0.65  <----+ |
  342  0x12000cad8    60    50   167  2.78  <>-+ | |
  312  0x12000cbc8     4     3     3  0.75  ->---+ |
   59  0x12000cbd8     3     3     3  1.00  ->-----+
 --------------------------------------------------------------
.DE
Here the btrix_ procedure consists of a number of nested loops.
One of the innermost loops starts at line 91 in the file Kbtrix2902.f.
In the disassembly, this corresponds to a program counter value of 0x12000c180.
This basic block has 75 instructions.
No compiler can schedule these instructions in fewer than 63 cycles.
The current instruction schedule takes 92 cycles.
This loop is enclosed in a second loop that starts at line 65.
.PP
This information is combined with information that is provided after the application finishes execution.
This output is placed in the lpipe.dynamic file.
.DS
 ----------------------------------------------------------------------
 btrix_ : Kbtrix2902.f
 ----------------------------------------------------------------------
 Line  PC           Inst  Pack  Cycl   CPI    Count  TotalCycles  PipeStalls
   65  0x12000c0d0    44    42    44  1.00   453600     19958400      907200
   91  0x12000c180    75    63    92  1.23  2268000    208656000    65772000
   64  0x12000c2ac     4     4     4  1.00   453600      1814400           0
  ...  ...             .     .     .   ...      ...          ...         ...
.DE
This application executed the basic block starting at line 91 2,268,000 times.
Since each execution of the block takes 92 cycles, the total number of cycles spent in this loop was 208,656,000.
The scheduling of this loop induced 65,772,000 pipeline stalls, that is, 92 - 63 = 29 stalls on each of the 2,268,000 executions.
As in the pipe tool, these numbers are a lower bound on the number of cycles necessary to execute these instructions.
.PP
The \fBomdiag\fR command can be used to print the instruction schedule generated by the compiler.
See the omdiag manual page for more information.
.SH MEMSYS
The \fBmemsys\fR tool simulates the memory system of several popular Digital platforms.
If the tool cannot determine the correct parameters for the machine you are running on, an error message is printed and the tool simulates the application running on the DEC 3000 Model 500.
.DS
            imisses  drmisses  dwrites  bmisses  bvictims     total
 main          32.2    7111.7   3950.1      0.8       0.0   11094.9
 exp            7.3      18.8     29.0      9.2       0.0      64.2
 fscanf         0.2       0.1     34.8      0.0       0.0      35.1
 _doscan      141.8       3.3    131.4      0.2       0.0     276.7
 number       121.1       1.2    129.0      1.7       0.0     253.0
 getcc         22.6       3.1    180.2      0.6       0.0     206.6
 ungetcc       17.0       2.9     33.9      0.0       0.0      53.8
 strlen        22.6       0.0      0.0      0.0       0.0      22.6
 ungetc        11.3       3.9     86.2      0.1       0.0     101.6
 ...            ...       ...      ...      ...       ...       ...
 Summary      684.9    7156.5   4644.0     22.0       0.1   12507.6
.DE
In this example, the main procedure accounted for the majority of memory system activity.
In this procedure, on-chip instruction misses accounted for only about 32.2 milliseconds of memory system activity.
Read misses to the 8K byte on-chip data cache kept the memory system busy for 7.112 seconds.
The 21064 chip has a write-through on-chip data cache.
These writes accounted for 3.95 seconds of memory system activity.
Misses to the board cache that did not require victim processing (because the line had not been changed by a write operation) took 0.8 milliseconds.
There was an insignificant number of board cache victims (blocks that were dirty in the board cache and had to be written to main memory to make way for a new board cache line).
.PP
This tool does not measure the memory system time; it simulates it.
The memory model has been simplified to increase the performance of the tool.
The tool suppresses output for procedures that generated less than 100 microseconds of memory system activity.
The tool uses virtual rather than physical addresses when managing the board cache.
.SH LMEMSYS
The \fBlmemsys\fR tool generates similar results for each instruction cache boundary, load, and store instruction.
.DS
 --------------------------------------------------------------------
 main: backprop.c
 --------------------------------------------------------------------
 Type   Line  PC           imisses  drmisses  dwrites  bmisses  bvictims
 ...     ...  ...              ...       ...      ...      ...       ...
 Load      0  0x1200082b4      0.0    1255.5      0.0      0.0       0.0
 Load      0  0x1200082b8      0.0     778.4      0.0      0.0       0.0
 Inst      0  0x1200082c0      0.5       0.0      0.0      0.0       0.0
 Store     0  0x120008378      0.0       0.0   1590.3      0.0       0.0
 Store     0  0x120008380      0.0       0.0    770.9      0.0       0.0
 ...     ...  ...              ...       ...      ...      ...       ...
.DE
In this example, the two load instructions generated 2.0339 seconds of data cache miss read traffic.
Combining this information with the statistics gathered by the memsys tool, we find that these two load instructions were responsible for 28 percent (2.0339/7.1565) of the data cache miss traffic.
The two store instructions accounted for about half of all the data cache write-through traffic.
Misses in the on-chip instruction cache when executing the instruction at location 0x1200082c0 caused the memory system to be busy for 0.5 milliseconds.
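.PP
The shares quoted for the two loads and the two stores come from dividing the per-instruction lmemsys times by the matching memsys Summary totals.
A minimal Python sketch of that arithmetic, using only the sample values shown above (the function name is hypothetical):
.DS
```python
def share(parts_ms, total_ms):
    """Percentage of total_ms accounted for by the listed times (milliseconds)."""
    return 100.0 * sum(parts_ms) / total_ms


# drmisses of the two Load rows against the memsys Summary drmisses total
loads = share([1255.5, 778.4], 7156.5)
# dwrites of the two Store rows against the memsys Summary dwrites total
stores = share([1590.3, 770.9], 4644.0)

print(f"{loads:.1f} {stores:.1f}")   # prints: 28.4 50.8
```
.DE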
.PP
If the MEMSYSCOUNT environment variable is set, the tool returns absolute counts instead of times.
.SH UNALIGN
The Alpha AXP architecture does not have byte and short load and store instructions.
These operations must be accomplished with a sequence of instructions that read a longword (32 bits) or quadword (64 bits), extract the corresponding data, modify it, replace the corresponding data inside the quadword, and write the new value to memory.
The \fBunalign\fR tool computes how many of the instructions executed in each procedure are manipulating byte or short data elements.
.DS
 Procedure  UnalignInst  Instructions   % Proc  % Unalign  % Inst
 printf               5            40   12.500      3.205   0.369
 _cleanup            25            96   26.042     16.026   1.846
 fflush              23           131   17.557     14.744   1.699
 _xflsbuf             7            54   12.963      4.487   0.517
 _wrtchk             10            44   22.727      6.410   0.739
 _findbuf             5            36   13.889      3.205   0.369
 _doprnt             61           307   19.870     39.103   4.505
 memcpy               6            25   24.000      3.846   0.443
 fwrite              14           105   13.333      8.974   1.034
 Total              156          1354   11.521
.DE
In this example, the _doprnt procedure executed 61 instructions accessing, manipulating, or storing unaligned data.
The procedure executed a total of 307 instructions.
Thus, 19.87 percent of all the _doprnt instructions were in support of these operations.
This procedure contributed 39.103 percent of all the byte and short manipulation instructions in the program, but this was only 4.5 percent of all the instructions that were executed by the application.
.PP
If all the short and byte data types were changed to longword data types, this program could potentially execute 11.521 percent fewer instructions.
Of course, because the data size of the program will increase, this difference could be more than offset by increased memory system miss traffic, translation buffer misses, and paging.
.SH THIRD DEGREE
Does your C or C++ program behave unpredictably, have storage leaks, or use too much memory?
Give it the \fBThird Degree\fR!
\fBThird Degree\fR is similar to, but better than, commercial programs like Purify and Sentinel, which do not run on Alpha.
It performs memory access checks and memory leak detection of C and C++ programs at run time.
The instrumented application is larger than the original application, runs exactly like it --just slower-- and in addition logs all errors and requested reports.
.PP
This instrumented program locates most occurrences of the worst types of bugs in C/C++ programs: array overflows, memory smashing, and malloc/free errors.
It also helps you understand the allocation habits of your application by listing the heap and finding memory leaks.
.PP
For example, a program trying to free the same object twice results in:
.DS
 ------------------------------------------ pid=3878 ------- 3 --
 Freeing already freed memory in heap at 0x140038010
 at byte 0 of 32-byte block at 0x140038010
     - main ex.c, line 21
 This block was allocated by malloc at
     - Booboo ex.c, line 10
     - main ex.c, line 20
 This block was freed at
     - Booboo ex.c, line 14
     - main ex.c, line 20
.DE
Third Degree takes as input a user-specified customization file called .3rd that enables and disables error detection and reporting.
The output of the tool is placed in a file with .3log as a suffix.
A document describing the user interface is available in /usr/lib/atom/doc/3rd.ps.
.SH PIXIE
The \fBpixie\fR tool performs basic block profiling just like the pixie(1) command.
It produces .Addrs and .Counts files that are consumed by prof(1).
.PP
When you run \fBatom\fR with the -tool pixie switch, the tool generates an .Addrs file as well as the instrumented program.
When you run the instrumented program, it generates a .Counts file.
You can run prof(1) on these files to generate a profiling listing.
See the pixie(1) and prof(1) manpages for more information on basic block profiling.
.PP
The pixie tool supports several options.
You pass options to pixie by setting the environment variable PIXIE_ARGS.
The options are set when you run the instrumented program.
Options must be separated by whitespace, for example:
.DS
% setenv PIXIE_ARGS "-pids -sbrk"
.DE
The pixie tool supports the following options:
.TP 10
\fB-pids\fR
This option is exactly the same as pixie(1)'s -pids option.
It causes the process ID to be appended to the name of the .Counts file.
.TP
\fB-sigdump\fR
This option is useful for programs that never terminate.
The option specifies a signal name that the instrumented program will catch.
When the program catches this signal, it writes out the .Counts file.
Your program must not reset the signal handler for this signal.
Specify the signal by its name, for example:
.DS
% setenv PIXIE_ARGS "-sigdump sigusr1"
.DE
.TP
\fB-sbrk\fR
The pixie tool must allocate a buffer to store the execution count information.
This buffer is allocated before your program starts executing.
Normally, the pixie tool allocates the buffer with mmap.
However, if you specify -sbrk, it allocates the buffer with sbrk instead.
.SH HEAPCHECK
The \fBheapcheck\fR tool checks the heap memory usage of a program.
It can diagnose errors such as freeing an unallocated buffer, writing to or reading from unallocated or freed memory, and reads and writes beyond the extents of an array.
If such an error occurs, heapcheck reports the PC of the instruction causing the error and the associated file name and source line number.
.PP
To use heapcheck, first link your program as follows:
.DS
cc -Wl,-r -non_shared -o program.rr $(OBJS)
ld -non_shared -o program program.rr
.DE
This creates two versions of your program: one partially linked with preserved relocation information, and one fully linked executable.
It is important to have a fully linked executable that matches the partially linked version when you analyze any diagnostics reported by the heapcheck tool.
Creating "program" from "program.rr" guarantees that these two versions match.
.PP
Next, run atom on program.rr and specify heapcheck as the tool.
.DS
atom program.rr -tool heapcheck -o program.heapcheck
.DE
Next, execute program.heapcheck.
If memory errors are detected, heapcheck writes diagnostics to the file program.heapcheck.out.
The diagnostics contain file and line number information to assist in locating the error.
.DS
 Allocation error at 0x120001a00 (file1.c, line 18):
     Attempt to free address 0x140006018 which was never allocated.
 Reference error at 0x120041904 (foo.c, line 685):
     Attempt to read from 0x14005bc08 which is not allocated.
     Address falls in a buffer that was last free'd at 0x1200410ec (foo.c, line 350).
     May have read off end of buffer from 0x14005bb40 to 0x14005bbef.
     That buffer allocated at 0x120008c78 (foo.c, line 74).
     May have read off beginning of buffer from 0x14005bc60 to 0x14005bcb7.
     That buffer allocated at 0x120008d00 (foo.c, line 84).
.DE
Finally, run the fully linked program under \fBdbx(1)\fR to locate the cause of the problem:
.DS
% dbx program
.DE
Look at the instruction at the reported PC.
.DS
(dbx) 0x120041904/i
  [procedure:685, 0x120041904]  ldq r24, -88(r12)
.DE
Set a conditional breakpoint at this instruction, so that the program stops when it is about to reference the reported memory location.
In this example, the memory location is 0x14005bc08, so we want to stop when $r12-88 equals this address.
.DS
(dbx) stopi at 0x120041904 if $r12 == 0x14005bc08+88
[2] stopi if $r12 = 5369084936 + 88 at 0x120041904
.DE
Now, run the program with any required arguments.
When the debugger hits the breakpoint, you can examine the stack trace, variables, and so on to determine the cause of the error.
.PP
The heapcheck tool supports several options.
You pass options to heapcheck in one of two ways.
One method is to set the environment variable HEAPCHECK_ARGS.
The options take effect when you run the instrumented program (program.heapcheck).
Options must be separated by whitespace, for example:
.DS
% setenv HEAPCHECK_ARGS "-unique -oldrealloc"
.DE
The second method for specifying options is to use atom's \fB-toolargs\fR switch.
See the \fBatom(1)\fR manpage for more details.
Options specified with the \fB-toolargs\fR switch are in effect every time the instrumented program executes.
.PP
Heapcheck supports the following options:
.TP 10
\fB-exitonerror\fR
If you specify this option and heapcheck diagnoses an error, heapcheck causes your program to exit with -1 upon finishing.
This is useful if you are running a test suite and want a test program to fail if it detects memory allocation problems.
.TP
\fB-combine\fR
This option causes heapcheck to write any diagnostic messages to the specified output file, rather than to the default file name.
If the specified file already exists, heapcheck appends any new diagnostics to the end.
This allows the output from many application runs to be combined into a single file.
.TP
\fB-noround\fR
This option may be needed for detecting memory accesses that are beyond the end of an array, but within the quadword containing the last array element.
On Alpha systems, the smallest granularity for accessing memory is a 32-bit longword.
For efficiency, many applications access memory as 64-bit quadwords.
In some cases, an application may allocate a buffer whose size is not a whole multiple of the 64-bit quadword size.
Subsequent memory operations may access addresses that are a few bytes beyond the end of such a buffer.
Such operations are usually safe as long as the referenced address is within the same quadword as an allocated location in the buffer.
By default, heapcheck considers the size of allocated buffers to be rounded up to the next quadword address.
This avoids false diagnostics when the situation described above occurs.
If you specify -noround, heapcheck does not round up the size of buffers.
.TP
\fB-oldrealloc\fR
Old UNIX systems supported a model where you could free an allocated buffer, and then call realloc() on the free'd buffer.
This model was supposed to store memory more compactly.
The ANSI C standard does not support this model, because you cannot realloc a free'd buffer.
By default, heapcheck diagnoses this situation as an error.
If you specify -oldrealloc, heapcheck allows your program to realloc a free'd buffer if there have been no intervening allocations.
.TP
\fB-unique\fR
Normally, heapcheck creates an output file named "\fIprogram\fR.heapcheck.out".
Heapcheck creates an empty file if no errors are detected in your program.
If you specify -unique, heapcheck names the file "\fIprogram\fR.heapcheck.out.\fIpid\fR", where \fIpid\fR is the process id.
This name should be unique for each execution of the program.
Additionally, heapcheck writes the program's command line arguments and environment strings to the output file.
This can help you determine the inputs to a particular execution of a program.
If you specify -unique and heapcheck does not detect any errors, heapcheck does not create an output file.
This option is useful if you are checking a program that is executed repeatedly with many different inputs.
.SH DISAMBIGUATING OUTPUT FILES
For most ATOM tools, setting the ATOMUNIQUE environment variable before running an instrumented application program appends the process identifier to the default output file name.
This option can be used when making multiple runs of the same tool, or if the application spawns other instrumented processes, to ensure that the output file will not be overwritten.
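.PP
The naming rule can be pictured with the short Python sketch below.
It is an illustration only: the function name is hypothetical, and the use of "." as the separator between the default name and the process identifier is an assumption, since the exact form of the suffix is not specified here.
.DS
```python
import os


def output_name(default_name, env=None, pid=None):
    """Return the output file name a tool would use (hypothetical helper).

    Assumption: when ATOMUNIQUE is set, the process identifier is appended
    after a "." separator; the real separator may differ.
    """
    env = os.environ if env is None else env
    pid = os.getpid() if pid is None else pid
    if "ATOMUNIQUE" in env:
        return f"{default_name}.{pid}"
    return default_name


print(output_name("iprof.out", env={}, pid=3878))                   # iprof.out
print(output_name("iprof.out", env={"ATOMUNIQUE": "1"}, pid=3878))  # iprof.out.3878
```
.DE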
.SH FILES
.nf
/usr/lib/atom/tools/iprof.inst.c, iprof.anal.c
/usr/lib/atom/tools/liprof.inst.c, liprof.anal.c
/usr/lib/atom/tools/pipe.inst.c, pipe.anal.c
/usr/lib/atom/tools/lpipe.inst.c, lpipe.anal.c
/usr/lib/atom/tools/syscall.inst.c, syscall.anal.c, platform.h
/usr/lib/atom/tools/memsys.inst.c, memsys.anal.c, platform.h
/usr/lib/atom/tools/lmemsys.inst.c, lmemsys.anal.c, platform.h
/usr/lib/atom/tools/io.inst.c, io.anal.c, platform.h
/usr/lib/atom/tools/io2.inst.c, io2.anal.c, platform.h
/usr/lib/atom/tools/gprof.inst.o, gprof.anal.o
/usr/lib/atom/tools/unalign.inst.c, unalign.anal.c
/usr/lib/atom/tools/3rd.inst.o, 3rd.anal.o
/usr/lib/atom/tools/heapcheck.inst.o, heapcheck.anal.o
/usr/lib/atom/tools/pixie.inst.o, pixie.anal.o
/usr/lib/atom/doc/3rd.ps
/usr/lib/atom/doc/user.ps
/usr/lib/atom/doc/ref.ps
.fi
.SH SEE ALSO
atom(1), omdiag(1), third(1)