This note describes how to profile programs and the UNIX server using
the GPROF facilities in Mach 3.0.

WHAT IS GPROF?
--------------

GPROF is a simple UNIX-derived facility that reveals where (and,
hopefully, why) a program runs slowly.  The output from a profiled
program contains two distinct kinds of information:

	1. a call graph with arc counts
	2. a PC sampling array.  
	
The call graph indicates which procedures called which, and how many
times.  
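To illustrate what "arc counts" means, here is a toy model in C. It is not how mcount actually stores arcs: real call-graph records are keyed by addresses (the caller's return address and the callee's entry point), while this sketch uses procedure names and a small fixed-size table. The names record_arc, struct arc and MAX_ARCS are hypothetical.

```c
#include <string.h>

/* Toy call-graph arc table: one entry per (caller, callee) pair.
 * Strings stand in for the addresses a real implementation uses. */
#define MAX_ARCS 64

struct arc {
    const char *caller;
    const char *callee;
    unsigned long count;    /* times caller has called callee */
};

static struct arc arcs[MAX_ARCS];
static int narcs;

/* Record one call from caller to callee.  Returns the arc's new
 * count, or 0 if the table is full. */
unsigned long record_arc(const char *caller, const char *callee)
{
    for (int i = 0; i < narcs; i++)
        if (strcmp(arcs[i].caller, caller) == 0 &&
            strcmp(arcs[i].callee, callee) == 0)
            return ++arcs[i].count;
    if (narcs == MAX_ARCS)
        return 0;
    arcs[narcs].caller = caller;
    arcs[narcs].callee = callee;
    arcs[narcs].count = 1;
    narcs++;
    return 1;
}
```

Calling record_arc("main", "parse") three times leaves a single main -> parse arc with a count of 3, which is the kind of entry gprof reports in its call graph.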

The PC sampling array indicates how many times the program was
"caught" executing a certain piece of code.  During periodic sampling,
the program is interrupted at regular intervals (generally once every
system clock interrupt), and the profiled program's PC is recorded.
Other kinds of sampling are event driven. For example, it is possible
to profile a program to determine in which procedures VM activity is
occurring. In these cases, the PC sampling array indicates how many
times, say, a virtual memory page fault occurred at a certain
procedure (really instruction) within the program.
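The PC sampling array can be pictured as a histogram over the program's text segment: each sample, periodic or event-driven, increments the bucket containing the sampled PC. The sketch below assumes a hypothetical layout (text range [lowpc, highpc), fixed 4-byte buckets, 16-bit counters); the real libprof1.a buffer layout may differ. It also shows why the size of the array depends on the size of the program's text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram over the text segment [lowpc, highpc).
 * Each counter covers BUCKET_BYTES bytes of code. */
#define BUCKET_BYTES 4

struct pc_hist {
    uintptr_t lowpc;          /* first text address covered */
    uintptr_t highpc;         /* one past the last address covered */
    unsigned short *counts;   /* nbuckets counters */
    size_t nbuckets;
};

/* Buckets needed to cover [lowpc, highpc), rounding up. */
size_t pc_hist_buckets(uintptr_t lowpc, uintptr_t highpc)
{
    return (highpc - lowpc + BUCKET_BYTES - 1) / BUCKET_BYTES;
}

/* Count one sample at pc.  Returns 1 if recorded, 0 if the PC lies
 * outside the covered range. */
int pc_hist_record(struct pc_hist *h, uintptr_t pc)
{
    size_t b;

    if (pc < h->lowpc || pc >= h->highpc)
        return 0;
    b = (pc - h->lowpc) / BUCKET_BYTES;
    if (b >= h->nbuckets)
        return 0;
    h->counts[b]++;
    return 1;
}
```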



WHAT DO I NEED TO USE GPROF UNDER MACH?
---------------------------------------

First off, you need an i386-based or MIPS-based machine running Mach
3.0.  There is currently no profiling support on Suns,
Alphas, Snakes, or RS6000s.

Complete Mach profiling support includes:

	 1. a version of gcc that assists in the generation of the
	 call graph. Most versions of gcc can be forced to emit a call
	 to a special routine "mcount" upon entry to each procedure.
	 The version of gcc distributed with Mach (most recently, gcc
	 2.3.3) emits proper calls to mcount when compiling programs
	 using the "-pg" option.
	 
	 2. a version of the Mach kernel that exports a PC sampling
	 interface.  General profiling support was introduced in MK83.
	 If you are running with a kernel that predates MK83,
	 profiling may not work for you.
	 
	 3. Mach's libprof1.a from the standard Mach 3.0 libraries.
	 This library should be linked into every program compiled
	 with "-pg".  It includes a background sampler thread that
	 retrieves the PC samples buffered by the Mach kernel.  It
	 also exports an RPC interface that a controlling program can
	 use to enable and disable profiling of a profiled program
	 during execution.  With this, long-running programs (such as
	 the Unix server) can be selectively profiled without
	 recompilation.
	 
	 4. monctl, the monitor controlling program. This program
	 interfaces to programs compiled with -pg (and linked with
	 libprof1.a) and provides a way to control profiling
	 dynamically from the command line, rather than from within
	 the program.
	 
	 5. gprof, a program that transforms the output of a profiling
	 session into a human-readable form that shows how the
	 program's time is being spent, and how many times different
	 procedures are called (and from where).
	 
	 
WHAT KINDS OF EVENTS CAN I SAMPLE?	 
-----------------------------------

GPROF relies on the kernel to collect PC samples.  A PC sample is
generated during selected kernel events, where the event types are
called flavors.  A PC sample is a record that, among other things,
contains the event motivating the sample.  Currently, these kernel
events can generate a sample (defined in pc_sample.h):

	SAMPLED_PC_PERIODIC:
	 	Every clock interrupt the kernel records the
		PC of the currently running user thread if
		that thread (or its task) is sampling
		periodically.
		
	SAMPLED_PC_VM_ZFILL_FAULTS:
	SAMPLED_PC_VM_REACTIVATION_FAULTS:		
	SAMPLED_PC_VM_PAGEIN_FAULTS:
	SAMPLED_PC_VM_COW_FAULTS:
		Every time a sampled user thread takes a 
		[zero fill|reactivation|pagein|copy-on-write]
		fault, a sample entry is generated. 
		
	SAMPLED_PC_VM_FAULTS_ANY:
		Every time a sampled user thread takes ANY kind of a
		VM fault, a sample entry is generated.
		


These flavors can be OR'd together.  For example, pc_sample.h defines
the flavor SAMPLED_PC_VM_FAULTS (distinct from SAMPLED_PC_VM_FAULTS_ANY
above) as:

	#define SAMPLED_PC_VM_FAULTS \
		(SAMPLED_PC_VM_ZFILL_FAULTS | \
		 SAMPLED_PC_VM_REACTIVATION_FAULTS | \
		 SAMPLED_PC_VM_PAGEIN_FAULTS | \
		 SAMPLED_PC_VM_COW_FAULTS)



Although the kernel returns the sample flavor with each PC record, the
current profiling runtime library throws this information away,
recording only that a sample was taken.  Consequently, if you mix
periodic and non-periodic samples in the same profiling session, you
will see anomalous results during the post-processing phase (for
example, every non-periodic event at location X would be interpreted
as a periodic sample at location X).
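Because flavors are bit masks, a program can test which events a flavor selects with ordinary bitwise operations. In the sketch below the numeric flag values are placeholders for a standalone build (in a Mach build, take them from pc_sample.h instead), and the helper names are hypothetical. flavor_is_mixed() flags exactly the periodic-plus-event combination that the caveat above warns about.

```c
/* Placeholder flavor values for a standalone build; in a Mach build,
 * include pc_sample.h and use its definitions instead. */
#ifndef SAMPLED_PC_PERIODIC
typedef unsigned int sampled_pc_flavor_t;
#define SAMPLED_PC_PERIODIC               0x1
#define SAMPLED_PC_VM_ZFILL_FAULTS        0x10
#define SAMPLED_PC_VM_REACTIVATION_FAULTS 0x20
#define SAMPLED_PC_VM_PAGEIN_FAULTS       0x40
#define SAMPLED_PC_VM_COW_FAULTS          0x80
#define SAMPLED_PC_VM_FAULTS_ANY          0x100
#define SAMPLED_PC_VM_FAULTS \
        (SAMPLED_PC_VM_ZFILL_FAULTS | \
         SAMPLED_PC_VM_REACTIVATION_FAULTS | \
         SAMPLED_PC_VM_PAGEIN_FAULTS | \
         SAMPLED_PC_VM_COW_FAULTS)
#endif

/* Nonzero if the flavor requests samples on VM faults. */
int samples_vm_faults(sampled_pc_flavor_t flavor)
{
    return (flavor & (SAMPLED_PC_VM_FAULTS | SAMPLED_PC_VM_FAULTS_ANY)) != 0;
}

/* Nonzero if the flavor mixes periodic and event-driven sampling,
 * which the runtime library cannot tell apart afterwards. */
int flavor_is_mixed(sampled_pc_flavor_t flavor)
{
    return (flavor & SAMPLED_PC_PERIODIC) != 0 &&
           (flavor & ~(sampled_pc_flavor_t)SAMPLED_PC_PERIODIC) != 0;
}
```

A program that builds its flavor from user input could call flavor_is_mixed() and warn, or refuse, before starting a session.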

				 
HOW DO I USE GPROF TO PROFILE MY PROGRAM?
-----------------------------------------


Suppose you have a program, built from slowprog.c, that needs to run
faster.  You have several options for profiling it.


1. You want to sample an entire program from start to finish.

Compile and link the whole program for profiling:

	gcc -pg slowprog.c -o slowprog -lprof1 -lnetname -lthreads_p -lmach_p
    
The "-pg" tells gcc to emit calls to mcount.  The listed libraries
pull in the sampling routines (libprof1.a), Mach's name server RPC
interface (libnetname.a, needed so that the profiling routines can
export their control interface), the CThreads library
(libthreads_p.a), and finally the main Mach library (libmach_p.a).
Keep them in this order: each library depends on the ones to its
right.

When you run the resulting program (slowprog), by default, you
will get a file called "gmon.out" in the program's current
working directory at program exit time.  The gmon.out file is
written when the program naturally terminates, or when it makes
an explicit call to exit.  (You can redefine the name of the
output file by setting the variable char *gprof_gmon_out to
whatever you like.)
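A minimal sketch of redirecting the output file, assuming libprof1.a defines gprof_gmon_out as a char * that is read when the data is written at exit. HAVE_LIBPROF1 is a hypothetical macro: without it, the sketch defines a stand-in variable so that it compiles on its own. The helper name and the gmon.<pid>.out naming scheme are illustrative choices, not part of the library.

```c
#include <stdio.h>
#include <unistd.h>

#ifdef HAVE_LIBPROF1
extern char *gprof_gmon_out;             /* provided by libprof1.a */
#else
char *gprof_gmon_out = "gmon.out";       /* stand-in for a standalone build */
#endif

/* Static storage: the name must stay valid until the data is written
 * at exit, so it cannot live in a local array. */
static char profile_name[64];

/* Hypothetical helper: write profile data to gmon.<pid>.out so that
 * concurrent runs do not overwrite each other's output. */
void use_per_process_gmon_out(void)
{
    snprintf(profile_name, sizeof profile_name, "gmon.%ld.out",
             (long)getpid());
    gprof_gmon_out = profile_name;
}
```

Call use_per_process_gmon_out() early in main; pointing gprof_gmon_out at a local array instead would leave it dangling by the time the file is written.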

NOTE: If your program expects to terminate simply by "returning"
from main, and not by calling exit, you may find that it
never terminates under profiling.  This is because the profiling
runtime runs hidden background threads (CThreads) in the program's
address space to collect sampling information.  Under CThreads
semantics, a program that returns from main does not terminate until
all of its threads have terminated (either by returning from their
top-level function or by calling cthread_exit).  Consequently, if your
program is meant to terminate when main() returns, make an explicit
call to exit() the final action it takes.

The libraries -lthreads_p and -lmach_p are the standard Mach thread
and system libraries compiled with -pg, so calls into their routines
show up in the call graph.  If you don't want call graph information
for these libraries, specify the standard -lthreads and -lmach
instead.

2. You want to selectively sample pieces of a program's execution from
within the program.

As in <1>, you must compile and link the program for profiling.  The
profiling library (libprof1.a) exports several flags and control
functions that a program may use to control profiling behavior.
Normally, profiling support kicks in during the C runtime library
initialization (crt0.o).  At startup, control in a profiled program
transfers to monstartup_crt0(), where these flags are used:


	extern int gprof_no_auto_initialization_from_crt0;
	/* If TRUE, return without initializing any profiling
	 * state at program startup and without exporting a
	 * control server.
	 */

	extern int gprof_no_startup_from_crt0;
	/* If TRUE, initialize the profiling data structures, but
	 * do not begin profiling at program startup.
	 */

	extern int gprof_no_control_server;
	/* If TRUE, do not automatically export an RPC interface
	 * that allows an external control program to manipulate
	 * profiling state.
	 */

By default, these control variables are FALSE, so profiling is enabled
for the program's entire execution unless you override them.


For example, suppose you disable automatic initialization from crt0
by placing this definition, which overrides the default, at file
scope somewhere in your program:

	int gprof_no_auto_initialization_from_crt0 = TRUE;

You then need to start and stop profiling yourself.  The easiest way
is through the paired gprof_start and gprof_stop routines:

	int gprof_start(sampled_pc_flavor_t flavor);
	int gprof_stop(char *mon_file_name);

For the sake of discussion, assume the flavor argument is
SAMPLED_PC_PERIODIC for periodic sampling.  The call to gprof_start
will initialize all of the internal buffers and begin profiling. The
call to gprof_stop will stop sampling and dump the results to the
specified file. 
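The following sketch brackets one region of a program with gprof_start and gprof_stop. It makes two assumptions the note does not confirm: that a return value of 0 from these routines means success, and that the Mach header is <mach/pc_sample.h>. When HAVE_LIBPROF1 (a hypothetical macro) is not defined, trivial stand-ins replace the library so the sketch compiles on its own; profile_region and busy_loop are illustrative names.

```c
#include <stdio.h>

#ifndef TRUE
#define TRUE 1
#endif

#ifdef HAVE_LIBPROF1
#include <mach/pc_sample.h>     /* header path is an assumption */
int gprof_start(sampled_pc_flavor_t flavor);
int gprof_stop(char *mon_file_name);
#else
/* Stand-ins for a build without Mach; the flavor value and the
 * "0 means success" convention are assumptions. */
typedef unsigned int sampled_pc_flavor_t;
#define SAMPLED_PC_PERIODIC 0x1
static int profiling_on;
int gprof_start(sampled_pc_flavor_t flavor) { profiling_on = (flavor != 0); return 0; }
int gprof_stop(char *mon_file_name) { (void)mon_file_name; profiling_on = 0; return 0; }
#endif

/* Override the library default: do not start profiling from crt0. */
int gprof_no_auto_initialization_from_crt0 = TRUE;

/* Example workload (hypothetical). */
static volatile unsigned long sink;
static void busy_loop(void)
{
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Profile only the call to work(), then write the samples to mon_file. */
int profile_region(void (*work)(void), char *mon_file)
{
    if (gprof_start(SAMPLED_PC_PERIODIC) != 0) {
        fprintf(stderr, "gprof_start failed\n");
        return -1;
    }
    work();
    return gprof_stop(mon_file);
}
```

In a real program, main would call, for example, profile_region(busy_loop, "/tmp/hot.out"), so the resulting file covers only busy_loop's execution.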

These routines are wrappers on top of two lower-level functions that
you can call directly if you need finer control:

	kern_return_t do_gprof_mon_switch(mach_port_t port,
					  sampled_pc_flavor_t *flavor);

	kern_return_t do_gprof_mon_dump(mach_port_t port,
					char **mon_data,
					int *mon_data_cnt);

The first routine controls the program's current monitoring state;
the second returns the contents of the monitor buffers.  The first
argument of each, a mach_port_t, may be MACH_PORT_NULL, since it
exists primarily to satisfy MIG.  These routines are automatically
exported to other applications unless gprof_no_control_server is
TRUE.  In that case, you can export the control server manually by
calling gprof_control_server_init().  This call, whether invoked
manually or automatically, registers an interface named "gprof.pid"
(where pid is the process's UNIX pid) with the local nameserver.  The
same routines without the "do_" prefix (gprof_mon_switch and
gprof_mon_dump) are RPCs to the server at the designated port.
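For illustration, the registered name can be formed as below. The note only says "gprof.pid"; the decimal formatting of the pid and the helper name gprof_service_name are assumptions. A controlling program that knows the target's pid can build the same string.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build the nameserver name "gprof.<pid>" in buf.  Returns its
 * length, or -1 if buf is too small. */
int gprof_service_name(char *buf, size_t len, pid_t pid)
{
    int n = snprintf(buf, len, "gprof.%ld", (long)pid);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

gprof_service_name(buf, sizeof buf, getpid()) yields the calling process's own name; passing 712 yields "gprof.712".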

Each call to do_gprof_mon_switch (say, through gprof_start) that
enables sampling (even if it is already enabled) zeroes the sampling
buffer.
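A toy model of that rule, with hypothetical names and a tiny fixed buffer: enabling (any nonzero flavor) clears the counts even if sampling was already on. That disabling leaves the counts in place is an assumption, consistent with being able to extract data after turning sampling off.

```c
#include <string.h>

#define NBUCKETS 8

/* Hypothetical monitor state: a flavor (0 = off) and a sample buffer. */
struct mon_state {
    unsigned int flavor;
    unsigned short counts[NBUCKETS];
};

/* Switch sampling to flavor.  Enabling zeroes the buffer, even if
 * sampling was already on; disabling keeps the collected counts so
 * they can still be dumped. */
void mon_switch(struct mon_state *m, unsigned int flavor)
{
    if (flavor != 0)
        memset(m->counts, 0, sizeof m->counts);
    m->flavor = flavor;
}
```

One practical consequence: dump the buffer before re-enabling sampling, or the earlier samples are lost.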

3. I want to control profiling from the shell.

You can use the monctl program to do this.  For example, if the
program slowprog is running as PID 712 (and, as in <1> and <2>,
compiled with -pg and linked with -lprof1 -lthreads -lmach), and you
wanted to enable periodic sampling, then you would say:

        monctl -pid 712 -on periodic

The argument to "-on" is the sampling flavor you would like to
enable. (monctl -help returns the list of flavor names).

To stop sampling, you would say:

        monctl -pid 712 -off
        
To extract the sampling information into the file /tmp/gmon.out:

        monctl -pid 712 -monfile /tmp/gmon.out

To both stop and extract, you can simply say:

        monctl -pid 712 -off -monfile /tmp/gmon.out
        
It's not necessary to stop profiling to extract.

You can find out the profiling status of a program with the -status
argument.




4. I want to control profiling of my program, but I am too lazy to
recompile it.

The -pg option forces a program to generate call graph information as
well as sampling information. Since sampling information is a "kernel
activity," it is not necessary to compile a program with -pg just to
get sampling information. The monctl program can be used to collect
sampling information.  Suppose our program slowprog was compiled
without -pg, its binary is in /tmp/slowprog, and it is running as PID
1021.  We can run monctl in "proxy mode", in which it serves as the
sampler for our target program:

	 monctl -pid 1021 -proxy -execfile /tmp/slowprog&
	 
The -proxy specification says "export the profiling interface to the
outside world on behalf of process 1021."  The argument to -execfile
is necessary to assist monctl in determining the size of its PC
sampling array.  The trailing & puts monctl in the background.


Using monctl as a proxy is convenient, but not ideal.  Although it lets
you profile any program, you ONLY get sampling information. No call
graph information is generated.  The gprof post-processor (gprof) will
report only limited information (number of samples per procedure).

The proxy is most useful when collecting non-periodic events, for
example, in determining which procedures are responsible for the
majority of a program's virtual memory activity.

The monctl proxy process terminates when the process it is profiling
terminates, or when killed with a signal.


PROCESSING THE RESULTS OF A PROFILED PROGRAM
--------------------------------------------

Once you have a gmon.out (or whatever you chose to call it), you
can use the version of gprof that is part of CMU's Mach 3.0
distribution to interpret the results. In addition to the standard
gprof features, this version of gprof supports the following options:

      gprof -N --> do not assume that PC samples correspond to any
      kind of periodic tick. They are just events that should be
      propagated backwards in the call graph.
      
      gprof -r t --> for periodic samples, assume that the clock rate
      is t milliseconds. By default, the clock rate (sampling
      interval) is 15 milliseconds.
      
      gprof -x routine --> supply fine grained PC sampling information
      for samples taken within the named routine. This is useful for
      determining exactly where in a large routine events such as page
      faults are occurring.
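Two pieces of arithmetic behind these options, as a sketch (the function names are hypothetical). With -r, a sample count converts to time as samples times the interval; for example, 200 periodic samples at the default 15 millisecond interval amount to 3 seconds. For propagation, the classic gprof rule charges a callee's total (its own samples plus those of its descendants) to each caller in proportion to the number of calls along that arc; with -N the same rule is applied to raw event counts rather than clock time.

```c
/* Seconds represented by `samples` periodic samples taken every
 * interval_ms milliseconds (gprof -r; the default is 15 ms). */
double samples_to_seconds(unsigned long samples, double interval_ms)
{
    return samples * interval_ms / 1000.0;
}

/* Portion of a callee's total (self plus descendants) charged to one
 * caller: proportional to that caller's share of the callee's calls. */
double propagate_to_caller(double callee_total,
                           unsigned long calls_from_caller,
                           unsigned long total_calls)
{
    if (total_calls == 0)
        return 0.0;     /* never called: nothing to propagate */
    return callee_total * (double)calls_from_caller / (double)total_calls;
}
```

For instance, if a routine accounts for 90 samples and is called three times, once from A and twice from B, A is charged 30 and B is charged 60.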

With regard to the clock rate (sample interval), the gmon.out file
really should record it directly.  Leaving it out, however, makes it
likely that your profiling output can still be read by an unmodified
(Ultrix, for example) version of gprof.


HOW DO I PROFILE THE UNIX SERVER?
---------------------------------

The Unix server (UX) is just a program and can therefore be easily
profiled.  The default version of the Unix server is not compiled with
profiling (no -pg), and so can only be sampled using monctl.
Versions of UX prior to UX42 are probably not linked against
libprof1.a, so you will need to use monctl in proxy mode:

      monctl -proxy -pid 0 -execfile /../../mach_servers/startup&

will start monctl as a proxy, but will not enable sampling.  To enable
sampling, you can either include a flavor specifier above (as "-on
periodic", for example), or you can, once having started the proxy,
control sampling as you normally would:

      monctl -pid 0 -on periodic

      monctl -pid 0 -off

      monctl -pid 0 -off -monfile /tmp/uxmon.out

Since the UX server never terminates, the only way to terminate the
proxy process is with a signal.

If you are running a version of UX linked for profiling, then you
shouldn't need (and won't be able) to run the proxy yourself.  In this
case, you can control profiling directly with the monctl commands
above, using target pid 0.

Note that UX does not export its profiling control interface through
the standard nameserver interface (netname).  A UNIX process (for
example, monctl) invokes the UX control interface through its own
bootstrap port (which, for a UNIX process, is the port on which it
ultimately makes UNIX system calls).  There are two reasons for this
non-uniformity:

	 1. UX does not depend on any nameserver to advertise its
	 profiling control interface.
	 
	 2. UX can define the access points to the profiling control
	 routines in its bsd*server.defs interface, allowing it to
	 use the same dispatch routines (and threads) as do all other
	 interfaces into UX.
	 
The routines exported by UX (bsd_gprof_mon_switch and
bsd_gprof_mon_dump) are passthroughs to the libprof1.a routines
(do_gprof_mon_switch and do_gprof_mon_dump).
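Schematically, such a passthrough is a one-line forwarding function. The argument lists below are assumptions modeled on the do_ prototypes given earlier; the real server routines declared in bsd*server.defs may take different arguments. The typedefs and the stub do_ routines are stand-ins so the sketch compiles without Mach headers or libprof1.a.

```c
#include <stddef.h>

/* Stand-ins for Mach types and for libprof1.a (standalone build only). */
typedef int kern_return_t;
typedef unsigned int mach_port_t;
typedef unsigned int sampled_pc_flavor_t;
#define KERN_SUCCESS   0
#define MACH_PORT_NULL ((mach_port_t)0)

static kern_return_t do_gprof_mon_switch(mach_port_t port,
                                         sampled_pc_flavor_t *flavor)
{
    (void)port; (void)flavor;
    return KERN_SUCCESS;
}

static kern_return_t do_gprof_mon_dump(mach_port_t port,
                                       char **mon_data, int *mon_data_cnt)
{
    (void)port;
    *mon_data = NULL;       /* stub: no samples collected */
    *mon_data_cnt = 0;
    return KERN_SUCCESS;
}

/* UX entry points: forward to the libprof1.a routines unchanged. */
kern_return_t bsd_gprof_mon_switch(mach_port_t port,
                                   sampled_pc_flavor_t *flavor)
{
    return do_gprof_mon_switch(port, flavor);
}

kern_return_t bsd_gprof_mon_dump(mach_port_t port,
                                 char **mon_data, int *mon_data_cnt)
{
    return do_gprof_mon_dump(port, mon_data, mon_data_cnt);
}
```

Keeping the UX routines as thin forwarders means the profiling logic lives in one place, libprof1.a, and UX only contributes the dispatch path.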



WHAT OTHER PROGRAMS NEED TO BE RUNNING WHEN I PROFILE?
------------------------------------------------------

Minimally, you need to be running the snames program to export a
profiling control interface.  If you run netmsgserver instead, then
your profiling interface can be accessed from remote hosts. Example:

     monctl -host bershad.pc.cs.cmu.edu -pid 124 -status
     
will return the profiling status from process 124 on host
bershad.pc.cs.cmu.edu, assuming that both the invoking and target
hosts are running the netmsgserver.     
     


WHO CONTROLS WHAT I CAN AND CAN NOT PROFILE?
--------------------------------------------

Here are the rules about profiling authorization:

     Any program can profile itself.
     
     Any program that exports its profiling interface through
     netname_check_in can be controlled by monctl from any host that
     can do a netname_look_up on the exporting host.  This does not
     include UX, which exports its interface through a process's
     bootstrap port.  As a result, monctl can only interface with its
     local UX.

     Monctl can proxy for any process as long as it can obtain the
     process's task port (using the task_by_pid call).  For regular
     user processes, this means that monctl -proxy must be run by
     either root or the process's owner.  For UX, monctl -proxy must be
     run by root or by a member of the kmem group (group 2).  As
     task_by_pid is not normally exported across hosts, this also
     means that one cannot proxy for a process on another host.
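The proxy rules above reduce to a small decision function. This sketch models the stated policy only; the actual enforcement is whatever task_by_pid does. The function name and its parameters are hypothetical, and it considers a single group id, whereas a real check would also consult the caller's supplementary groups.

```c
#include <sys/types.h>

#define KMEM_GID 2   /* the kmem group named in the rules above */

/* Hypothetical model of who may run "monctl -proxy" on a target. */
int may_proxy(uid_t uid, gid_t gid, uid_t target_owner,
              int target_is_ux, int target_is_remote)
{
    if (target_is_remote)
        return 0;               /* task_by_pid is not exported across hosts */
    if (uid == 0)
        return 1;               /* root may proxy for any local process */
    if (target_is_ux)
        return gid == KMEM_GID; /* UX: root or the kmem group */
    return uid == target_owner; /* ordinary process: its owner */
}
```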

     
     
WHAT KIND OF PROBLEMS CAN I EXPECT WHEN PROFILING?
--------------------------------------------------

Here's a short list of the kinds of things that can go wrong which make
a profiled execution slightly unfaithful to the unprofiled one:

  1. My program never terminates.
	This is probably because you have forgotten to include an
	explicit call to exit() before returning from main().
	Fix this by calling exit() as the last thing you do.
	
  2. My program fails on startup.
	Something is wrong.  If you see:
	   gprof_control_server_init netname_check_in (ipc/mig) server died
	then the problem is that you are not running a nameserver.
	Start up snames or netmsgserver.
	
  3. I never get a gmon.out file.
	Make sure that your program exits with a writable current
	working directory; that is where gmon.out is written by
	default.
	
  4. There are all sorts of strange routines showing up at the top of
     my profiling results.
        These are probably calls to mcount and mcountaux, which are
	the routines that maintain gprof's internal call graph.  Don't
	worry -- when you run your program without profiling, these
	calls go away.  Gprof eliminates the contribution from these
	routines when computing the call graph information, but
	includes them when presenting the flat sampling distribution.
	
  5. My program seems to be spending all of its time in mach_msg_trap?
        This is an artifact of the way that periodic sampling works.
	If the system clock goes off while one of your sampled
	program's threads is in the kernel, the clock interrupt is
	effectively delayed until control returns to user level.
	Since a thread is most often in the kernel because it called
	mach_msg_trap, that routine appears to be responsible for the
	time.
	
	
      
WHAT PROGRAMS/FILES ARE INVOLVED IN PROFILING?
----------------------------------------------
	
crt0.o -- Mach's C runtime startup code.
moncrt0.o -- copy of crt0.o; expected by gcc on the PMAX.
gcrt0.o -- copy of crt0.o; expected by gcc on the i386.
libprof1.a -- the profiling library
libmach_p.a -- the Mach 3.0 library compiled with -pg
libthreads_p.a -- the CThreads library compiled with -pg
gprof -- the post-processing program
monctl -- the external control program
