
P6 C Coding Standards

December 13, 1991
P6 Architecture Group
R. Wilkinson

History

EARLY DRAFTS - December 19, 1990; January 11, 1991
FIRST RELEASE - January 29, 1991
SECOND RELEASE - December 13, 1991 Rev. 2.0
Converted to HTML by A. Glew Thu Jun 22 1995

Introduction

This document details the standards to be followed when writing C code. It is expected to be followed by all C programmers in the Workgroup Computing Division - Portland. By promulgating these standards, we hope to address software issues related to readability and maintainability. A common layout will make it significantly easier for members of the group to navigate one another's code: programmers who do not have to adjust to different coding formats lose a significant impediment to reading the code of others. In addition, some required elements, such as standard function headers, will help others (and quite possibly the authors) understand what the program is doing.

Since even the simplest programs can take on lives of their own, it is recommended that these standards be followed from the earliest point of program inception. The relatively small overhead incurred initially will be more than repaid over the life of the program. It should also be noted that code reviews are intended to be part of our development methodology, and code is expected to conform to these standards to pass review.

While it is doubtful that everyone will agree with all of the standards presented, they are expected to be followed nonetheless. Good reasons exist for every standard in this document. The general intent is that within (and beyond) the context of these standards, code should be easily readable and understandable by any semi-C-literate programmer. This pertains not only to the format (and legibility) of C programs but also to the existence and helpfulness of comments within the code.

Program Order

The ordering of sections within programs will be as follows:
  • <Copyright> The Intel copyright notice.
  • <RCS_id> The RCS id declaration.
  • <overview> An overview of the file contents.
  • <includes> Any file inclusions. Only definition files (".h") should be included.
  • <defines> Macro/constant definitions.
  • typedef/struct Any type or structure definitions.
  • <externs> Any external object declarations. Use with caution.
  • <globals> All global variable declarations.
  • <statics> All static variable declarations.
  • <forwards> All forward function declarations.
  • <functions> All function definitions (including "main").
  • <RCS_log> RCS "Log" information.

These are discussed below.

    <Copyright>
    To protect Intel's intellectual property, every file should have a copyright notice of the form:
    /* Copyright Intel Corporation, 1990, 1991. */
    
    Each year of development should be represented.
    <RCS_id>
    RCS header information should go at the beginning of all files, immediately after the copyright notice. This should be of the form:
    	#ifndef lint
    	static char	*rcsid = "$Header$";
    	#endif
    
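    In the case of header files, the variable name should include the header's name so that it cannot collide with the rcsid of any file that includes the header. The form (for a header file called chapeau.h) should be: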
    	#ifndef lint
    	static char	*rcsid_chapeau_h = "$Header$";
    	#endif
    
    <overview>
    This section is a (block) comment that should contain a general overview of the file's contents. Appropriate questions to answer here include: What functionality does the file provide? How does it relate to other files (if it is part of a larger program)? What are its major entry points?
    <includes>
    This section contains the "#include"s of any necessary header files.
    <defines>
    This section contains any necessary "#define"s.
    typedef/struct
    This section contains all typedef and/or struct definitions specified in the file.
    <externs>
    This section contains all extern declarations specified in the file.
    <globals>
    This section contains all global variable declarations with external visibility.
    <statics>
    This section contains all file-scope variable declarations whose visibility is restricted to the file by the keyword "static".
    <forwards>
    This section contains all necessary "forward" function/procedure declarations. (These are routines which are referenced before their actual implementation is specified.)
    <functions>
    This section contains the "body" of the code. All routines (including main()) are placed here.
    <RCS_log>
    RCS log information should be placed at the end of all files. This should be of the form:
    	/*
    	 * $Log$
    	 *
    	 */
    

    Particularly in those cases where they are extensive, macro, constant, type, and structure definitions may be more effectively placed in a separate ".h" file.
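
    To make the ordering concrete, here is a minimal sketch of one small source file laid out section by section in the order given above. The counter module, its names, and MAX_COUNT are hypothetical, and the RCS keywords are shown unexpanded, as they would appear before the file is first checked in.

```c
/* Copyright Intel Corporation, 1991. */

#ifndef lint
static char	*rcsid = "$Header$";
#endif

/*
 * <overview>
 * A hypothetical saturating counter.  Entry points: counter_reset()
 * and counter_bump().
 */

/* <includes> */
#include <stdio.h>

/* <defines> */
#define MAX_COUNT	10		/* Value at which the counter sticks. */

/* typedef/struct */
typedef struct {
	int	value;			/* Current count, 0..MAX_COUNT. */
} counter;

/* <externs>: none are needed in this file. */

/* <globals> */
int	counter_trace = 0;		/* Nonzero: print each new count. */

/* <statics> */
static counter	the_counter;		/* File-local counter state. */

/* <forwards> */
static int	clamp(int value);

/* <functions> */
void
counter_reset(void)
{
	the_counter.value = 0;
}

int
counter_bump(void)
{
	the_counter.value = clamp(the_counter.value + 1);
	if (counter_trace) {
		printf("count = %d\n", the_counter.value);
	}
	return the_counter.value;
}

static int
clamp(int value)
{
	return (value > MAX_COUNT ? MAX_COUNT : value);
}

/*
 * $Log$
 */
```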

    Header Files

    To avoid the potential problems caused by nested header files, the body of header files should be designed for conditional inclusion. The format of a header file called toupee.h is given below. Note the (required) use of the leading and trailing underscores.
    	#ifndef _TOUPEE_H_
    	#define _TOUPEE_H_
    		:
    	<file body>
    		:
    	#endif /* _TOUPEE_H_ */
    
    Irrespective of the above format, header files should not include variable declarations. The use of the facilities provided in the header file, p6system.h, is strongly encouraged. A copy (as of December 13, 1991) has been included in Appendix A. The file currently lives in ~p6/arch/src/util.
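
    The effect of the conditional-inclusion wrapper can be seen by pasting the body of a hypothetical toupee.h into one file twice, which is what happens when two nested #includes reach the same header. This is a sketch for illustration only: the second copy is skipped because _TOUPEE_H_ is already defined, so the typedef is never seen twice.

```c
/* First inclusion of the hypothetical toupee.h. */
#ifndef _TOUPEE_H_
#define _TOUPEE_H_
typedef struct {
	int	strand_count;		/* Number of strands. */
} toupee;
#endif /* _TOUPEE_H_ */

/* Second inclusion, e.g. via another header: skipped entirely. */
#ifndef _TOUPEE_H_
#define _TOUPEE_H_
typedef struct {
	int	strand_count;
} toupee;				/* Would be a redefinition error if compiled. */
#endif /* _TOUPEE_H_ */

int
toupee_strands(const toupee *toupee_ptr)
{
	return toupee_ptr->strand_count;
}
```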

    Names

    The use of capital letters in names is not a matter of choice. All #defines should have all letters capitalized. This includes the definitions of both constants and macros. All elements of an enumerated type should have the first letter capitalized and all other letters lower case. All other names should consist entirely of lower case letters. Any use of capital letters outside the bounds specified here requires very strong justification.

    Names should be chosen to be reasonably descriptive. Underscores ("_") should be used as separators. Names of the form GetCacheIndex (or getcacheindex) are not acceptable; get_cache_index is the proper form. If lengthening a name increases clarity and/or understandability, the more descriptive name should be chosen. If this results in longer names, so be it. (Clearly we're assuming some bounds of reason. Using "i", "j", and "k" for indices in a "for" statement is pretty straightforward, while using "the_five_bits_for_encoding_the_register_or_an_immediate_value" is obvious insanity.)

    Names that should be avoided are:

    Procedure names should reflect what they do. Function names should reflect what they return. For functions returning only TRUE/FALSE values, a predicate form is recommended (e.g. is_queue_empty(ready_queue_ptr), is_ford(car)).

    It is strongly encouraged that variables and parameters that are pointers be named in a way that makes note of this quality. Some suggestions are:

  • black_table_ptr
  • head_p
  • tailp
  • filepp (pointer to a pointer)
  • proc_AD (for you 960 freaks - not recommended)

    Types, variables, and routines that stand a good chance of being used outside the file in which they are contained (via "include") should have their names prefixed with some string that will aid in finding them. Some examples would be 'btb_...' for popular "branch target buffer" entities and 'dfa_...' for items from the "data flow analyzer" that may experience a wider audience.
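
    The following sketch pulls the naming rules together: a capitalized #define, enumeration elements with an initial capital, lower-case underscore-separated names, a predicate-form function, a _ptr suffix on a pointer, and a btb_ prefix on names likely to be used outside the file. The branch target buffer types and functions are hypothetical, invented only to illustrate the names.

```c
#define BTB_NUM_ENTRIES	4		/* #define: all capitals. */

typedef enum {
	Taken     = 0,			/* Enumeration elements: initial capital. */
	Not_taken = 1
} btb_prediction;

typedef struct {
	unsigned long	tag;		/* Branch address tag. */
	btb_prediction	prediction;	/* Last predicted direction. */
} btb_entry;

typedef struct {
	btb_entry	entries[BTB_NUM_ENTRIES];
	int		num_valid;	/* Entries in use, from the front. */
} btb_table;

/* Predicate form: the name says what a TRUE result means. */
int
btb_is_full(const btb_table *table_ptr)
{
	return (table_ptr->num_valid == BTB_NUM_ENTRIES);
}

/* Procedure: adds an entry.  Returns 1 on success, 0 if the table is full. */
int
btb_add_entry(btb_table *table_ptr, unsigned long tag, btb_prediction prediction)
{
	btb_entry	*entry_ptr;	/* Pointer names end in _ptr. */

	if (btb_is_full(table_ptr)) {
		return 0;
	}
	entry_ptr = &table_ptr->entries[table_ptr->num_valid];
	entry_ptr->tag = tag;
	entry_ptr->prediction = prediction;
	table_ptr->num_valid = table_ptr->num_valid + 1;
	return 1;
}
```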

    Macros

    Macros provide a convenient mechanism for textual substitution. Because the substitution is purely textual, the undisciplined use of macros easily introduces subtle bugs. In the interests of avoiding such problems, the following restrictions are mandated.

    Function-like macros should have every value they use passed explicitly as a parameter, and the definition should place parentheses around each use of a parameter and around the entire replacement text. The use of local and global variables within macros is discouraged. Macros of the form:

    	#define CALC(i, j)	i + j * k - l
    
    are in express violation of this standard. The appropriate form should be:
    	#define CALC(i, j, k, l)	((i) + (j) * (k) - (l))
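
    A small sketch of why both kinds of parentheses matter. The macro and function names are hypothetical; BAD_SQUARE omits all parentheses and SQUARE_NO_OUTER omits only the outer pair.

```c
#define BAD_SQUARE(x)		x * x		/* No parentheses at all. */
#define SQUARE_NO_OUTER(x)	(x) * (x)	/* Missing the outer pair. */
#define SQUARE(x)		((x) * (x))	/* Form required above. */

int
bad_square_of_sum(int a, int b)
{
	return BAD_SQUARE(a + b);	/* Expands to a + b * a + b. */
}

int
square_of_sum(int a, int b)
{
	return SQUARE(a + b);		/* Expands to ((a + b) * (a + b)). */
}

int
ratio_bad(int n)
{
	return 100 / SQUARE_NO_OUTER(n);	/* Expands to 100 / (n) * (n). */
}

int
ratio_good(int n)
{
	return 100 / SQUARE(n);		/* Expands to 100 / ((n) * (n)). */
}
```

    With a = 2 and b = 3, bad_square_of_sum() yields 11 rather than 25, and ratio_bad(5) yields 100 rather than 4.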
    

    If a macro consists of multiple statements, they should be enclosed in curly brackets ("{" and "}") and should not be ended with a semicolon (";").
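
    One caution: a multi-statement macro whose body is a bare {...} pair still breaks an unbracketed if/else, because the caller's trailing semicolon ends the if and leaves the else without a partner. A widely used idiom that keeps the curly brackets and leaves the semicolon to the caller is to wrap the body in do { ... } while (0). The sketch below uses that idiom; it is an addition to, not part of, the standard above, and SWAP_INT and order_pair are hypothetical names. (Because SWAP_INT must both read and assign each argument, its arguments must be side-effect-free lvalues.)

```c
/*
 * Swaps two int lvalues.  The do { ... } while (0) wrapper turns the
 * expansion into a single statement that takes the caller's ";", so it
 * is safe in an unbracketed if/else.  The body ends with no semicolon.
 */
#define SWAP_INT(a, b)				\
	do {					\
		int	swap_tmp = (a);		\
		(a) = (b);			\
		(b) = swap_tmp;			\
	} while (0)

/* Returns 1 if the two values had to be swapped into order, else 0. */
int
order_pair(int *low_ptr, int *high_ptr)
{
	if (*low_ptr > *high_ptr)
		SWAP_INT(*low_ptr, *high_ptr);	/* Legal only because of do/while (0). */
	else
		return 0;
	return 1;
}
```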

    In the interests of avoiding potential side effects, it is recommended that macros be written in such a way as to evaluate their parameters only once.
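
    The hazard is easiest to see with an argument that has a side effect. In the sketch below (hypothetical names throughout), MAX_TWICE evaluates whichever argument it returns a second time, so i++ runs twice; a function, which evaluates each argument exactly once, is one way to sidestep the problem. The side effect in the argument is there only to expose the difference; the Expressions section discourages it.

```c
/* Evaluates the selected argument twice. */
#define MAX_TWICE(a, b)	((a) > (b) ? (a) : (b))

/* Evaluates each argument exactly once. */
static int
max_int(int a, int b)
{
	return (a > b ? a : b);
}

void
macro_demo(int start, int *result_ptr, int *final_i_ptr)
{
	int	i = start;

	*result_ptr = MAX_TWICE(i++, 0);	/* i++ runs twice when i > 0. */
	*final_i_ptr = i;
}

void
function_demo(int start, int *result_ptr, int *final_i_ptr)
{
	int	i = start;

	*result_ptr = max_int(i++, 0);		/* i++ runs exactly once. */
	*final_i_ptr = i;
}
```

    Starting from 5, macro_demo() yields a result of 6 and leaves i at 7, while function_demo() yields 5 and leaves i at 6.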

    Declaration Standard

    This section describes the allowable forms of declarations. Unless mentioned in this section, other forms of declarations should be avoided. (Function declarations are described in a separate section.)

    For enumerations, the proper forms are:

    typedef enum { first, second, third } type_name;
    		/*
    		 * This form is acceptable if it fits easily on a single line
    		 * and the elements are self-explanatory.
    		 */
    
    
    typedef enum {
    	first,	/* Pertinent comment.  (Not required.) */
    	second,	/* Pertinent comment.  (Not required.) */
    	third	/* Pertinent comment.  (Not required.) */
    } type_name;
    		/*
    		 * This form should be used if the definition will not fit on
    		 * a single line or if individual elements require explanation.
    		 */
    
    
    typedef enum {
    	first,	second,	third,	fourth,	fifth,
    	sixth,	seventh,	eighth,	ninth,	tenth,
    	eleventh,	twelfth,	thirteenth
    } type_name;
    		/*
    		 * This form should be used for large numbers of
    		 * self-descriptive elements.
    		 */
    
    
    typedef enum {
    	first  = initializer1,
    	second = initializer2,
    	third  = initializer3
    } type_name;
    
    
    

    For structures, the proper forms are:
    typedef struct {
    	type_name1	field_name1;  /* Purpose/usage */
    	type_name2	field_name2;
    		/*
    		 * Particularly long and detailed explanation of the purpose/usage
    		 * of this field using remarkably long words and referencing dull,
    		 * dry tomes better left buried in the crypt from which they were
    		 * unearthed rather than be forced out into the light of day.
    		 */
    	type_name3	field_name3;  /* Purpose/usage */
    } type_name;
    
    
    typedef struct {
    	unsigned	field_name1	: 16;  /* Purpose/usage */
    	unsigned	field_name2	: 8;   /* Purpose/usage */
    	unsigned				: 2;   /* Why unused? */
    	unsigned	field_name3	: 4;
    		/* Particularly long comment regarding purpose/usage */
    	unsigned	field_name4	: 2;   /* Purpose/usage */
    } type_name;
    
    
    

    And, of course, for simple declarations:
    type_name	id;		/* Purpose/usage */
    
    type_name	id_1,
    		id_2;
    
    type_name	id_1 = init,
    		id_2 = init;
    
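    In the case of pointer declarations, the asterisk should be associated with the variable name, not with the base type: in C the asterisk belongs to each individual declarator, so writing it against the type misleads the reader. To illustrate, the following is wrong: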
    				int*	index_ptr;		/* WRONG */
    
    Rather, the proper form is:
    				int	*index_ptr;		/* RIGHT */
    
    Whether the asterisk is lined-up at the standard "tab" indentation level or unindented by one space is a matter of programmer choice. To illustrate, both of the following are acceptable:
    	bool	is_ready;
    	int  *next_widget;	/* unindented */
    	char	id;
    

    and:
    	bool	is_ready;
    	int	*next_widget;	/* lined-up */
    	char	id;
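
    The reason for the asterisk rule is that the asterisk applies to each declarator separately, not to the type as a whole. The sketch below declares two names on one line, which the one-identifier-per-line rule later in this section forbids, purely to expose the pitfall; it uses C11's _Generic only to report which type each name actually received. The names are hypothetical.

```c
/* 1 if expr has type int *, else 0.  _Generic does not evaluate expr. */
#define IS_INT_PTR(expr)	_Generic((expr), int *: 1, default: 0)

typedef struct {
	int	first_is_pointer;	/* 1 if first has type int *. */
	int	second_is_pointer;	/* 1 if second has type int *. */
} declared_kinds;

declared_kinds
check_wrong_style(void)
{
	int*		first, second;	/* WRONG: reads as two pointers. */
	declared_kinds	kinds;

	kinds.first_is_pointer  = IS_INT_PTR(first);
	kinds.second_is_pointer = IS_INT_PTR(second);	/* second is a plain int. */
	return kinds;
}
```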
    
    At no time should the fact that the compiler assigns enumeration values in a particular manner be used in a program. Rather than do this, explicit values should be associated with the elements in the declaration.
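
    A minimal sketch, assuming a hypothetical set of message codes whose numeric values are stored or transmitted and therefore must never change. Because every value is explicit, inserting a new element later cannot silently renumber the existing ones.

```c
typedef enum {
	Msg_hello   = 1,		/* Values are part of the wire format. */
	Msg_data    = 2,
	Msg_goodbye = 3
} msg_code;

int
msg_code_value(msg_code code)
{
	return (int) code;
}
```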

    The use of bit fields to minimize storage usage (as opposed to mapping hardware structures) is strongly discouraged.

    Unless truly obvious, comments should be included with each element of a structure.

    In variable declarations, there should be only one identifier on a line. Multiple identifiers and/or multiple identifier assignments on a line are not acceptable unless they are intimately related, and even then they are not encouraged.

    Numerical constants should not be coded directly. Instead the "#define" facility should be used. Constants declared explicitly "long" should use a capital "L". It is too easy to confuse letters and digits if this rule is not followed (e.g. 2l [2-el] looks too much like 21 [twenty-one]).

    In extern declarations of arrays, repeat the array bounds. Since (given the preceding paragraph) any fixed limit should be "#define"-ed, this poses no maintainability problem.
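
    A short sketch of the two preceding paragraphs: the bound and the long constant are #defined, the long constant uses a capital L, and the extern declaration repeats the bound. All names are hypothetical, and the extern declaration, which would normally live in a header, is placed in the same file only to keep the example self-contained.

```c
#define NUM_REGISTERS	32		/* Array bound, defined once. */
#define CYCLE_LIMIT	100000L		/* Capital L: not confusable with 1. */

extern long	register_file[NUM_REGISTERS];	/* Bound repeated. */

long	register_file[NUM_REGISTERS];		/* The definition. */

void
clear_registers(void)
{
	int	i;

	for (i = 0; i < NUM_REGISTERS; i++) {
		register_file[i] = 0L;
	}
}

long
sum_registers(void)
{
	long	sum = 0L;
	int	i;

	for (i = 0; i < NUM_REGISTERS; i++) {
		sum = sum + register_file[i];
	}
	return sum;
}

int
is_over_limit(long cycles)
{
	return (cycles > CYCLE_LIMIT);
}
```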

    Never rely on default "int" declarations, whether for function return types or for parameters.

    The generous use of the keyword "static" on global functions and variables is encouraged to restrict their visibility outside the file. Global accessibility of variables is discouraged without good reasons. Conversely, the use of local "extern" declarations within functions is actively discouraged without strong justification.

    In general, it is a poor idea to employ local declarations that override declarations at higher levels.

    Particularly in the case of structs, types and instances of types should not be combined in the same declaration. To illustrate, the following is not acceptable.

    	struct windmill {
    		int	num_sails;
    		int	usage;
    		int	style;
    	} don_quixote;				/* WRONG */
    
    
    

    Rather, it should be:
    	struct windmill {
    		int	num_sails;
    		int	usage;
    		int	style;
    	};
    
    	struct windmill	don_quixote;	/* RIGHT */
    

    Expressions

    It has been said that there is little one can do about the problems caused by side effects in parameters except to avoid side effects in expressions. These are commendable words and should be adhered to rigorously. Remember that the "++" and "--" operators are also assignment operators and thus do produce side effects.

    Conditional expressions (a ? b : c) are not intuitive, can be confusing (particularly when nested), and should be avoided. Where appropriate, the approved form is:

    	(condition ? true_return_val : false_return_val)
    
    where the parentheses and the spaces around the "?" and ":" are mandatory. In addition, if any portion of the expression is other than a simple expression, parentheses around the offending section are encouraged.

    Expressions that span multiple lines should be split before an operator, preferably at the lowest-precedence operator near the break.

    When using negation (!) in conditional expressions, it is recommended that the expression to be operated upon be enclosed in parentheses to improve readability and remove any ambiguities that might arise.

    The use of left-shift and right-shift operators should be reserved for bit operations. Their use for multiplication, division, and exponentiation is strongly discouraged. (Besides, most intelligent compilers will recognize the arithmetic cases and produce shift code for them, anyway.)
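
    A sketch of the expression rules above: the approved conditional-expression form, negation applied to a parenthesized expression, and shifts used only to extract bits while a multiplication stays a multiplication. The function names are hypothetical, and extract_bits() assumes width is smaller than the number of bits in an unsigned int.

```c
/* Approved form: parenthesized, with spaces around "?" and ":". */
int
abs_value(int n)
{
	return (n < 0 ? -n : n);
}

/* Negation applied to a parenthesized expression. */
int
is_out_of_range(int n, int low, int high)
{
	return !(n >= low && n <= high);
}

/* Shifts reserved for bit operations: extract a width-bit field. */
unsigned int
extract_bits(unsigned int word, unsigned int low_bit, unsigned int width)
{
	return ((word >> low_bit) & ((1u << width) - 1u));
}

/* Arithmetic written as arithmetic; the compiler may still emit a shift. */
unsigned int
times_eight(unsigned int n)
{
	return n * 8u;
}
```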

    Assignment Statements and Initializations

    Embedded assignment statements have their time and place, but it is a rare one. In general they should be avoided. The primary acceptable use is in conditional statements that check for special conditions. The two best examples are:
    	if ((obj_ptr = malloc(elem_num * elem_size)) == NULL) {
    		/* Handle the allocation failure. */
    	}
    

    and:
    	while ((c = getchar()) != EOF) {
    		/* Process the character in c. */
    	}
    
    Remember, an embedded assignment statement is a form of side effect (and that "x++" and "x--" are also assignment statements.)
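
    A sketch of the getchar() pattern applied to an arbitrary stream via getc(); count_lines() is a hypothetical name. Note that c must be an int rather than a char: getc() returns every character value plus EOF, and a char cannot represent all of them, so the EOF test could misfire.

```c
#include <stdio.h>

/* Counts the newline characters in an open stream. */
long
count_lines(FILE *fp)
{
	int	c;			/* A character or EOF, so an int. */
	long	num_lines = 0L;

	while ((c = getc(fp)) != EOF) {
		if (c == '\n') {
			num_lines++;
		}
	}
	return num_lines;
}
```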

    Unless a local variable is going to be used very shortly after it is declared, it is recommended that its initialization be performed at its point of first use rather than where it is declared. Global variables should be initialized where declared. If this is not convenient (e.g. large arrays), they should be initialized in a dedicated initialization routine. In the case of dynamic initialization of structure variables, initialize the fields in the order in which they are defined. To illustrate:

    	typedef struct {
    		int	maker;
    		int	model;
    		int	year;
    		int	color;
    	} car;
    
    
    	car	my_car;
    
    	my_car.maker = PORSCHE;
    	my_car.model = most_expensive;
    	my_car.year  = this_year;
    	my_car.color = RED;
    
    Since we live in an imperfect world, do not assume that uninitialized variables will be set to zero by the compiler. While this might be the case, resist the temptation to succumb to this assumption. If the initial value of a variable makes a difference, initialize it explicitly.
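
    A sketch of the initialization advice above, assuming a global table too large to initialize where it is declared: a dedicated routine sets every field explicitly, in declaration order, rather than relying on any default value. The car type mirrors the earlier example; NUM_CARS, car_lot, init_car_lot(), and the values of PORSCHE and RED are hypothetical.

```c
#define NUM_CARS	1000
#define PORSCHE		1
#define RED		2

typedef struct {
	int	maker;
	int	model;
	int	year;
	int	color;
} car;

car	car_lot[NUM_CARS];		/* Too large to initialize in place. */

/* Dedicated initialization routine: every field set explicitly. */
void
init_car_lot(int this_year)
{
	int	i;

	for (i = 0; i < NUM_CARS; i++) {
		car_lot[i].maker = PORSCHE;	/* Fields in declaration order. */
		car_lot[i].model = 0;		/* Model not yet known. */
		car_lot[i].year  = this_year;
		car_lot[i].color = RED;
	}
}
```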

    Along these lines, remember that memory allocated by malloc() will not be zeroed. If it is important to have dynamically allocated memory zeroed (usually a good idea), calloc() should be used. (With respect to dynamic memory allocations, the reader is referred to the "safer" versions of these routines discussed in the P6 System Header File appendix.)
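
    A minimal sketch of a zeroed allocation using the embedded-assignment form shown earlier; alloc_counters() is a hypothetical name. calloc() sets every byte to zero, which is the value 0 for integer types; ISO C does not guarantee that all-bits-zero is a null pointer or 0.0, although it is on common machines.

```c
#include <stdlib.h>

/* Returns a zeroed array of num_counters ints, or NULL on failure. */
int *
alloc_counters(size_t num_counters)
{
	int	*counters_ptr;

	if ((counters_ptr = calloc(num_counters, sizeof(int))) == NULL) {
		return NULL;		/* Caller must handle the failure. */
	}
	return counters_ptr;
}
```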

    Simple Statements

    For the purpose of the ensuing discussions, we wish to define what we mean by a "simple" statement. A simple statement is one of three possibilities:
    
    
    It is either a simple assignment:
    		a = x[i];
    

    or
    		a = f(x);
    

    a simple increment:
    		i++;
    

    or
    		m = m + n;
    

    or a function call:
    		f(a, b, c);
    
    

    It is doubtful that:
    		*z[t] = f(x[f1(i)], f2(y[j] + n), k * r(s));
    

    could be considered a simple statement.
    
    
    

    Conditional Statements

    The form of conditional statements is as follows:
    	if (condition)
    		simple_then_statement;
    

    or (preferable)
    	if (condition) {
    		then_statements;
    	}
    

    With an else part:
    	if (condition)
    		simple_then_statement;
    	else
    		simple_else_statement;
    
    or
    	if (condition) {
    		then_statement(s);
    	} else {
    		else_statement(s);
    	}
    
    
    

    For complex conditions:
    	if (    condition_1
    	    && (condition_2 || condition_3)
    	    &&  condition_4) {
    		then_statements;
    	} else {
    		else_statements;
    	}
    
    
    

    For nested if's, the proper form is:
    	if (condition1) {
    		statements;
    	} else if (condition2) {
    		statements;
    	} else if (condition3) {
    		statements;
    	} else {
    		statements;
    	}
    
    
    

    For nested control structures (including nested "if" statements), compound statements are required. To illustrate, the following is not allowed:
    	if (condition1)
    		while (condition2) {
    			statements;
    		}
    	else
    		else_statement;		/* WRONG */
    
    
    

    Rather, the approved form is:
    	if (condition1) {
    		while (condition2) {
    			statements;
    		}
    	} else {
    		else_statement;		/* RIGHT */
    	}
    
    
    

    The use of compound statements is recommended to avoid ambiguity. The following is not acceptable:
    	if (condition1)
    		if (condition2)
    			simple_then_statement;	/* WRONG */
    	else
    		simple_else_statement;
    
    
    

    It is better to use either
    	if (condition1) {
    		if (condition2)
    			simple_then_statement;
    	} else {
    		simple_else_statement;
    	}
    

    or
    	if (condition1) {
    		if (condition2) {
    			simple_then_statement;
    		} else {
    			simple_else_statement;
    		}
    	}
    
    (Depending upon what was intended.)
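
    A sketch showing that the compiler pairs an else with the nearest unmatched if, whatever the indentation suggests. The two hypothetical functions below differ only in their brackets.

```c
int
unbracketed(int condition1, int condition2)
{
	int	result = 0;

	if (condition1)
		if (condition2)
			result = 1;
	else				/* Actually belongs to if (condition2)! */
		result = 2;
	return result;
}

int
bracketed(int condition1, int condition2)
{
	int	result = 0;

	if (condition1) {
		if (condition2)
			result = 1;
	} else {
		result = 2;
	}
	return result;
}
```

    With condition1 true and condition2 false, unbracketed() returns 2 and bracketed() returns 0; with condition1 false, they return 0 and 2 respectively.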

    The only time brackets are not required on all parts of a conditional statement is when all parts of the conditional statement are simple statements (as defined above in the Simple Statements section). In other words, if any part of a conditional statement is a compound statement (for whatever reason), then all parts must be compound.

    In general, the use of compound statements {} is encouraged as an aid to readability and maintainability.

    Iterative Statements

    Iterative statements should be of the form:
    	while (condition) {
    		statements;
    	}
    

    or
    	do {
    		statements;
    	} while (condition);
    

    or
    	for (i = initial; condition; next) {
    		statements;
    	}
    
    
    

    For infinite loops, the recommended form is:
    	while (TRUE) {		/* (If TRUE has been defined nonzero.) */
    		statements;
    	}
    

    or
    	while (1) {
    		statements;
    	}
    
    If there is only a single, simple statement to be executed, the brackets {} are not required but are encouraged. In any event, the statement to be executed must be on a line of its own.

    If an iterative statement has a null (empty) body, it should use an empty compound statement containing a comment verifying its emptiness.

    	/* Find where the strings first differ (or where both end). */
    	for ( ; *str1 != '\0' && *str1 == *str2; str1++, str2++) {
    		/* VOID */
    	}
    
    The use of the "continue" statement is not encouraged. When used, it should be commented explicitly and, if possible, used early in the loop body. In addition, appropriate comments should be added to make it easy to determine its target.
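
    A sketch of a commented continue placed early in the loop body; sum_non_negative() is a hypothetical example.

```c
/* Sums the non-negative entries of values[0..num_values-1]. */
long
sum_non_negative(const int values[], int num_values)
{
	long	sum = 0L;
	int	i;

	for (i = 0; i < num_values; i++) {
		if (values[i] < 0) {
			continue;	/* Skip negatives: on to the next i of this for. */
		}
		sum = sum + values[i];
	}
	return sum;
}
```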

    Compound (Bracketed) Statements

    As mentioned earlier, there is no requirement to use brackets {} in iterative statements if there is only a single, "simple" statement to be executed. The same was said to be true for conditional statements, with the added proviso that if any statement in the conditional statement was compound, then all statements were required to be compound.

    In these cases, although the brackets are not required they are strongly recommended as an aid to maintainability. To illustrate this, consider the following calculation of Ackermann function values:

    	x[0] = 1;
    	x[1] = 1;
    	for (i = 2 ; i <= LIMIT ; i++)
    		x[i] = ackermann(x, i);
    
    Should we later decide to sum the values as we go along, we might unwittingly add:
    	x[0] = 1;
    	x[1] = 1;
    	sum = 2;				/* New code. */
    	for (i = 2 ; i <= LIMIT ; i++)
    		x[i] = ackermann(x, i);
    		sum = sum + x[i];		/* New code. */
    
    Here, although the indentation might make it look right, the new sum line is not part of the loop body: it executes only once, after the for loop has finished and i has become LIMIT + 1, so sum ends up as 2 + x[LIMIT + 1], a value past the last one computed (and possibly past the end of the array). While we all know better than to do something stupid like this, its occurrence (by others, of course) is all too frequent. Thus the recommended form of the initial construct is:
    	x[0] = 1;
    	x[1] = 1;
    	for (i = 2 ; i <= LIMIT ; i++) {
    		x[i] = ackermann(x, i);
    	}
    
    This removes any possibility of ambiguity and reduces the chance of error with later enhancements/modifications.
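
    The trap can be reproduced with a runnable sketch. Since the point concerns the braces rather than the arithmetic, the hypothetical next_value() below stands in for ackermann(), and the array is given one spare element so that the stray line's read of x[LIMIT + 1] stays in bounds.

```c
#define LIMIT	5

/* Stand-in for ackermann(): a Fibonacci-style step, for illustration. */
static long
next_value(long x[], int i)
{
	return x[i - 1] + x[i - 2];
}

/* The "unwittingly" extended loop.  x must hold LIMIT + 2 elements. */
long
sum_without_braces(long x[])
{
	long	sum;
	int	i;

	x[LIMIT + 1] = 0L;		/* The spare slot the stray line reads. */
	x[0] = 1;
	x[1] = 1;
	sum = 2;
	for (i = 2 ; i <= LIMIT ; i++)
		x[i] = next_value(x, i);
		sum = sum + x[i];	/* NOT in the loop: runs once, i == LIMIT + 1. */
	return sum;
}

/* The intended computation, with the loop body bracketed. */
long
sum_with_braces(long x[])
{
	long	sum;
	int	i;

	x[0] = 1;
	x[1] = 1;
	sum = 2;
	for (i = 2 ; i <= LIMIT ; i++) {
		x[i] = next_value(x, i);
		sum = sum + x[i];
	}
	return sum;
}
```

    With LIMIT = 5, sum_with_braces() returns 20, while sum_without_braces() returns 2: the stray line adds only the spare element, which is 0.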

    Switch Statements

    Switch statements should have the following form:
    	switch (selector) {
    	    case first:
    	    case the_second:
    	        statements;
    	        break;
    
    	    case third:
    	        statements;
    	        break;
    
    	    case dont_care_1:
    	    case dont_care_2:
    	        break;
    
    	    default:
    	        fatal("Unexpected selector in 'procedure_name'");
    	}
    
    Switch statements are the only departure from the "standard" indentation. As will be mentioned in the Indentation and Spacing section, the standard indentation is 8 spaces (one 8-space tab stop). In switch statements, the 'case's are indented 4 spaces from the 'switch' and the statements are indented 8 spaces (one tab) from the 'switch'.

    The last case of the statement should end with an explicit break, even though control would leave the switch anyway. This prevents oversights when cases are added to the switch statement at a later time. If the last choice in the switch statement is default, it does not require a break.

    In the case of enumerated types, each element of the enumeration must have a "case" in the switch statement. In addition, a "default" must exist as the last choice and must contain an indication that an error has occurred.

    If the statements in a particular 'case' do not end with a 'break' (thereby continuing control in the following 'case'), a bold comment should exist to indicate and explain the situation. In addition, it is recommended that a 'lint' style comment of the form /*FALLTHROUGH*/ be placed where a break might otherwise be. To illustrate:

    	/* Print numeric value. */
    	switch (num->type) {
    	    case signed_int:
    	        putchar(num->negative ? '-' : '+');	/* Place the sign. */
    	        /*FALLTHROUGH*/
    
    	    case unsigned_int:
    	        printf("%d", num->int_value);		/* Now print the value. */
    	        break;
    
    	    case floating_pt:
    	        printf("%f", num->fp_value);
    	        break;
    
    	    default:
    	        warning("Unknown num->type encountered.");
    	}
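
    A runnable sketch of the example above. The num type is not defined in this document, so the number structure below is hypothetical, with enumeration elements capitalized per the Names section. The function writes into a caller-supplied buffer with snprintf() (C99) so that the output can be checked, and it reports an unknown type by returning -1 instead of calling warning().

```c
#include <stdio.h>

typedef enum {
	Signed_int   = 0,
	Unsigned_int = 1,
	Floating_pt  = 2
} num_type;

typedef struct {
	num_type	type;		/* Which value field is valid. */
	int		negative;	/* Signed_int only: nonzero if below zero. */
	unsigned int	int_value;	/* Magnitude, for the integer types. */
	double		fp_value;	/* Value, for Floating_pt. */
} number;

/* Formats *num into buf (buf_size >= 2).  Returns 0, or -1 on a bad type. */
int
format_number(const number *num, char *buf, size_t buf_size)
{
	size_t	used = 0;		/* Characters already placed in buf. */

	switch (num->type) {
	    case Signed_int:
	        buf[0] = (num->negative ? '-' : '+');	/* Place the sign. */
	        used = 1;
	        /*FALLTHROUGH*/

	    case Unsigned_int:
	        (void) snprintf(buf + used, buf_size - used, "%u", num->int_value);
	        break;

	    case Floating_pt:
	        (void) snprintf(buf, buf_size, "%f", num->fp_value);
	        break;

	    default:
	        return -1;		/* Unknown num->type encountered. */
	}
	return 0;
}
```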
    

    Function Standard

    The proper form of a function declaration is as follows:
    /*
     * function_name
     *
     *FUNCTION:
     * Interface specification.  Purpose of routine.  Expected usage.
     * Pertinent comments regarding return values.
     *
     *PARAMS:
     * Discussion of parameters.  Assumptions made about parameters, if any.
     * This section is necessary only if there is something more meaningful to be said
     * about the parameters that is not contained in their comments.
     *
     *LOGIC:
     * Internal operation and structure.  Algorithm description.
     *
     *ASSUMPTIONS:
     * Assumptions made that affect the correct functioning of the routine.
     *
     *NOTE:
     * Any special caveats, concerns, or special cautions.
     *
     *RETURNS:
     * Information regarding the possible return values.
     */
    
    return_type
    function_name(param1, param2)
    param_type param1;		/* Purpose.  Expected values. */
    param_type param2;		/* OUT:  (If modified.)  Purpose.  Expected values. */
    {
    	type1 variable1,	/* Purpose of variable.  Description of use. */
    	        variable_the_second_of_this_type;
    				/*
    				 * Purpose and description of this second variable.
    				 * As much detail as necessary to make sense to others.
    				 */
    	type2 variable_3;	/* Comment as above. */
    
    	CODE BODY;
    
    }/*** end function_name() ***/
    
    While portions of the function's comment header may be omitted if they have no meaningful content, minimum necessities are the function_name and the FUNCTION: sections.

    Each function parameter must be declared on a separate line. Declaring multiple parameters of the same type on the same line is expressly forbidden, no matter how intimately related the parameters may be.

    Although C assumes that a function without a specified type returns an int, this construct should never be used. All functions should have either an explicit return type or void (for "no return value"). If a function is specified as void, it should never be used as an expression. If a function is specified with an explicit return type, it should never be used as a statement. If the returned value is of no interest, it is recommended that it be cast in the form:

    	(void) f(x);
    

    or
    	dont_care = f(x);
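
    A sketch of a complete function following the header template above. The function is hypothetical, and it declares its parameters in ANSI prototype style, one per line with its comment, rather than in the older style shown in the template. A caller that does not need the result would write (void) count_matches(values, num_values, key);.

```c
#include <stddef.h>

/*
 * count_matches
 *
 *FUNCTION:
 * Counts how many elements of an int array equal a given key.
 *
 *PARAMS:
 * values may be NULL only when num_values is 0.
 *
 *LOGIC:
 * Linear scan of the whole array.
 *
 *RETURNS:
 * The number of matching elements, 0..num_values.
 */
size_t
count_matches(
	const int	values[],	/* Array to search. */
	size_t		num_values,	/* Number of elements in values. */
	int		key)		/* Value to look for. */
{
	size_t	num_found = 0;		/* Matches seen so far. */
	size_t	i;			/* Index into values. */

	for (i = 0; i < num_values; i++) {
		if (values[i] == key) {
			num_found++;
		}
	}
	return num_found;
}/*** end count_matches() ***/
```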
    

    Indentation and Spacing

    With the specific exceptions mentioned earlier in the section on Switch Statements, the standard unit of indentation is an 8-space tab (or 8 spaces). Use of tabs is encouraged, but tab stops must be, without exception, 8 spaces. (Due to the idiosyncrasies of text formatters, tabs [or 8 spaces] may not be translated to paper accurately for the examples in this document. Assume indentation levels of 8 spaces if that appears to be the intent.)

    Every reasonable effort should be made to limit line lengths to 80 characters. This improves the readability when looking at listings or when viewing on standard (limited) alpha-numeric terminals. While program understandability should not be compromised to meet this goal, code which consistently breaks the 80 column barrier may need to be justified before a higher court.

    One of the primary purposes of spaces in a program is to enhance readability. To this end, the use of horizontal and vertical spacing is encouraged. As an aid to the uncertain reader, the following recommendations are provided: