Linux Coding Style

The Linux kernel, like any large software project, has a defined coding style that stipulates the formatting, style, and layout of your code. This is done not because the Linux kernel style is superior (although it might be) or because your style is illegible (although it might well be), but because consistency of coding style is crucial to productivity in coding. It is often argued that coding style is irrelevant because it does not affect the compiled object code. In a large project such as the kernel, however, in which many developers are involved, consistency matters a great deal. Consistency implies familiarity, which leads to ease of reading, lack of confusion, and the reasonable expectation that code will continue to follow a given style. This increases the number of developers who can read your code, and the amount of code you can read. In an open-source project, the more eyes the better.

It is not so important which style is chosen, as long as one is indeed selected and used exclusively. Fortunately, Linus long ago laid out the style we should use, and most code sticks to it. Most of the style is covered, with Linus's usual humor, in the file Documentation/CodingStyle in the kernel source tree.

Indentation

The kernel style for indentation is to use tabs that are eight characters in length. This does not mean it is okay to use eight spaces for indentation, or four spaces, or anything else. It means each level of indentation is one tab beyond the previous one, and a tab is eight characters. For an unknown reason, this rule is one of the most commonly broken, despite its very high impact on readability. Eight-character tabs make the indentation of different code blocks far easier to identify after hours of hacking.

Braces

Brace placement is personal, and few technical reasons exist for one convention over another, but we have to agree on something. The accepted kernel style is to put the opening brace on the first line, at the end of the statement. The closing brace goes on a new line as the first character. For example:

if (fox) {
        dog();
        cat();
}

Note that the closing brace is not on a line by itself when the following token is a continuation of the same statement. For example:

if (fox) {
        ant();
        pig();
} else {
        dog();
        cat();
}

And,

do {
        dog();
        cat();
} while (fox);

Functions are the exception to this rule: their opening brace goes on a line of its own, because functions cannot nest inside functions:

unsigned long func(void)
{
        /* ... */
}

Finally, statements that do not need braces can omit them. For example, the following is acceptable:

if (foo)
        bar();

The logic behind all this is K&R.^[1]

^[1] The C Programming Language, by Brian Kernighan and Dennis Ritchie (Prentice Hall, ISBN 0-13-110362-8), nicknamed K&R, is the bible of C, written by C's author and his colleague.
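To see the brace rules working together, here is a small, compilable sketch. The function count_animals() and its arithmetic are hypothetical, invented only to exercise each rule: the function's opening brace on a line of its own, if/else and do/while braces at the end of the line, and a braceless single-statement if.

```c
#include <stdbool.h>

/* Hypothetical function: its opening brace sits on a line of its own. */
static int count_animals(bool fox, int rounds)
{
        int dogs = 0, cats = 0;

        /* Opening brace ends the line; "else" shares the closing brace's line. */
        if (fox) {
                dogs += 2;
                cats += 1;
        } else {
                dogs += 1;
                cats += 2;
        }

        /* The closing brace shares its line with the trailing while. */
        do {
                dogs++;
                rounds--;
        } while (rounds > 0);

        /* A single statement may omit the braces entirely. */
        if (dogs > 10)
                dogs = 10;

        return dogs + cats;
}
```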

Line Size

Code in the kernel should, as much as possible, be kept to lines fewer than 80 characters long. This allows code to fit length-wise on a standard 80x24 terminal.

There is no accepted standard on what to do in cases where code absolutely must exceed 80 characters. Some developers just allow the line to wrap, letting their editor handle the chore of displaying the code in a readable fashion. Other developers break up the lines, manually inserting line breaks where appropriate, perhaps starting each new line two tab stops over from the original.

Similarly, some developers line up function parameters that wrap lines with the open parenthesis. For example:

static void get_pirate_parrot(const char *name,
                              unsigned long disposition,
                              unsigned long feather_quality)

Other developers break up the lines but do not line the parameters up, instead indenting them by a standard two tabs. For example:

int find_pirate_flag_by_color(const char *color,
                const char *name, int len)

As there is no definitive rule in this case, the choice is left up to you, the developer.

Naming

No name should have mixed case. Calling a local variable idx or even just i is perfectly fine if it is clear what it does. A cute name such as theLoopIndex is unacceptable. Hungarian notation (encoding the variable type in the variable name) is evil and should never ever be used. This is C, not Java; Unix, not Windows.

Nonetheless, global variables and functions should have very descriptive names. Calling a global function atty() is confusing; something like get_active_tty() is much more acceptable. This is Linux, not BSD.
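A short sketch of these naming rules follows. The function count_open_hatches() is hypothetical: as a global, it gets a descriptive, lowercase, underscore-separated name, while its loop index is simply i.

```c
#include <stddef.h>

/*
 * Hypothetical global function: the name says exactly what it does,
 * with no mixed case and no type information encoded in it.
 */
size_t count_open_hatches(const int *hatch_state, size_t nr_hatches)
{
        size_t i, nr_open = 0;  /* short local names are fine */

        for (i = 0; i < nr_hatches; i++) {
                if (hatch_state[i])
                        nr_open++;
        }

        return nr_open;
}
```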

Functions

As a rule of thumb, functions should not exceed one or two screens of text and should have fewer than ten local variables. A function should do one thing and do it well. There is no harm in breaking a function into a series of smaller functions. If you are worried about function call overhead, use inline.
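As a sketch of breaking work into small pieces, the following splits a hypothetical averaging routine so that the clamping logic lives in its own static inline helper. The names clamp_speed(), average_speed(), and MAX_KNOTS, and the 40-knot limit, are illustrative assumptions, not kernel interfaces.

```c
#include <stddef.h>

#define MAX_KNOTS 40    /* hypothetical upper bound */

/* Does one thing: force a reading into the range [0, MAX_KNOTS]. */
static inline int clamp_speed(int knots)
{
        if (knots < 0)
                return 0;
        if (knots > MAX_KNOTS)
                return MAX_KNOTS;
        return knots;
}

/* Reads as a summary of the steps; the details live in the helper. */
static int average_speed(const int *samples, size_t n)
{
        size_t i;
        int sum = 0;

        if (n == 0)
                return 0;

        for (i = 0; i < n; i++)
                sum += clamp_speed(samples[i]);

        return sum / (int)n;
}
```

Because clamp_speed() is static inline, the compiler is free to fold it into average_speed(), so the split costs readability nothing and, in most builds, costs performance little or nothing.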

Comments

Commenting your code is very important, but the commenting must be done correctly. Generally, you want to describe what your code is doing and why, not how it is doing it. The how should be apparent from the code itself; if it is not, maybe you need to rethink what you wrote. Additionally, comments should not include who wrote a function, the modification date, or other trivial nonsense. Such information is generally acceptable at the top of the source file, however.

The kernel uses C-style comments, even though gcc supports C++-style comments, too. The general style of a comment in the kernel resembles:

/*
 * get_ship_speed() - return the current speed of the pirate ship
 * We need this to calculate the ship coordinates. As this function can sleep,
 * do not call while holding a lock.
 */

In comments, important notes are often prefixed with "XXX:", and bugs are often prefixed with "FIXME:" like so:

/*
 * FIXME: We assume dog == cat which may not be true in the future
 */

The kernel has a facility for self-generating documentation. It is based on GNOME-doc, slightly modified and renamed Kernel-doc. To create the standalone documentation in HTML format, run:

make htmldocs

Or, for PostScript:

make psdocs

You can use the system to document your functions by following a special format for your comments:

/**
 * find_treasure - find 'X marks the spot'
 * @map: treasure map
 * @time: time the treasure was hidden
 *
 * Must be called while holding the pirate_ship_lock.
 */
void find_treasure(int map, struct timeval *time)
{
        /* ... */
}

For more information, see Documentation/kernel-doc-nano-HOWTO.txt.

Typedefs

For various reasons, the kernel developers have a certain hatred for typedef that almost defies explanation. Their rationale is as follows:

typedef hides the real type of data structures.
Because the type is hidden, code is more prone to do bad things, such as pass a structure by value on the stack.
typedef is just being lazy.

Therefore, to avoid ridicule, avoid typedef.

Of course, there are a few good uses of typedefs: hiding an architecture-specific implementation of a variable or providing forward compatibility when a type may change. Decide carefully whether the typedef is truly needed or exists just to reduce the number of characters you need to type.
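The following sketch contrasts the two cases. struct pirate_ship is used through its struct tag and passed by pointer, so nothing is hidden and nothing is copied onto the stack. ship_speed_t is a hypothetical typedef of the kind that can be justified, standing in for a type whose underlying width might differ per architecture or change later.

```c
/* Used through its tag, so every reader sees that it is a structure. */
struct pirate_ship {
        char name[32];
        unsigned int crew;
        unsigned int cannons;
};

/* Take a pointer; the structure is never copied by value. */
static void add_crew(struct pirate_ship *ship, unsigned int extra)
{
        ship->crew += extra;
}

/*
 * Hypothetical justified typedef: if the width of a speed value had to
 * change, only this line would change, not every user of the type.
 */
typedef unsigned long ship_speed_t;
```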

Using What Is Already Provided

Do not reinvent the wheel. The kernel provides string manipulation functions, compression routines, and a linked list interface, so use them.

Do not wrap existing interfaces in generic interfaces. Often you see code that was obviously ported from one operating system to Linux, and various kernel interfaces are wrapped in some gross glue function. No one likes this, so just use the provided interfaces directly.

No ifdefs in the Source

Putting ifdef preprocessor directives directly in the C source is frowned upon. You should never do something like the following in your functions:

    ...
#ifdef CONFIG_FOO
    foo();
#endif
    ...

Instead, define foo() to do nothing if CONFIG_FOO is not set:

#ifdef CONFIG_FOO
static int foo(void)
{
        /* ... */
        return 0;
}
#else
static inline int foo(void) { return 0; }
#endif

Then, you can unconditionally call foo(). Let the compiler do the work for you.
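Here is a self-contained, user-space sketch of the pattern. In a real tree, CONFIG_FOO would come from the kernel configuration rather than a #define in the file, and the stub would usually live in a header. The function run_foo_if_configured() and the return values are hypothetical; they only show that the caller needs no #ifdef of its own.

```c
/* Stand-in for a kernel configuration option; remove to build the stub. */
#define CONFIG_FOO 1

#ifdef CONFIG_FOO
static int foo(void)
{
        /* ... the real work would go here ... */
        return 1;
}
#else
static inline int foo(void) { return 0; }
#endif

/* Hypothetical caller: no preprocessor conditionals in the function body. */
static int run_foo_if_configured(void)
{
        return foo();
}
```

Removing the #define compiles the stub instead, and run_foo_if_configured() then returns 0 without any change to its body.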

Structure Initializers

Labeled identifiers (what C99 calls designated initializers) need to be used to initialize structures. This is good because it prevents changes to the structure from resulting in incorrect initialization. It also allows values to be omitted. Unfortunately, C99 adopted quite an ugly format for labeled identifiers, and gcc is deprecating usage of the previous GNU-style labeled identifier, which was rather handsome. Consequently, kernel code needs to use the new C99 labeled identifier format, however ugly it is:

struct foo my_foo = {
        .a      = INITIAL_A,
        .b      = INITIAL_B,
};

In this code, a and b are members of struct foo, and INITIAL_A and INITIAL_B are their initial values, respectively. If a field is not set, it is set to its default value per the C standard (e.g., pointers are NULL, integers are zero, and floats are 0.0). For example, if struct foo also has int c as a member, the previous statement would initialize c to zero.

Yes, it is ugly. No, we do not have a choice.
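The following sketch makes the zero-initialization concrete. The values chosen for INITIAL_A and INITIAL_B and the extra name member are assumptions for illustration; only a and b are named in the initializer, so c becomes zero and name becomes NULL.

```c
#include <stddef.h>

#define INITIAL_A 4     /* hypothetical initial values */
#define INITIAL_B 2

struct foo {
        int a;
        int b;
        int c;                  /* not named below: becomes 0 */
        const char *name;       /* not named below: becomes NULL */
};

/* Reordering the members of struct foo cannot break this initializer. */
struct foo my_foo = {
        .a      = INITIAL_A,
        .b      = INITIAL_B,
};
```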

Fixing Code Up Ex Post Facto

If a pile of code falls into your lap that fails to even mildly resemble the Linux kernel coding style, do not fret. A little elbow grease and the indent(1) utility will make everything perfect. The indent program, an excellent GNU utility found on most Linux systems, formats source according to given rules. The default settings are for the GNU coding style, which is not too pretty. To get the utility to follow the Linux kernel style, run:

indent -kr -i8 -ts8 -sob -l80 -ss -bs -psl <file>

This instructs the utility to format the code according to the kernel coding style. Alternatively, the script scripts/Lindent automatically invokes indent with the desired options.