What are segmentation faults (segfaults), and how can I identify what's causing them?
A segmentation fault (aka segfault) is a common condition that
causes programs to crash; they are often associated with a file named
core . Segfaults are caused by a program trying to
read or write an illegal memory location. Program memory is divided
into different segments: a text segment for program instructions, a
data segment for variables and arrays defined at compile time, a stack
segment for temporary (or automatic) variables defined in subroutines
and functions, and a heap segment for variables allocated during
runtime by functions, such as malloc (in C) and
allocate (in Fortran). For more, see What are program segments, and which segments are different types of variables stored in? A
segfault occurs when a reference to a variable falls outside the
segment where that variable resides, or when a write is attempted to a
location that is in a read-only segment. In practice, segfaults are
almost always due to trying to read or write a non-existent array
element, not properly defining a pointer before using it, or (in C
programs) accidentally using a variable's value as an address (see the
scanf example below):
- For example, calling
memset()as shown below would cause a program to segfault: memset((char *)0x0, 1, 100); - The following three cases illustrate the most common types of
array-related segfaults:
Case A /* "Array out of bounds" error valid indices for array foo are 0, 1, ... 999 */ int foo[1000]; for (int i = 0; i <= 1000 ; i++) foo[i] = i; Case B /* Illegal memory access if value of n is not in the range 0, 1, ... 999 */ int n; int foo[1000]; for (int i = 0; i < n ; i++) foo[i] = i; Case C /* Illegal memory access because no memory is allocated for foo2 */ float *foo, *foo2; foo = (float*)malloc(1000); foo2[0] = 1.0;
- In case A, array
foois defined forindex = 0, 1, 2, ... 999. However, in the last iteration of theforloop, the program tries to accessfoo[1000]. This will result in a segfault if that memory location lies outside the memory segment wherefooresides. Even if it doesn't cause a segfault, it is still a bug.
- In case B, integer
ncould be any random value. As in case A, if it is not in the range0, 1, ... 999, it might cause a segfault. Whether it does or not, it is certainly a bug.
- In case C, allocation of memory for variable
foo2has been overlooked, sofoo2will point to a random location in memory. Accessingfoo2[0]will likely result in a segfault.
- In case A, array
- Another common programming error that leads to segfaults is
oversight in the use of pointers. For example, the C function
scanf()expects the address of a variable as its second parameter; therefore, the following will likely cause the program to crash with a segfault: int foo = 0; scanf("%d", foo); /* Note missing & sign ; correct usage would have been &foo */The variable
foomight be defined at memory location1000, but the above function call would try to read integer data into memory location0according to the definition offoo. - A segfault will occur when a program attempts to operate on a
memory location in a way that is not allowed (e.g., attempts to write
a read-only location would result in a segfault).
- Segfaults can also occur when your program runs out of stack space. This may not be a bug in your program, but may be due instead to your shell setting the stack size limit too small.
Finding out-of-bounds array references
Most Fortran compilers have an option that will insert code to do
bounds checking on all array references during runtime. If an access
falls outside the index range defined for an array, the program will
halt and tell you where this occurs. For most Fortran compilers, the
option is -C, or -check followed by a
keyword. See your compiler's user guide to get the exact option. Use
bounds checking only when debugging, since it will slow down your
program. Some C compilers also have a bounds checking option.
Checking shell limits
As noted in the last example above, some
segfault problems are not due to bugs in your program, but are caused
instead by system memory limits being set too low. Usually it is the
limit on stack size that causes this kind of problem. To check memory
limits, use the ulimit command in bash or
ksh, or the limit command in
csh or tcsh. Try setting the stacksize
higher, and then re-run your program to see if the segfault goes
away.
Spotting the cause of a segfault using debuggers
If you can't find the problem any other way, you might try a
debugger. For example, you could use GNU's well-known debugger
GDB to view the backtrace of a core file
dumped by your program; whenever programs segfault, they usually dump
the content of (their section of the) memory at the time of the crash
into a core file. Start your debugger with the command
gdb core, and then use the backtrace command
to see where the program was when it crashed. This simple trick will
allow you to focus on that part of the code.
If using backtrace on the coreg file
doesn't find the problem, you might have to run the program under
debugger control, and then step through the code one function, or one
source code line, at a time. To do this, you will need to compile your
code without optimization, and with the -g flag, so
information about source code lines will be embedded in the executable
file. For more, see Within Emacs on Unix, how can I debug a C or C++ program? and Step-by-step example for using GDB within Emacs to debug a C or C++ program.
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
This document was developed with support from National Science Foundation (NSF) grant OCI-1053575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on September 07, 2011.







