Sunday 12 January 2014

Segmentation Fault - An analysis

      A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system). On the operating system level, the process of segmentation divides available memory into segments. When encountering an error in writing to a memory segment, the Unix or Linux operating system sends a SIGSEGV signal to the program, which then crashes with the "segmentation fault" message. A core file may be generated to aid debugging. Segmentation faults can result in the following cases :

     1. When a running program attempts to access a memory location not allocated to it. e.g.
struct Ex
{
    int n;
};
int main()
{
    struct Ex *obj;
    obj->n=5;
}
      Here the pointer obj to struct Ex does not contain a valid object or address, thus it will result in a crash

     2. Accessing a location of memory the program is not permitted to access. An example of this is an attempt to access read only memory.  e.g.
 char *s = "hello world";
 *s = 'H';
      When the above code is compiled, the string "hello world" is placed in the read only section of the program binary. When the program gets executed the program the pointer is is set to point to the strings' location. So far so good. The next line however attempts to write the character 'H' to the memory. Since this memory was read only and we attempted a write, the program crashes with a segmentation fault.

      3. Dereferencing a NULL pointer. e.g.
 int *s = NULL;
 *s = 1;
      In this case an integer pointer is set to NULL. This means that it does not point to anything yet. So when we dereference the pointer and try to set a value 1 to a non-existent memory location we encounter a segmentation fault.

      4. Stack Overflow : On some systems a stack overflow will result in a segmentation fault. An example of this is recursive function without a base case.(which recursion never stops). Fo e.g. the following code causes a segmentation fault because of a stack overflow.
 void recurse()
 {
     recurse();
 }
 int main()
 {
     recurse();
 }

Debugging a segmentation fault


      Looking for a segmentation fault without any tool is a difficult task without the proper tools. One tool which you can use to debug a segmentation fault is the GNU Debugger(gdb). But to use gdb effectively we need to compile our C program with the -g option. The -g option produces debugging information in the operating system's native format.  GDB can further work with this debugging information.

Let us take the example of the following program :
#include<stdio.h>
#include<stdlib.h>
// the program contains an intentional error
// on compilation it may issue a warning which we can ignore for demo purpose
int input_num()
{
        int n;
        int *p=(int *)malloc(sizeof(int));
        scanf("%d",n);
        return n;
}
void input_array(int *a,int n)
{
        int i;
        a = malloc(n*sizeof(int));
        for(i=0;i<n;i++)
                a[i]=input_num();
}
int main()
{
        int *a,n,i;
        n=10;
        input_array(a,10);
        for(i=0;i<n;i++)
                printf("%d",a[i]);
}

      The program is meant to input an array of length 10. The function input_num is used to input a number, and this functions is used by the input_array function to input the array. However the program contains an intentional error which will cause a segmentation fault. The program compiles just fine(maybe with a warning about the error itself, but we can ignore it for demonstration purposes).

Getting your hands dirty : The Debugging!


To begin, compile the program with the -g option.
gcc sigsegv.c -o sigsegv -g -w
The -w option is there to ignore any warnings.

Now use gdb to start the debugger for the output file sigsev(the compiled output file for our program)
rishabh@ubuntu:~/SIGSEGV$ gdb sigsegv 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/rishabh/SIGSEGV/sigsegv...done.
(gdb)
Next run the program with the run command :
(gdb) run
Starting program: /home/rishabh/SIGSEGV/sigsegv
Input a number : 1

Program received signal SIGSEGV, Segmentation fault.
0xb7e71d73 in _IO_vfscanf () from /lib/i386-linux-gnu/libc.so.6
(gdb) 

Now as you can see we ran into a segmentation fault. Take a backtrace of the program now with the bt command.

(gdb) bt
#0  0xb7e71d73 in _IO_vfscanf () from /lib/i386-linux-gnu/libc.so.6
#1  0xb7e786bb in __isoc99_scanf () from /lib/i386-linux-gnu/libc.so.6
#2  0x0804848b in input_num () at sigsegv.c:7
#3  0x080484c1 in input_array (a=0xb7fc6ff4, n=10) at sigsegv.c:15
#4  0x080484fa in main () at sigsegv.c:21
(gdb)

A backtrace shows the function calls that were made by our program. So we can see that we crashed in the function called _IO_vfscanf (). But we are insterested only in what part of our program triggered this function. Going thrugh the bactrace we can see that the last user defined function called from by our program was input_num (). The "at sigsegv.c:7" just after the function call indicates that this function was executed till before this line, and on this line a function call was made.
Now lets look at this stack frame. Its the frame #2. To view it give the frame command :

(gdb) frame 2
#2  0x0804848b in input_num () at sigsegv.c:7
7        scanf("%d",n);
(gdb)

Now we can see that scanf is the culprit function. We can generally assume that the library functions work correctly. So this means that the fault is with the arguments we passed to the function. Lets print the arguments that were passed to the function with the print command.

(gdb) print n
$1 = -1208193036
(gdb)

Now the scanf function expects an address to an integer variable be passed to it as an argument, but here we see that instead of an address we are passing an integer with a garbage value. This concludes our analysis. We know that that passing this argument in the wrong way caused our segmentation fault. Quit the debugger with the quit command

(gdb) quit

A debugging session is active.

    Inferior 1 [process 4583] will be killed.

Quit anyway? (y or n) y
rishabh@ubuntu:~/SIGSEGV$ 

Another case of a segmentation fault is with the non ending recursive calls. In this case a simple look at the backtrace is enough to find the fault in the program. The following is such a program.

#include<stdio.h>
void recurse()
{
    recurse();
}
int main()
{
    recurse();
}

The following is its debugger output :

rishabh@ubuntu:~/SIGSEGV$ gdb recurse 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/rishabh/SIGSEGV/recurse...done.
(gdb) run
Starting program: /home/rishabh/SIGSEGV/recurse 

Program received signal SIGSEGV, Segmentation fault.
recurse () at recurse.c:4
4        recurse();
(gdb) bt
#0  recurse () at recurse.c:4
#1  0x080483bf in recurse () at recurse.c:4
#2  0x080483bf in recurse () at recurse.c:4
#3  0x080483bf in recurse () at recurse.c:4
#4  0x080483bf in recurse () at recurse.c:4
#5  0x080483bf in recurse () at recurse.c:4
#6  0x080483bf in recurse () at recurse.c:4
#7  0x080483bf in recurse () at recurse.c:4
#8  0x080483bf in recurse () at recurse.c:4
#9  0x080483bf in recurse () at recurse.c:4
#10 0x080483bf in recurse () at recurse.c:4
#11 0x080483bf in recurse () at recurse.c:4
#12 0x080483bf in recurse () at recurse.c:4
#13 0x080483bf in recurse () at recurse.c:4
#14 0x080483bf in recurse () at recurse.c:4
#15 0x080483bf in recurse () at recurse.c:4
#16 0x080483bf in recurse () at recurse.c:4
#17 0x080483bf in recurse () at recurse.c:4
#18 0x080483bf in recurse () at recurse.c:4
#19 0x080483bf in recurse () at recurse.c:4
#20 0x080483bf in recurse () at recurse.c:4
#21 0x080483bf in recurse () at recurse.c:4
#22 0x080483bf in recurse () at recurse.c:4
#23 0x080483bf in recurse () at recurse.c:4
#24 0x080483bf in recurse () at recurse.c:4
#25 0x080483bf in recurse () at recurse.c:4
#26 0x080483bf in recurse () at recurse.c:4
#27 0x080483bf in recurse () at recurse.c:4
#28 0x080483bf in recurse () at recurse.c:4
#29 0x080483bf in recurse () at recurse.c:4
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) q
A debugging session is active.

    Inferior 1 [process 4710] will be killed.

Quit anyway? (y or n) y
rishabh@ubuntu:~/SIGSEGV$ 

Debugging a CORE file :


      Debugging a generated core file is also possible in a similar way. The steps are as follows :

1. Compile the program with the -g option.
2. Give the "ulimit -c unlimited" command to enable Linux to core-dump on a segmentation fault.
3. Run the program normally.
4. Debug the program with the gdb command "gdb <executable-name> <core-file>" The following is the debugging of the array program through a core file :

rishabh@ubuntu:~/SIGSEGV$ gcc sigsegv.c -o sigsegv -g -w
rishabh@ubuntu:~/SIGSEGV$ ulimit -c unlimited
rishabh@ubuntu:~/SIGSEGV$ ./sigsegv 
Input a number : 1
Segmentation fault (core dumped)
rishabh@ubuntu:~/SIGSEGV$ ls
core  recurse  recurse.c  sigsegv  sigsegv.c
rishabh@ubuntu:~/SIGSEGV$ gdb sigsegv core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/rishabh/SIGSEGV/sigsegv...done.

warning: exec file is newer than core file.
[New LWP 5059]

warning: Can't read pathname for load map: Input/output error.
Core was generated by `./sigsegv'.
Program terminated with signal 11, Segmentation fault.
#0  0xb7667d73 in _IO_vfscanf () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0  0xb7667d73 in _IO_vfscanf () from /lib/i386-linux-gnu/libc.so.6
#1  0xb766e6bb in __isoc99_scanf () from /lib/i386-linux-gnu/libc.so.6
#2  0x0804848b in input_num () at sigsegv.c:7
#3  0x080484c1 in input_array (a=0xb77bcff4, n=10) at sigsegv.c:15
#4  0x080484fa in main () at sigsegv.c:21
(gdb) frame 2
#2  0x0804848b in input_num () at sigsegv.c:7
7  scanf("%d",n);
(gdb) print n
$1 = -1216622604
(gdb) q
rishabh@ubuntu:~/SIGSEGV$ 

No comments:

Post a Comment