How are system calls implemented in Linux?

Have a look at this.

Starting with version 2.5, the Linux
kernel introduced a new system call
entry mechanism for Pentium II+
processors. Because the existing
software interrupt method performed
poorly on Pentium 4 processors, an
alternative system call entry
mechanism was implemented using the
SYSENTER/SYSEXIT instructions
available on Pentium II+ processors.
This article explores this new
mechanism. Discussion is limited to
the x86 architecture, and all source
code listings are based on Linux
kernel 2.6.15.6.

  1. What are system calls?

    System calls give userland
    processes a way to request services
    from the kernel. What kind of
    services? Those managed by the
    operating system, such as storage,
    memory, networking and process
    management. For example, if a user
    process wants to read a file, it has
    to make the ‘open’ and ‘read’ system
    calls. Processes generally do not
    invoke system calls directly;
    instead, the C library provides an
    interface to the system calls.
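
    To illustrate the difference between going through the
    C library and invoking a system call more directly, here
    is a minimal sketch, assuming Linux with glibc. The
    function names read_via_libc and read_via_syscall are
    made up for this example. The second function uses
    openat rather than open, because a raw open system call
    is not available on every architecture.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from path using the usual C library wrappers. */
static ssize_t read_via_libc(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}

/* Same operation through the generic syscall() entry point, passing
 * the system call number explicitly. syscall() is itself a C library
 * function, but it performs no per-call wrapping. */
static ssize_t read_via_syscall(const char *path, char *buf, size_t len)
{
    long fd = syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
    if (fd < 0)
        return -1;
    long n = syscall(SYS_read, (int)fd, buf, len);
    syscall(SYS_close, (int)fd);
    return (ssize_t)n;
}
```

    Both functions should return the same byte count and
    fill the buffer identically for the same file, since
    they end up in the same kernel handlers.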

  2. What happens in a system call?

    A piece of kernel code is run at the
    request of a user process. This code
    runs in ring 0 (current privilege
    level, or CPL, 0), which is the
    highest level of privilege in the x86
    architecture. All user processes run
    in ring 3 (CPL 3).

    So, to implement a system call mechanism, we need:

    1) a way to call ring 0 code from ring 3.

    2) some kernel code to service the request.
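
    The ring a process runs in can be observed from user
    space: on x86 in protected mode, the low two bits of
    the CS segment selector hold the CPL. The sketch below
    assumes GCC or Clang inline assembly; the function name
    current_privilege_level is made up for this example,
    and it returns -1 on non-x86 targets.

```c
/* Returns the current privilege level (0-3) on x86 or x86-64,
 * or -1 when compiled for any other architecture. */
static int current_privilege_level(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned short cs;

    /* Copy the code segment selector into a general register. */
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));

    /* Bits 0-1 of the selector in CS are the CPL. */
    return cs & 3;
#else
    return -1;
#endif
}
```

    Called from an ordinary user process on x86 Linux, this
    is expected to report ring 3.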

  3. Good old way of doing it

    Until some time ago, Linux implemented
    system calls on all x86 platforms
    using software interrupts. To execute
    a system call, a user process copies
    the desired system call number into
    %eax and executes ‘int 0x80’. This
    generates interrupt 0x80, and an
    interrupt service routine is called.
    For interrupt 0x80, this routine is
    an “all system calls handling”
    routine, and it executes in ring 0.
    As defined in the file
    /usr/src/linux/arch/i386/kernel/entry.S,
    it saves the current state and calls
    the appropriate system call handler
    based on the value in %eax.
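
    The dispatch step described above can be modelled in
    ordinary user-space C. This is only an illustrative
    sketch, not the kernel's code: struct regs, demo_table,
    demo_int80_handler and the two demo handlers are
    hypothetical names, and the real entry code saves far
    more state than is shown here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical snapshot of the registers saved on entry. */
struct regs {
    long eax;   /* system call number on entry, result on exit */
    long ebx;   /* first argument */
    long ecx;   /* second argument */
};

typedef long (*syscall_fn)(const struct regs *);

/* Two made-up handlers standing in for real system calls. */
static long demo_getpid(const struct regs *r) { (void)r; return 1234; }
static long demo_add(const struct regs *r) { return r->ebx + r->ecx; }

/* Table indexed by system call number. */
static const syscall_fn demo_table[] = { demo_getpid, demo_add };
#define DEMO_NR_SYSCALLS (sizeof demo_table / sizeof demo_table[0])

/* Model of the common handler: validate the number in eax, call
 * the matching table entry, and leave the result in eax. */
static void demo_int80_handler(struct regs *r)
{
    if (r->eax < 0 || (size_t)r->eax >= DEMO_NR_SYSCALLS) {
        r->eax = -ENOSYS;   /* unknown system call number */
        return;
    }
    r->eax = demo_table[r->eax](r);
}
```

    A table lookup keeps the common entry path short: the
    handler only has to bounds-check the number and make
    one indirect call.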

  4. New shiny way of doing it

    It turned out that this software
    interrupt method was much slower on
    Pentium 4 processors. To solve this
    issue, Linus implemented an
    alternative system call mechanism
    that takes advantage of the
    SYSENTER/SYSEXIT instructions
    provided by all Pentium II+
    processors. Before going further with
    this new mechanism, let’s become more
    familiar with these instructions.
