C Minishell Adding Pipelines

Here’s some moderately generic but simple code to execute pipelines, a program I’m calling pipeline. It’s an SSCCE in a single file as presented, though I’d have the files stderr.h and stderr.c as separate files in a library to be linked with all my programs. (Actually, I have a more complex set of functions in my ‘real’ stderr.c and stderr.h, but this is a good starting point.)

The code operates in two ways. If you supply no arguments, then it runs a built-in pipeline:

who | awk '{print $1}' | sort | uniq -c | sort -n

This counts the number of times each person is logged in on the system, presenting the list in order of increasing number of sessions. Alternatively, you can invoke it with a sequence of arguments that form the command line you want run, using a quoted pipe '|' (or "|") to separate commands:

Valid:

pipeline
pipeline ls '|' wc
pipeline who '|' awk '{print $1}' '|' sort '|' uniq -c '|' sort -n
pipeline ls

Invalid:

pipeline '|' wc -l
pipeline ls '|' '|' wc -l
pipeline ls '|' wc -l '|'

The last three invocations enforce ‘pipes as separators’. The code does not error check every system call; it does error check fork(), execvp() and pipe(), but skips checking on dup2() and close(). It doesn’t include diagnostic printing for the commands that are generated; a -x option to pipeline would be a sensible addition, causing it to print out a trace of what it does. It also does not exit with the exit status of the last command in the pipeline.
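Propagating the last command's status would mostly be a matter of decoding the value that waitpid() stores. A minimal sketch of that decoding, assuming the common shell convention of 128 plus the signal number when the command is killed by a signal; the name status_to_exit_code() is hypothetical and not part of the program shown here:

```c
#include <sys/wait.h>

/* Hypothetical helper: map a wait status to a shell-style exit code */
static int status_to_exit_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);       /* Normal exit: use its code */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);    /* Assumed shell convention */
    return 1;                             /* Stopped or unknown state */
}
```

The program could then remember the status of the one child it waits for and return status_to_exit_code(status) from main() instead of 0.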

Note that the code starts with a child being forked. The child will become the last process in the pipeline, but first creates a pipe and forks another process to run the earlier processes in the pipeline. The mutually recursive functions are unlikely to be the only way of sorting things out, but they do leave minimal code repetition (earlier drafts of the code had the content of exec_nth_command() largely repeated in exec_pipeline() and exec_pipe_command()).

The process structure here is such that the original process only knows about the last process in the pipeline. It is possible to redesign things in such a way that the original process is the parent of every process in the pipeline, so the original process can report separately on the status of each command in the pipeline. I’ve not yet modified the code to allow for that structure; it will be a little more complex, though not hideously so.

/* One way to create a pipeline of N processes */

/* stderr.h */
#ifndef STDERR_H_INCLUDED
#define STDERR_H_INCLUDED

static void err_setarg0(const char *argv0);
static void err_sysexit(char const *fmt, ...);
static void err_syswarn(char const *fmt, ...);

#endif /* STDERR_H_INCLUDED */

/* pipeline.c */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>
/*#include "stderr.h"*/

typedef int Pipe[2];

/* exec_nth_command() and exec_pipe_command() are mutually recursive */
static void exec_pipe_command(int ncmds, char ***cmds, Pipe output);

/* With the standard output plumbing sorted, execute Nth command */
static void exec_nth_command(int ncmds, char ***cmds)
{
    assert(ncmds >= 1);
    if (ncmds > 1)
    {
        pid_t pid;
        Pipe input;
        if (pipe(input) != 0)
            err_sysexit("Failed to create pipe");
        if ((pid = fork()) < 0)
            err_sysexit("Failed to fork");
        if (pid == 0)
        {
            /* Child */
            exec_pipe_command(ncmds-1, cmds, input);
        }
        /* Fix standard input to read end of pipe */
        dup2(input[0], 0);
        close(input[0]);
        close(input[1]);
    }
    execvp(cmds[ncmds-1][0], cmds[ncmds-1]);
    err_sysexit("Failed to exec %s", cmds[ncmds-1][0]);
    /*NOTREACHED*/
}

/* Given pipe, plumb it to standard output, then execute Nth command */
static void exec_pipe_command(int ncmds, char ***cmds, Pipe output)
{
    assert(ncmds >= 1);
    /* Fix stdout to write end of pipe */
    dup2(output[1], 1);
    close(output[0]);
    close(output[1]);
    exec_nth_command(ncmds, cmds);
}

/* Execute the N commands in the pipeline */
static void exec_pipeline(int ncmds, char ***cmds)
{
    assert(ncmds >= 1);
    pid_t pid;
    if ((pid = fork()) < 0)
        err_syswarn("Failed to fork");
    if (pid != 0)
        return;
    exec_nth_command(ncmds, cmds);
}

/* Collect dead children until there are none left */
static void corpse_collector(void)
{
    pid_t parent = getpid();
    pid_t corpse;
    int   status;
    while ((corpse = waitpid(0, &status, 0)) != -1)
    {
        fprintf(stderr, "%d: child %d status 0x%.4X\n",
                (int)parent, (int)corpse, status);
    }
}

/*  who | awk '{print $1}' | sort | uniq -c | sort -n */
static char *cmd0[] = { "who",                0 };
static char *cmd1[] = { "awk",  "{print $1}", 0 };
static char *cmd2[] = { "sort",               0 };
static char *cmd3[] = { "uniq", "-c",         0 };
static char *cmd4[] = { "sort", "-n",         0 };

static char **cmds[] = { cmd0, cmd1, cmd2, cmd3, cmd4 };
static int   ncmds = sizeof(cmds) / sizeof(cmds[0]);

static void exec_arguments(int argc, char **argv)
{
    /* Split the command line into sequences of arguments */
    /* Break at pipe symbols as arguments on their own */
    char **cmdv[(argc+1)/2];        // One slot per command; argc/2 overflows for "pipeline ls '|'"
    char  *args[argc+1];
    int cmdn = 0;
    int argn = 0;

    cmdv[cmdn++] = &args[argn];
    for (int i = 1; i < argc; i++)
    {
        char *arg = argv[i];
        if (strcmp(arg, "|") == 0)
        {
            if (i == 1)
                err_sysexit("Syntax error: pipe before any command");
            if (args[argn-1] == 0)
                err_sysexit("Syntax error: two pipes with no command between");
            arg = 0;
        }
        args[argn++] = arg;
        if (arg == 0)
            cmdv[cmdn++] = &args[argn];
    }
    if (args[argn-1] == 0)
        err_sysexit("Syntax error: pipe with no command following");
    args[argn] = 0;
    exec_pipeline(cmdn, cmdv);
}

int main(int argc, char **argv)
{
    err_setarg0(argv[0]);
    if (argc == 1)
    {
        /* Run the built in pipe-line */
        exec_pipeline(ncmds, cmds); 
    }
    else
    {
        /* Run command line specified by user */
        exec_arguments(argc, argv);
    }
    corpse_collector();
    return(0);
}

/* stderr.c */
/*#include "stderr.h"*/
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>     /* getpid(); needed when stderr.c is compiled on its own */

static const char *arg0 = "<undefined>";

static void err_setarg0(const char *argv0)
{
    arg0 = argv0;
}

static void err_vsyswarn(char const *fmt, va_list args)
{
    int errnum = errno;
    fprintf(stderr, "%s:%d: ", arg0, (int)getpid());
    vfprintf(stderr, fmt, args);
    if (errnum != 0)
        fprintf(stderr, " (%d: %s)", errnum, strerror(errnum));
    putc('\n', stderr);
}

static void err_syswarn(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
}

static void err_sysexit(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
    exit(1);
}

Signals and SIGCHLD

The POSIX Signal Concepts section discusses SIGCHLD:

Under SIG_DFL:

If the default action is to ignore the signal, delivery of the signal shall have no effect on the process.

Under SIG_IGN:

If the action for the SIGCHLD signal is set to SIG_IGN, child processes of the calling processes shall not be transformed into zombie processes when they terminate. If the calling process subsequently waits for its children, and the process has no unwaited-for children that were transformed into zombie processes, it shall block until all of its children terminate, and wait(), waitid(), and waitpid() shall fail and set errno to [ECHILD].

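The description of <signal.h> has a table of default actions for signals, and for SIGCHLD, the default is I (ignore the signal). That is the action taken under SIG_DFL; it is not the same as explicitly setting the disposition to SIG_IGN.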


I added another function to the code above:

#include <signal.h>

typedef void (*SigHandler)(int signum);

static void sigchld_status(void)
{
    const char *handling = "Handler";
    SigHandler sigchld = signal(SIGCHLD, SIG_IGN);
    signal(SIGCHLD, sigchld);
    if (sigchld == SIG_IGN)
        handling = "Ignored";
    else if (sigchld == SIG_DFL)
        handling = "Default";
    printf("SIGCHLD set to %s\n", handling);
}

I called it immediately after the call to err_setarg0(), and it reports ‘Default’ on both Mac OS X 10.7.5 and Linux (RHEL 5, x86/64). I validated its operation by running:

(trap '' CHLD; pipeline)

On both platforms, that reported ‘Ignored’, and the pipeline command no longer reported the exit status of the child; it didn’t get it.

So, if the program is ignoring SIGCHLD, it does not generate any zombies, but does wait until ‘all’ of its children terminate. That is, until all of its direct children terminate; a process cannot wait on its grandchildren or more distant progeny, nor on its siblings, nor on its ancestors.

On the other hand, if the setting for SIGCHLD is the default, the signal is ignored, and zombies are created.

That’s the most convenient behaviour for this program as written. The corpse_collector() function has a loop that collects the status information from any children. There’s only one child at a time with this code; the rest of the pipeline is run as a child (of the child, of the child, …) of the last process in the pipeline.
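The difference between the two SIGCHLD settings can be seen in isolation. The following is a sketch only, assuming a system such as Linux or Mac OS X where SIG_IGN suppresses zombies as in the POSIX text quoted above; the function name wait_for_child() is hypothetical:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
** Hypothetical demo: fork a child that exits at once, then wait for it.
** Returns 0 if the child's status was collected, else the errno value
** from fork() or waitpid(); ECHILD is expected under SIG_IGN.
*/
static int wait_for_child(void (*disposition)(int))
{
    void (*old)(int) = signal(SIGCHLD, disposition);
    int result = 0;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        result = errno;
    else if (pid == 0)
        _exit(0);                           /* Child: terminate immediately */
    else if (waitpid(pid, &status, 0) != pid)
        result = errno;                     /* No zombie was left to collect */
    signal(SIGCHLD, old);                   /* Restore previous disposition */
    return result;
}
```

Called with SIG_DFL, the child becomes a zombie until waitpid() collects it; called with SIG_IGN, waitpid() still waits for the child to finish but then fails with ECHILD.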


However, I’m having trouble with zombies/corpses. My teacher had me implement it the same way you did, as cmd1 isn’t the parent of cmd2 in the case of: “cmd1 | cmd2 | cmd3”. Unless I tell my shell to wait on each process (cmd1, cmd2, and cmd3), rather than just waiting on the last process (cmd3), the entire pipeline shuts down before the output can reach the end. I’m having trouble figuring out a good way to wait on them; my teacher said to use WNOHANG.

I’m not sure I understand the problem. With the code I provided, cmd3 is the parent of cmd2, and cmd2 is the parent of cmd1 in a 3-command pipeline (and the shell is the parent of cmd3), so the shell can only wait on cmd3. I did state originally:

The process structure here is such that the original process only knows about the last process in the pipeline. It is possible to redesign things in such a way that the original process is the parent of every process in the pipeline, so the original process can report separately on the status of each command in the pipeline. I’ve not yet modified the code to allow for that structure; it will be a little more complex, though not hideously so.

If you’ve got your shell able to wait on all three commands in the pipeline, you must be using the alternative organization.
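For comparison, here is a rough sketch of what that alternative organization might look like, with the original process forking every command itself. It is not the code from this post; the name run_pipeline_flat() and the statuses[] out-parameter are hypothetical, and error handling is kept minimal:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
** Hypothetical alternative: the caller forks every command itself, so
** it can wait on each one.  Stores one wait status per command in
** statuses[]; returns 0 on success, -1 if pipe(), fork() or waitpid()
** fails.
*/
static int run_pipeline_flat(int ncmds, char ***cmds, int *statuses)
{
    assert(ncmds >= 1);
    pid_t pids[ncmds];
    int   prev = -1;                    /* Read end of the previous pipe */
    int   nforked = 0;
    int   rc = 0;

    for (int i = 0; i < ncmds; i++)
    {
        int fd[2] = { -1, -1 };
        if (i < ncmds - 1 && pipe(fd) != 0)
        {
            rc = -1;
            break;
        }
        if ((pids[i] = fork()) < 0)
        {
            if (fd[0] >= 0)
            {
                close(fd[0]);
                close(fd[1]);
            }
            rc = -1;
            break;
        }
        if (pids[i] == 0)
        {
            /* Child: connect stdin and stdout, then run the command */
            if (prev >= 0)
            {
                dup2(prev, 0);
                close(prev);
            }
            if (fd[1] >= 0)
            {
                dup2(fd[1], 1);
                close(fd[0]);
                close(fd[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);
            _exit(127);
        }
        nforked++;
        /* Parent: close the pipe ends it no longer needs */
        if (prev >= 0)
            close(prev);
        if (fd[1] >= 0)
            close(fd[1]);
        prev = fd[0];
    }
    if (prev >= 0)
        close(prev);
    /* Every command is a direct child, so each status is available */
    for (int i = 0; i < nforked; i++)
    {
        if (waitpid(pids[i], &statuses[i], 0) != pids[i])
            rc = -1;
    }
    return rc;
}
```

The parent has to close its own copies of the pipe ends as it goes; otherwise the readers never see end-of-file. With this layout, statuses[ncmds-1] holds the status of the last command and the earlier entries hold the statuses of the commands feeding it.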

The waitpid() description includes:

The pid argument specifies a set of child processes for which status is requested. The waitpid() function shall only return the status of a child process from this set:

  • If pid is equal to (pid_t)-1, status is requested for any child process. In this respect, waitpid() is then equivalent to wait().

  • If pid is greater than 0, it specifies the process ID of a single child process for which status is requested.

  • If pid is 0, status is requested for any child process whose process group ID is equal to that of the calling process.

  • If pid is less than (pid_t)-1, status is requested for any child process whose process group ID is equal to the absolute value of pid.

The options argument is constructed from the bitwise-inclusive OR of zero or more of the following flags, defined in the <sys/wait.h> header:

WNOHANG
The waitpid() function shall not suspend execution of the calling thread if status is not immediately available for one of the child processes specified by pid.

This means that if you’re using process groups and the shell knows which process group the pipeline is running in (for example, because the pipeline is put into its own process group by the first process), then the parent can wait for the appropriate children to terminate.
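As an illustration of combining a process group with WNOHANG, the sketch below polls for terminated children in one group without blocking. It assumes the pipeline's processes were placed in process group pgid (with pgid greater than 1) and are direct children of the caller; the name reap_group_nohang() and the done flag are hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
** Hypothetical helper: collect every child in process group pgid that
** has already terminated, without blocking.  Returns the number of
** children reaped; sets *done to 1 once no children remain in the
** group.  Assumes pgid > 1.
*/
static int reap_group_nohang(pid_t pgid, int *done)
{
    int   count = 0;
    int   status;
    pid_t pid;

    *done = 0;
    while ((pid = waitpid(-pgid, &status, WNOHANG)) > 0)
        count++;                    /* A child's status was collected */
    if (pid < 0 && errno == ECHILD)
        *done = 1;                  /* No children left in that group */
    /* pid == 0: children exist but none has status available yet */
    return count;
}
```

A shell could call this between other work, or from its main loop, until done is set; a real shell would also record each status rather than discard it.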

…rambling… I think there’s some useful information here; there probably should be more that I’m writing, but my mind’s gone blank.
