Shell Simulation in Linux

Today we will be implementing the functionality of a shell in Linux.

Let's start by explaining what a shell is. Before the days of the graphical user interface (buttons, desktop icons, windows), the only way a user could interact with an operating system was by typing commands directly into it. These commands were entered into a text-only terminal, often black and white. Here are some examples of commands you would use to interact with the Linux operating system.

 

  • ls – List the directories and files within our current directory.
  • touch – Create a new file.
  • cd ./my/directory/ – From the current directory, traverse inside of the my folder and then the directory folder.

 

These are only three of the hundreds of Linux commands you would rely on if you were using a Linux system with no graphical user interface. A program that reads and runs commands like these is what we will be building today, using C and Linux system calls (functions we can use to interact with Linux). This simulated shell will even be able to pipe the data output by one command into another command.

 

Let us begin by entering the C libraries we will need for function calls throughout our program. The comments will explain their purpose.

 

#include <sys/types.h> //Used for the pid_t data type
#include <sys/wait.h>  //Used for the wait() and waitpid() functions
#include <stdbool.h>   //Used for the bool data type
#include <stdio.h>     //Used for entering input and retrieving output
#include <stdlib.h>    //Contains exit(), memory management functions, etc.
#include <string.h>    //Contains functions for manipulating strings (strtok, strcmp, memset)
#include <unistd.h>    //Contains our needed Linux system calls

#define TRUE 1         //Define true and false (we could also have used true and false from <stdbool.h>)
#define FALSE 0

At first glance, you may be confused as to what pid_t, wait(), and Linux system calls are.
Don't worry! All the details will be explained as we go along.

Next we will define our main function, where execution of our program begins. Since we are not passing any command line arguments into main, we can just leave the argument void.

 

int main(void)
{
    char buf[1024]      = {'\0'};  //Hold the entire command entered by a user
    char *command[10]   = {NULL};  //Each string holds a command on either side of the pipe command
    char *arguments[9]  = {NULL};  //Each string holds an argument for an individual command
    char *token_ptr;               //An iterator for the strtok function
    pid_t pid1, pid2;              //Used to hold the process ID of the two children

    /*Various indexes, flags, and variables used throughout the program*/
    int status = 0, command_index = 0, argument_index = 0;
    bool quit = FALSE, display_shell = TRUE;
    /* Index 0 = Read end of Pipe
       Index 1 = Write end of pipe*/
    int	pipe_fd[2];

    /*Remove any output buffering*/
    setbuf(stdout, NULL);
}

The code above also declares the storage our program needs. Again, the use of each of these character pointers, arrays, etc. will be explained when they are used. Please note that all the following code will exist inside the braces of our main function!

Here we have two boolean values:

 

  1. quit – This boolean value is set to false when you want your program to continue. Think of it as “If quit is true, then quit the program.”
  2. display_shell – When this boolean value is true, we will display the name of our shell. In this case our shell will be called the MicroShell. There are many classic examples of shell names such as BASH (Bourne Again Shell).

 

    /*If quit flag is true then end the shell*/
    while(!quit)
    {
        /*Initial shell name*/
        if(display_shell)
        {   
            printf("MicroShell>");
            display_shell = FALSE;
        }
    }

Within our while loop and after our display shell statement, we are going to add another while loop that reads in one command at a time as it is entered into our shell. The code looks like the following:

 
while (fgets(buf,1024,stdin) != NULL)
{

}
/*fgets() returned NULL (end of input or a read error), so end the shell*/
quit = TRUE;

Let's explain how the fgets() function works.

  1. Think of fgets() as “Get a string from a file stream”.
  2. The first argument passed into fgets is a buffer used to store the string that is being read.
  3. A buffer is just a temporary location in memory where you wish to store some data. We previously declared our buffer as a space which will hold 1024 bytes of data, which is the same as storing up to 1024 characters, because each character is 1 byte. We also initialized the buffer with \0, which is the equivalent of setting it to blank values.
  4. The second argument tells fgets() how big our buffer is. In our case, 1024 bytes. fgets() reads at most one character fewer than this size (1023 here), so there is always room for the \0 that ends the string.
  5. The third argument passed in is a stream called stdin, which stands for standard input. Whenever a program needs to read or write data, you must tell it where to read from or write to. In Linux we use a number called a file descriptor to identify that location. File descriptor 0 is the very first file descriptor, and it is the one the stdin stream reads from. This basically means: read a string from standard input, which in our case happens to be the terminal we are entering the command into. If you wanted our program to write data to the terminal (instead of reading), you could use stdout, or standard output, which writes to file descriptor 1. You can also write any errors that may occur to stderr, which uses file descriptor 2. Any additional files you open receive their own file descriptors, normally from 3 onward.
  6. Think of this statement as “Read any data entered into standard input (our terminal) and store this data as a string inside of buf”. The stored string includes the newline character produced by pressing Enter.

Inside of the fgets() loop we will now run the pipe() function. Here is a brief description of how a pipe works.

  1. A memory location known as a buffer is created. This buffer will be used to transport data between the two shell commands that are executed.
  2. Two file descriptors will be created. They will be stored in an array of two integers called pipe_fd. The first number in the array (pipe_fd[0]) is the file descriptor for the read end of the pipe. Any time you call read() to extract data from our pipe buffer, you will read from this file descriptor. The second number in the array (pipe_fd[1]) is the write end of the pipe. It is used to fill up our pipe buffer using write(), and that data can then be read from the first file descriptor.
  3. This pipe will be used to take the data that is output by a given shell command and send it as input to a second command.

 

Pipe Example:

ls -a -1 | sort
  1. The ls -a -1 command lists every file and directory in your current folder (including hidden ones), one name per line.
  2. The pipe takes these names as strings and passes them into the sort command as input.
  3. The sort command sorts these names and outputs them to standard output (your console window) in ascending alphabetical order.

 

Here is our statement to initialize our pipe.

if(pipe(pipe_fd) < 0)
{
    perror("piping error");
    exit(EXIT_FAILURE);
}
  • The pipe() call returns a value signalling whether our pipe was created successfully: 0 on success and -1 on failure.
  • If a value less than 0 is returned, the pipe was not created and cannot be used, so we print an error message using perror() and exit the program using the exit() function.

 

Our next piece of code will do the following:

  1. It will replace the last character of the string in buf (the newline that fgets() stored when the user pressed Enter) with a null terminator, which marks the end of the string.
  2. Use the strcmp() (string compare) function to compare the command the user has entered with q or quit, either of which ends the program. You could also use the tolower() function on each character to make sure the input is always lowercase.
  3. Afterwards we have two indexes, known as command_index and argument_index, which are both reset to 0.
  4. command_index is the index at which we store each command on either side of the pipe. In our case the || symbol will be used to send the output of a command through a pipe.
  5. strtok() is used to break our command into two tokens. If the command is ls -a -1 || sort then command[0] will contain ls -a -1 and command[1] will contain sort. Note that strtok() treats its second argument as a set of delimiter characters, so a single | splits the command as well.
  6. argument_index is used to break a command down into even smaller tokens. So if the command ls -a -1 was run, then arguments[0] would contain ls, arguments[1] would contain -a, and arguments[2] would contain -1.

 

buf[strcspn(buf, "\n")] = '\0';  //Replace the newline stored by fgets() with a null terminator
if(!(strcmp(buf, "q")) || !(strcmp(buf, "quit")))
{
    quit = TRUE;
    break;
}

command_index = 0;
argument_index = 0;
command[0] = strtok(buf,"||");
while(command_index < 9 && (token_ptr = strtok(NULL, "||")) != NULL)
{
    command_index++;
    command[command_index] = token_ptr;
}

Next we get to the heart of the program. This section will implement the actual execution of the command and transfer of data through our pipe.

/*Create the first child process to send the command execution output to the pipe*/
pid1 = fork();
if (pid1 < 0)
{
    perror("Failure on fork for child 1");
    exit(EXIT_FAILURE);
}

/*Child 1 (Execute a command and send to child 2)*/
if (pid1 == 0)
{
    /*Check if pipe command was used*/
    if(command_index > 0)
    {
        /*Break up the first command into individual character arrays for each argument*/
        arguments[0] = strtok(command[0], " ");
        while(argument_index < 7 && (token_ptr = strtok(NULL, " ")) != NULL)
        {
            argument_index++;
            arguments[argument_index] = token_ptr;
        }

        /*Add a null terminator to the end of our argument list*/
        argument_index++;
        arguments[argument_index] = NULL;

        /*Close the read end of our pipe*/
        close(pipe_fd[0]);
        /*Redirect stdout to the write end of our pipe*/
        dup2(pipe_fd[1], STDOUT_FILENO);
        /*Since our file descriptor is duplicated we no longer need the write end of the pipe*/
        close(pipe_fd[1]);

        /*Execute the first command, output will be sent to the pipe*/
        if(execvp(*arguments, arguments) < 0)
        {
            perror("First command failed to execute");
            exit(EXIT_FAILURE);
        }
    }
    /*Execute a single command with no pipe*/
    else
    {
        /*Break up the first command into individual character arrays for each argument*/
        arguments[0] = strtok(command[0], " ");
        while(argument_index < 7 && (token_ptr = strtok(NULL, " ")) != NULL)
        {
            argument_index++;
            arguments[argument_index] = token_ptr;
        }

        /*Add a null terminator to the end of our argument list*/
        argument_index++;
        arguments[argument_index] = NULL;
        /*Execute our command, output goes to standard output*/
        if(execvp(*arguments, arguments) < 0)
        {
            perror("Command failed to execute");
            exit(EXIT_FAILURE);
        }
    }
}
  1. Begin by forking the Linux process. This creates a second process that is a copy of the first. We fork twice so that one child can run the first command and send its output through the pipe to a second child.
  2. The new process is referred to as the child process and the original as the parent process.
  3. fork() returns 0 inside the child process, the process ID of the new child inside the parent process, and a negative value if the fork failed.
  4. The first command is broken into its command name and flag arguments, which are saved inside of the arguments array. An example might be arguments[0] = “ls”, arguments[1] = “-a”, arguments[2] = “-1”.
  5. A NULL pointer is added after the last argument because execvp() uses it to find the end of the array.
  6. Since this process will only be writing data, we can close the read end of our pipe and then redirect the file descriptor for standard output to write to our pipe instead. This is done using the functions close() and dup2().
  7. execvp() takes the name of the program to run and then a vector of pointers to each of the arguments in our command (starting with the program name itself). It then runs the command, which completely replaces the program running in the current process. So execution of our code in that child process ends when execvp() is called, and execvp() only returns if the command could not be started.

Next we create a third process (child process #2) that receives input from the first child and then executes the second command using that input.

else
{
    /*Create a second child process to receive input from our pipe*/
    pid2 = fork();
    if (pid2 < 0)
    {
        perror("Failure on fork for child 2");
        exit(EXIT_FAILURE);
    }
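    //Child 2 (Receives the output from the child 1)
    if (pid2 == 0)
    {
        /*With no pipe there is no second command, so this child has nothing to do*/
        if(command_index == 0)
        {
            exit(EXIT_SUCCESS);
        }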

        /*Break up the second command into individual character arrays for each argument*/
        arguments[0] = strtok(command[1], " ");
        while(argument_index < 7 && (token_ptr = strtok(NULL, " ")) != NULL)
        {
            argument_index++;
            arguments[argument_index] = token_ptr;
        }

        /*Add a null terminator to the end of our argument list*/
        argument_index++;
        arguments[argument_index] = NULL;

        /*Close the write end of our pipe*/
        close(pipe_fd[1]);
        /*Use input from our pipe rather than standard input*/
        dup2(pipe_fd[0], STDIN_FILENO);
        /*Since the read end of our file descriptor was duplicated we no longer need it*/
        close(pipe_fd[0]);

        /*Execute the second command, using output from our first command*/
        if(execvp(*arguments, arguments) < 0)
        {
            perror("Second command failed to execute");
            exit(EXIT_FAILURE);
        }
    }
    else
    {
        /*Reset our command/argument buffers*/
        memset(arguments, 0, sizeof(arguments));
        memset(command, 0, sizeof(command));

        /*Close both ends of the pipe*/
        close(pipe_fd[0]);
        close(pipe_fd[1]);

        /*Wait for Child 1 and Child 2 to finish*/
        waitpid(pid1, &status, 0);
        waitpid(pid2, &status, 0);

        /*Prompt the user*/
        printf("MicroShell>");
        display_shell = TRUE;
    }
}
  1. The else branch runs in the parent process, which executes a second fork, creating another child that will receive data from child #1.
  2. It works in a similar fashion to the first child process, except this time it closes the write end of the pipe and redirects standard input so that it reads from the pipe we had previously written to.
  3. Before doing so, it breaks up the second command into arguments; afterwards it executes the second piped command, using the data sent through our pipe as input.
  4. Finally, the parent closes both ends of the pipe and uses waitpid() to wait for the two children to finish executing their commands and transferring the data. The parent must close its copy of the write end, otherwise the second command would never see the end of its input.
  5. Once complete, the MicroShell prompt will display once again.

 

Feel free to download the solution for this example provided below!
