How the shebang (#!) mechanism's "executor" program is chosen

It is well known that the first line of a script is special if it starts with "#!". What follows those characters is taken to be the program that should execute the script. For example, where a script's first line is "#!/usr/bin/python", python ends up running and executes the script. What is less well known is exactly how and where this handoff takes place, and which software does it. I have read accounts like:

"The #!... is used by the kernel to identify the program that should be interpreting the lines in the script."

"When a script is executed with this as its first line, the shell reads the #! as meaning, 'use the following shell or interpreter to execute the rest of the script.'"

Some say it is the kernel that responds to "#!", while others suggest it's the shell. Which is it: the shell, the kernel, or both? Or one if not the other? Does it depend on which shell you are using at the time? Does it depend on how the shell gets called?

Let's try to find out. It turns out that both the kernel and the shell may provide for calling a script interpreter. It depends on which kernel, and which shell. Some don't. And if they do provide for calling an interpreter, whether they actually call it may be conditional.

The exercise to perform

Operate as root.

mkdir /exectests
cd /exectests

Obtain the file try-to-exec.zip, most easily by:

wget http://homepage.smc.edu/morgan_david/linadmin/downloads/try-to-exec.zip

It must end up in /exectests. Then:

unzip try-to-exec.zip

It includes the file try-to-exec-v1.c:

     1  /*      try-to-exec-v1.c
     2          tries to run /exectests/cal the same way we think a production shell does
     3          but if it doesn't work, does not try to compensate
     4  */
     5
     6  #include        <stdio.h>
     7  #include        <unistd.h>
     8  #include        <errno.h>
     9  main() {
    10          printf("\nAbout to try /exectests/cal...\n\n");
    11          if ( execl("/exectests/cal","cal",NULL) == -1 ) /* if we failed to exec */
    12                  {
    13                  perror("What happened");        /* diagnosis, please */
    14                  }
    15  }

This program attempts to duplicate the logic of that portion of the shell which executes commands typed on its command line. Consider it a mini shell surrogate. For simplification there's no command line here and the to-be-attempted command is hard-coded as "/exectests/cal". Compile the program:

gcc try-to-exec-v1.c -o try-to-exec-v1

Run it:

./try-to-exec-v1

In line 11, using the execl( ) function, it tries to cause execution of a file named "cal" in the /exectests directory. But there is no "cal" file. In that case, per the man page:

RETURN VALUE
       If  any of the exec() functions returns, an error will have occurred.  The
       return value is -1, and the global variable errno will be set to  indicate
       the error.

The perror( ) function in line 13 gets called because of the error (a successful execl would already have replaced this code altogether in line 11). perror determines which error occurred (by inspecting the global variable "errno") and prints a friendly explanation accordingly. Namely, "Hey what are you talking about? there is no /exectests/cal?!" or equivalent.

Let's create a different error condition:

touch cal
chmod -x cal

A cal file now exists, but without execute permission. That should be problematic. Run try-to-exec-v1 to see how perror( ) reflects the new error:

./try-to-exec-v1

"Hey what, I'm not allowed to execute that!!" it says.

Let's fix these problems by placing a copy of the well known "cal" program, which prints a calendar, into our directory:

cp $( which cal ) .
chmod +x cal

Now a copy of the "real" cal program is in place. And it has execute permissions. Run it just to be sure:

./cal

You see the current month's calendar. try-to-exec now has something to work with:

./try-to-exec-v1

You get the calendar again, and no error message (our code never reached perror because execl succeeded and displaced our code). In the first instance your current bash shell ran cal (technically, cal was exec'd in a child the shell had fork'd). In the second, your current shell ran try-to-exec-v1 which in turn ran cal. Your try-to-exec-v1 program ran cal the same way your shell did. And the same way the shell ran your program. And the same way the shell runs any program-- via exec( ).

Now replace this cal with a different one:

cp cal.sh cal
chmod +x cal

"cal" is no longer the well-known binary executable. Rather, it's a text file. Here are its contents:

echo
echo "====="
echo "I am /cal"
echo " but not really, I don't print the calendar"
echo "I am a cal imposter script"
echo " but as you can see I'm running just fine"
echo "====="
echo

Can a shell run this cal? Can exec( )? Let a shell try:

/bin/sh ./cal

Let exec( ) try:

./try-to-exec-v1

The new error message is "Exec format error". Read about it:

       ENOEXEC
              An executable is not in a  recognized  format,  is  for  the  wrong
              architecture,  or  has some other format error that means it cannot
              be executed.
		-- man page for execve (one of execl's exec family siblings)

exec( ) expects machine instructions, which the text in cal is not. So it can't run cal.

But please change cal by adding a new first line:

#!/bin/sh
echo
echo "====="
echo "I am /cal"
echo " but not really, I don't print the calendar"
echo "I am a cal imposter script"
echo " but as you can see I'm running just fine"
echo "====="
echo

and try again:

./try-to-exec-v1

This time exec( ) got it to run.

We're interested in 3 scenarios. We've looked at all 3 and seen 2 of them succeed.

- cal is made of machine code - succeeded
- cal is made of text, with the magic "#!" first line - succeeded
- cal is made of text, without the magic "#!" first line - failed

Two questions about our 2 success cases:

1. what is it about the "#!/bin/sh" line that enabled exec( ) to succeed?
2. what is it about the shell that allowed it to succeed even without that line while exec( ) failed?

With two answers:

1. what is it about the "#!/bin/sh" line that enabled exec( ) to succeed?
exec( ) didn't really succeed in running cal itself. It handed the job off to /bin/sh. So exec( ) didn't exec cal; it exec'd sh. Why did it do that? Because it is written to pay attention to a first line beginning with "#!" and to defer to (i.e., exec) whatever program is named there. It exec's that program because it cannot execute the given file directly. And that "secondary" program (provided it's made of machine code) runs successfully. That's broadly what this says:

   Interpreter scripts
       An interpreter script is a text file that has execute permission enabled and whose  first  line
       is of the form:

           #! interpreter [optional-arg]

       The  interpreter  must  be a valid pathname for an executable which is not itself a script.  If
       the filename argument of execve() specifies an interpreter script,  then  interpreter  will  be
       invoked with the following arguments:

           interpreter [optional-arg] filename arg...

       where arg...  is the series of words pointed to by the argv argument of execve().

   -- man page for execve system function

2. what is it about the shell that allowed it to succeed ( "/bin/sh ./cal" , above) even without that line while exec( ) failed?
The shell reads the file it is given as an argument and carries out the commands it finds there. In "/bin/sh ./cal" it is /bin/sh that gets exec'd, and nobody asks exec( ) to treat cal as machine code. The same thing happens in the shebang case: the secondary, interpreter program receives the given program file (the very one that nominated it with the shebang) as its argument, because that's how exec( ) calls it. The secondary program takes over and runs whatever's in the file, the stuff that offended exec( ) because it isn't made of machine code. If the file is made of shell code and the line-1 interpreter program is the shell, or perl code and the line-1 shebang program is perl, or python code and the line-1 program is python, it all runs fine. (Presumably these "script interpreter" programs regard the "#" character as a comment initiator, so the first line that got a special response from exec( ) gets no response from the interpreter program. That is in fact the case for the shell, perl, and python, all three.)

How about the 3rd case, the failure case, where a text file lacks the magic line? Is it possible for that to run? Well sure, we saw it run already above, when /bin/sh tried to do it. We tried to run it twice, and although "./try-to-exec-v1" failed "/bin/sh ./cal" succeeded. The failure is confined to exec( ), which is all that try-to-exec-v1 attempted. Since try-to-exec-v1 is supposed to be a shell surrogate, what could we add to it so that it too will respond to this case with success, as did /bin/sh? Please see try-to-exec-v2.c:

     1  /*      try-to-exec-v2.c
     2          tries to run /exectests/cal the same way we think a production shell does
     3          if it doesn't work, compensates
     4  */
     5
     6  #include        <stdio.h>
     7  #include        <unistd.h>
     8  #include        <errno.h>
     9  main() {
    10          printf("\nAbout to try /exectests/cal...\n\n");
    11          if ( execl("/exectests/cal","cal",NULL) == -1 ) /* if we failed to exec */
    12                  {
    13                  perror("What happened");        /* diagnosis, please */
    14                  if ( errno==ENOEXEC)            /* fallback for wrong stuff in the file */
    15                          {
    16                          printf("\n\nBut a shell might still be able to execute /exectests/cal:\n");
    17                          execl ("/bin/sh", "sh", "-c", "/exectests/cal", (char *)0);
    18                          }
    19                  }
    20  }

Here, our mini-shell does something similar to what exec( ) does, in providing a fallback second resort. After exec( ) returns its refusal in the form of the ENOEXEC "exec format" error, the version 2 mini-shell doesn't just give up. Instead it calls exec( ) a second time, now asking for execution of /bin/sh (not /exectests/cal) and handing off /exectests/cal as an argument in exec( )'s argument list. This has the same outcome as when you did "/bin/sh ./cal". There, you asked for execution of /bin/sh and handed off the cal file as an argument on the command line. Run version 2 and verify that it succeeds against a file containing shell script code but lacking the magic first line. At the same time run version 1, noting its failure. Before running them, edit the magic first line out of /exectests/cal. Then:

gcc try-to-exec-v2.c -o try-to-exec-v2
./try-to-exec-v1
./try-to-exec-v2

A real shell does something quite similar in this regard; the bash man page excerpt at the end of this document describes exactly such a fallback.

There are 3 "branches" our attempt to run a file might take here. If it's machine code, produced by a compiler or assembler, it runs immediately when it falls into the hands of exec( ). If it's text with the magic line, exec( ) refers it to the program named in the magic line, which runs it. If it's text without the magic line, exec( ) kicks an error back to the calling program (our mini-shell, or a real one), which turns around and calls upon a shell to run it. That shell might be a copy of the calling one, or a different one. Exactly how a given shell handles this case, and which shell ends up running, may differ among environments. Note that this means you can run a perl or python script by name only if it has a magic line that calls the appropriate interpreter, but you can run a shell script with or without a magic line (though the shell you get might not be the same one in both cases).

If you wish to test that, you can try running this with, then without, the first line:

#!/usr/bin/perl
print "Hello, World!\n";

It doesn't work without the first line, though were it a shell script instead of a perl script it would.

A comprehensive exploratory test

Let's do a comprehensive test of the different ways to try executing the different files, which are made of different materials. We will rename each file, in every case, as "cal" before trying to run it. The ways to attempt execution are:

- call it as itself, namely, execute "./cal" at the shell prompt
- provide it as a shell argument, that is, execute "bash cal" at the shell prompt
- get try-to-exec-v1 to try to run it, that is, execute "./try-to-exec-v1" at the shell prompt
- get try-to-exec-v2 to try to run it, that is, execute "./try-to-exec-v2" at the shell prompt

and the five "different materials" that a program presented under the "cal" name might consist of are:

- machine code
- shell code, with a shebang naming the shell
- shell code, with no shebang
- perl code, with a shebang naming perl
- perl code, with no shebang

The 20 combinations of those 4 methods with these 5 materials are represented in the table in try-to-exec-worksheet.pdf (included in the file you unzipped earlier). Print it out or obtain it as a handout from the instructor. In each box, put a check mark or an X depending on whether the type of program for the box's column ran successfully using the method for the box's row. Work column by column, first copying that column's program under the name "cal" into the /exectests directory. Add any notes you like in the box, for example the error message that appeared.

Then, analytically try in each case to construct the path of execution that led to each success or failure. How far did it get along the line of execution handoffs before reaching its ultimate fate? When it succeeded or failed, in whose hands was it? exec( )? bash? sh? perl?

------------
For reference, here is the related section of the bash man page, followed by that of the tcsh man page:

bash man page:
COMMAND EXECUTION
       After a command has been split into words, if it results in a simple
       command and an optional list of arguments, the following actions are
       taken.

       If the command name contains  no	 slashes,  the	shell  attempts	 to
       locate  it.   If	 there	exists	a shell function by that name, that
       function is invoked as described above in FUNCTIONS.   If  the  name
       does  not match a function, the shell searches for it in the list of
       shell builtins.	If a match is found, that builtin is invoked.

       If the name is neither a shell function nor a builtin, and  contains
       no  slashes,  bash searches each element of the PATH for a directory
       containing an executable file by that name.  Bash uses a hash  table
       to  remember  the full pathnames of executable files (see hash under
       SHELL BUILTIN COMMANDS below).  A full search of the directories	 in
       PATH  is	 performed only if the command is not found in the hash ta-
       ble.  If the search  is	unsuccessful,  the  shell  searches  for  a
       defined	shell  function	 named	command_not_found_handle.   If that
       function exists, it is invoked with the	original  command  and	the
       original	 command's  arguments  as its arguments, and the function's
       exit status becomes the exit status of the shell.  If that  function
       is  not	defined,  the  shell prints an error message and returns an
       exit status of 127.

       If the search is successful, or if the command name contains one	 or
       more  slashes,  the  shell  executes the named program in a separate
       execution environment.  Argument 0 is set to the name given, and the
       remaining  arguments  to the command are set to the arguments given,
       if any.

       If this execution fails because the file is not in  executable  for-
       mat,  and  the  file is not a directory, it is assumed to be a shell
       script, a file containing shell commands.  A subshell is spawned	 to
       execute	it.  This subshell reinitializes itself, so that the effect
       is as if a new shell had been invoked to handle the script, with the
       exception  that	the  locations of commands remembered by the parent
       (see hash below under SHELL BUILTIN COMMANDS) are  retained  by	the
       child.

       If  the	program	 is  a file beginning with #!, the remainder of the
       first line specifies an interpreter for the program.  The shell exe-
       cutes  the  specified  interpreter  on operating systems that do not
       handle this executable format  themselves.   The	 arguments  to	the
       interpreter  consist  of	 a  single  optional argument following the
       interpreter name on the first line of the program, followed  by	the
       name of the program, followed by the command arguments, if any.

tcsh man page:
   Builtin and non-builtin command execution
       Builtin commands are executed within the shell.	If any component of
       a  pipeline  except  the	 last is a builtin command, the pipeline is
       executed in a subshell.

       Parenthesized commands are always executed in a subshell.

	   (cd; pwd); pwd

       thus prints the home directory, leaving you where you were (printing
       this after the home directory), while

	   cd; pwd

       leaves  you  in the home directory.  Parenthesized commands are most
       often used to prevent cd from affecting the current shell.

       When a command to be executed is found not to be a  builtin  command
       the  shell attempts to execute the command via execve(2).  Each word
       in the variable path names a directory in which the shell will  look
       for  the	 command.  If the shell is not given a -f option, the shell
       hashes the names in these directories into an internal table so that
       it will try an execve(2) in only a directory where there is a possi-
       bility that the command resides there.  This greatly speeds  command
       location	 when  a  large	 number	 of  directories are present in the
       search path. This hashing mechanism is not used:

       1.  If hashing is turned explicitly off via unhash.

       2.  If the shell was given a -f argument.

       3.  For each directory component of path which does not begin with a
	   `/'.

       4.  If the command contains a `/'.

       In the above four cases the shell concatenates each component of the
       path vector with the given command name to form a  path	name  of  a
       file  which it then attempts to execute it. If execution is success-
       ful, the search stops.

       If the file has execute permissions but is not an executable to	the
       system  (i.e.,  it is neither an executable binary nor a script that
       specifies its interpreter), then it is assumed to be a file contain-
       ing shell commands and a new shell is spawned to read it.  The shell
       special alias may be set to specify an interpreter  other  than	the
       shell itself.

       On  systems which do not understand the `#!' script interpreter con-
       vention the shell may be compiled to emulate  it;  see  the  version
       shell  variable.	 If so, the shell checks the first line of the file
       to see if it is of the form `#!interpreter arg ...'.  If it is,	the
       shell  starts  interpreter with the given args and feeds the file to
       it on standard input.