How the shebang (#!) mechanism's "executor" program is chosen
It is well known that the first line of a script is special if it starts with "#!". What follows those characters is taken to be the program that should execute the script. For example, where a script's first line is "#!/usr/bin/python", python ends up running and executes the script. What is less well known is exactly how and where this handoff takes place, and which software does it. I have read accounts like:
"The #!... is used by the kernel to identify the program that should be interpreting the lines in the script."
"When a script is executed with this as its first line, the shell reads the #! as meaning, 'use the following shell or interpreter to execute the rest of the script.'"
Some seem to say it is the kernel that responds to #!, but others suggest it's the shell. Who does it? The shell? The kernel? Both? One if not the other? Does it depend on which shell you are using at the time? Does it depend on how the shell gets called?
Let's try to find out. It turns out that both the kernel and the shell may provide for calling a script interpreter. It depends on which kernel, and which shell. Some don't. And if they do provide for calling an interpreter, whether they actually call it may be conditional.
The exercise to perform
Operate as root.
mkdir /exectests
cd /exectests
Obtain the file try-to-exec.zip from this link or, perhaps more easily, by:
wget http://homepage.smc.edu/morgan_david/linadmin/downloads/try-to-exec.zip
It must end up in /exectests. Then:
unzip try-to-exec.zip
It includes the file try-to-exec-v1.c:
1 /* try-to-exec-v1.c
2    tries to run /exectests/cal the same way we think a production shell does
3    but if it doesn't work, does not try to compensate
4 */
5
6 #include <stdio.h>   /* printf, perror */
7 #include <unistd.h>  /* execl */
8 #include <errno.h>   /* errno */
9 int main(void) {
10     printf("\nAbout to try /exectests/cal...\n\n");
11     if ( execl("/exectests/cal","cal",(char *)NULL) == -1 ) /* if we failed to exec */
12     {
13         perror("What happened"); /* diagnosis, please */
14     }
15     return 1; /* reached only if execl failed */
16 }
This program attempts to duplicate the logic of that portion of the shell which executes commands typed on its command line. Consider it a mini shell surrogate. For simplification there's no command line here and the to-be-attempted command is hard-coded as "/exectests/cal". Compile the program:
gcc try-to-exec-v1.c -o try-to-exec-v1
Run it:
./try-to-exec-v1
In line 11, using the execl( ) function, it tries to cause execution of a file named "cal" in the /exectests directory. But there is no "cal" file yet. In that case, per the man page:
RETURN VALUE
If any of the exec() functions returns, an error will have occurred. The
return value is -1, and the global variable errno will be set to indicate
the error.
The perror( ) function in line 13 gets called due to the error (otherwise, a successful execl would already have replaced this code altogether at line 11). perror detects which error occurred (by inspecting the global variable "errno") and prints a friendly explanation accordingly. Namely, "Hey what are you talking about? there is no /exectests/cal?!" or equivalent.
Let's create a different error condition:
touch cal
chmod -x cal
A cal file now exists, but without execute permission. That should be problematic. Run try-to-exec-v1 to see how perror( ) reflects the new error:
./try-to-exec-v1
"Hey what, I'm not allowed to execute that!!" it says.
Let's fix these problems by placing a copy of the well known "cal" program, which prints a calendar, into our directory:
cp $( which cal ) .
chmod +x cal
Now a copy of the "real" cal program is in place. And it has execute permissions. Run it just to be sure:
./cal
You see the current month's calendar. try-to-exec now has something to work with:
./try-to-exec-v1
You get the calendar again, and no error message (our code never reached perror because execl succeeded and displaced our code). In the first instance your current bash shell ran cal (technically, cal was exec'd in a child the shell had fork'd). In the second, your current shell ran try-to-exec-v1 which in turn ran cal. Your try-to-exec-v1 program ran cal the same way your shell did. And the same way the shell ran your program. And the same way the shell runs any program-- via exec( ).
Now replace this cal with a different one:
cp cal.sh cal
chmod +x cal
"cal" is no longer the well-known binary executable. Rather it's a text file. Here are its contents:
echo
echo "====="
echo "I am /cal"
echo " but not really, I don't print the calendar"
echo "I am a cal imposter script"
echo " but as you can see I'm running just fine"
echo "====="
echo
Can a shell run this cal? Can exec( )? Let a shell try:
/bin/sh ./cal
Let exec( ) try:
./try-to-exec-v1
The new error message is "Exec format error". Read about it:
ENOEXEC
An executable is not in a recognized format, is for the wrong
architecture, or has some other format error that means it cannot
be executed.
-- man page for execve (one of execl's exec family siblings)
exec( ) expects machine instructions, which the text in cal is not. So it can't run cal.
But please change cal by adding a new first line:
#!/bin/sh
echo
echo "====="
echo "I am /cal"
echo " but not really, I don't print the calendar"
echo "I am a cal imposter script"
echo " but as you can see I'm running just fine"
echo "====="
echo
and try again:
./try-to-exec-v1
This time exec( ) got it to run.
We're interested in 3 scenarios. We've looked at all 3 and seen 2 of them succeed.
- cal is made of machine code - succeeded
- cal is made of text, with the magic "#!" first line - succeeded
- cal is made of text, without the magic "#!" first line - failed
Two questions about our 2 success cases:
1. What is it about the "#!/bin/sh" line that enabled exec( ) to succeed?
2. What is it about the shell that allowed it to succeed even without that line while exec( ) failed?
With two answers:
1. What is it about the "#!/bin/sh" line that enabled exec( ) to succeed?
exec( ) didn't really execute cal itself. It handed the job off to /bin/sh. So exec( ) didn't exec cal, it exec'd sh. Why did it do that? Because it is written to pay attention to a first line commencing with "#!" and to defer to (i.e., exec) whatever program is named there. It exec's that program because it cannot run the given file directly as machine code. And that "secondary" program (provided it is itself made of machine code) runs successfully. On Linux the code that does this lives in the kernel, behind the execve( ) system call that execl( ) wraps; our try-to-exec-v1 program contains no shell logic at all. That's broadly what this says:
Interpreter scripts
An interpreter script is a text file that has execute permission enabled and whose first line
is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable which is not itself a script. If
the filename argument of execve() specifies an interpreter script, then interpreter will be
invoked with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv argument of execve().
-- man page for execve system function
2. What is it about the shell that allowed it to succeed ("/bin/sh ./cal", above) even without that line while exec( ) failed?
The secondary, interpreter program receives the given program file (the very one that nominated it with the shebang) as its argument, because that's how exec( ) calls it. The secondary program takes over and runs whatever is in the file, the stuff that offended exec( ) because it isn't machine code. If the file holds shell code and the line-1 interpreter is the shell, or perl code and the line-1 interpreter is perl, or python code and the line-1 interpreter is python, it all runs fine. When you typed "/bin/sh ./cal" you did by hand the same thing exec( ) does in response to a shebang: you ran the interpreter and gave it the file as an argument, so no magic line was needed. (Presumably these "script interpreter" programs regard the "#" character as a comment initiator, so the first line that got a special response from exec( ) gets no response from the interpreter program. That is in fact the case for shell, perl, and python alike.)
How about the 3rd case, the failure case, where a text file lacks the magic line? Is it possible for that to run? Well sure, we already saw it run above, when /bin/sh tried to do it. We tried to run it twice, and although "./try-to-exec-v1" failed, "/bin/sh ./cal" succeeded. The failure is confined to exec( ), which is all that try-to-exec-v1 attempted. Since try-to-exec-v1 is supposed to be a shell surrogate, what could we add to it so that it too will respond to this case with success, as did /bin/sh? Please see try-to-exec-v2.c:
/* try-to-exec-v2.c
   tries to run /exectests/cal the same way we think a production shell does
   if it doesn't work, compensates
*/

#include <stdio.h>   /* printf, perror */
#include <unistd.h>  /* execl */
#include <errno.h>   /* errno, ENOEXEC */

int main(void) {
    printf("\nAbout to try /exectests/cal...\n\n");
    if ( execl("/exectests/cal","cal",(char *)NULL) == -1 ) /* if we failed to exec */
    {
        perror("What happened"); /* diagnosis, please */
        if ( errno==ENOEXEC ) /* fallback for wrong stuff in the file */
        {
            printf("\n\nBut a shell might still be able to execute /exectests/cal:\n");
            execl ("/bin/sh", "sh", "-c", "/exectests/cal", (char *)0);
        }
    }
    return 1; /* reached only if every exec failed */
}
Here, our mini-shell does something similar to what exec( ) does, in providing a fallback second resort. After exec( ) returns its refusal in the form of the ENOEXEC "exec format" error, version 2 mini-shell doesn't just give up. Rather in that case it calls exec( ) a second time, asking now for execution of /bin/sh (not /exectests/cal) and handing off /exectests/cal as an argument in exec( )'s argument list. This results in the same thing as when you did "/bin/sh ./cal". Then, you asked for execution of /bin/sh and handed off /exectests/cal as an argument on the command line. Run version 2 and verify it succeeds against a file containing shell script code but lacking the magic first line. At the same time run version 1 noting its failure. Before running, edit the magic first line out of /exectests/cal. Then:
gcc try-to-exec-v2.c -o try-to-exec-v2
./try-to-exec-v1
./try-to-exec-v2
No doubt the real shell does something quite similar in this regard.
There are 3 "branches" our attempt to run a file might take here. If it's machine code, produced by a compiler or assembler, it runs immediately when it falls into the hands of exec( ). If it's text with the magic line, exec( ) refers it to the magic-line program, which runs it. If it's text without the magic line, exec( ) kicks an error back to the calling program (our mini-shell, or a real one), which turns around and calls upon a shell to run it. That shell might be a copy of the calling one, or a different one. Exactly how a given shell handles this case, and which shell ends up running, may differ among environments. Note that this means you can run a perl or python script directly by name only if it has a magic line that names the appropriate interpreter, but you can run a shell script with or without a magic line (though the shell you get might not be the same one in both cases).
If you wish to test that, you can try running this with, then without, the first line:
#!/usr/bin/perl
print "Hello, World!\n";
It doesn't work without it, though were it a shell script instead of perl script it would.
A comprehensive exploratory test
Let's do a comprehensive test of the different ways to try executing the different files, which are made of different materials. We will rename each file, in every case, as "cal" before trying to run it. The ways to attempt execution are:
- call it as itself, namely, execute "./cal" at the shell prompt
- provide it as a shell argument, that is, execute "bash cal" at the shell prompt
- get try-to-exec-v1 to try to run it, that is, execute "./try-to-exec-v1" at the shell prompt
- get try-to-exec-v2 to try to run it, that is, execute "./try-to-exec-v2" at the shell prompt
and the five "different materials" of which a program presented under the "cal" name might consist are:
- machine code
- shell code, with a shebang naming the shell
- shell code, with no shebang
- perl code, with a shebang naming perl
- perl code, with no shebang
The 20 combinations of those 4 with these 5 are represented in the table in try-to-exec-worksheet.pdf (included in the file you unzipped earlier). Print it out or obtain it as a handout from the instructor. In each box, put a check mark or X depending on whether the type of program for the box's column ran successfully using the method for the box's row. Do it column by column, first copying that column's program under the name "cal" into the /exectests directory. Also make any notes in the box if you like, for example recording what error message appeared.
Then, analytically try in each case to construct the path of execution that led to each success or failure. How far did it get along the line of execution handoffs before reaching its ultimate fate? When it succeeded or failed, in whose hands was it? exec( )? bash? sh? perl?
------------
For reference, the related section of the bash man page, followed by that of the tcsh man page:
bash man page:
COMMAND EXECUTION
After a command has been split into words, if it results in a simple
command and an optional list of arguments, the following actions are
taken.
If the command name contains no slashes, the shell attempts to
locate it. If there exists a shell function by that name, that
function is invoked as described above in FUNCTIONS. If the name
does not match a function, the shell searches for it in the list of
shell builtins. If a match is found, that builtin is invoked.
If the name is neither a shell function nor a builtin, and contains
no slashes, bash searches each element of the PATH for a directory
containing an executable file by that name. Bash uses a hash table
to remember the full pathnames of executable files (see hash under
SHELL BUILTIN COMMANDS below). A full search of the directories in
PATH is performed only if the command is not found in the hash ta-
ble. If the search is unsuccessful, the shell searches for a
defined shell function named command_not_found_handle. If that
function exists, it is invoked with the original command and the
original command's arguments as its arguments, and the function's
exit status becomes the exit status of the shell. If that function
is not defined, the shell prints an error message and returns an
exit status of 127.
If the search is successful, or if the command name contains one or
more slashes, the shell executes the named program in a separate
execution environment. Argument 0 is set to the name given, and the
remaining arguments to the command are set to the arguments given,
if any.
If this execution fails because the file is not in executable for-
mat, and the file is not a directory, it is assumed to be a shell
script, a file containing shell commands. A subshell is spawned to
execute it. This subshell reinitializes itself, so that the effect
is as if a new shell had been invoked to handle the script, with the
exception that the locations of commands remembered by the parent
(see hash below under SHELL BUILTIN COMMANDS) are retained by the
child.
If the program is a file beginning with #!, the remainder of the
first line specifies an interpreter for the program. The shell exe-
cutes the specified interpreter on operating systems that do not
handle this executable format themselves. The arguments to the
interpreter consist of a single optional argument following the
interpreter name on the first line of the program, followed by the
name of the program, followed by the command arguments, if any.
tcsh man page:
Builtin and non-builtin command execution
Builtin commands are executed within the shell. If any component of
a pipeline except the last is a builtin command, the pipeline is
executed in a subshell.
Parenthesized commands are always executed in a subshell.
(cd; pwd); pwd
thus prints the home directory, leaving you where you were (printing
this after the home directory), while
cd; pwd
leaves you in the home directory. Parenthesized commands are most
often used to prevent cd from affecting the current shell.
When a command to be executed is found not to be a builtin command
the shell attempts to execute the command via execve(2). Each word
in the variable path names a directory in which the shell will look
for the command. If the shell is not given a -f option, the shell
hashes the names in these directories into an internal table so that
it will try an execve(2) in only a directory where there is a possi-
bility that the command resides there. This greatly speeds command
location when a large number of directories are present in the
search path. This hashing mechanism is not used:
1. If hashing is turned explicitly off via unhash.
2. If the shell was given a -f argument.
3. For each directory component of path which does not begin with a
`/'.
4. If the command contains a `/'.
In the above four cases the shell concatenates each component of the
path vector with the given command name to form a path name of a
file which it then attempts to execute it. If execution is success-
ful, the search stops.
If the file has execute permissions but is not an executable to the
system (i.e., it is neither an executable binary nor a script that
specifies its interpreter), then it is assumed to be a file contain-
ing shell commands and a new shell is spawned to read it. The shell
special alias may be set to specify an interpreter other than the
shell itself.
On systems which do not understand the `#!' script interpreter con-
vention the shell may be compiled to emulate it; see the version
shell variable. If so, the shell checks the first line of the file
to see if it is of the form `#!interpreter arg ...'. If it is, the
shell starts interpreter with the given args and feeds the file to
it on standard input.