Functions and File Traversal

In this exercise you will write a script that traverses a text file. That is, it reads the file one line at a time in a loop and optionally does something with each line. You will also write functions to do the "something."


File traversal

Here is how you read through a file one line at a time:

while read LINE
do
    :
done  <     <name of your text file here>

It bears a little explanation. "read LINE" serves two purposes: it pulls the next line from the file and assigns it to the variable LINE (arbitrarily named), and it serves as the loop condition for while, by way of its exit status. A command's exit status is an integer from 0 through 255. A while loop considers 0 to be "true" and nonzero "false." read produces an exit status of 0 whenever it pulls the next line from the file and gets it. It produces a nonzero status when it tries to pull the next line but doesn't get anything, which happens once it has reached the end of the file.
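As a minimal sketch of that behavior, the following builds a throwaway three-line file (the temporary file and its contents are made up for illustration), loops over it, and then asks read for a line when there is nothing left to give:

```shell
# Build a small sample file (hypothetical contents, created just for this demo).
sample=$(mktemp)
printf 'first\nsecond\nthird\n' > "$sample"

# read returns 0 for each line it gets, so the loop body runs once per line.
count=0
while read LINE
do
    count=$((count + 1))
    echo "line $count: $LINE"
done < "$sample"
echo "lines read: $count"

# With no input left to give (here, the empty /dev/null), read returns nonzero.
if read LINE < /dev/null
then eof_status=0
else eof_status=$?
fi
echo "exit status of read at end of input: $eof_status"

rm -f "$sample"
```

The loop stops on its own after the third line because the fourth call to read comes back with a nonzero status.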

Here, the colon is needed as a sort of no-op. The loop will error if it contains nothing.
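If you want to see this for yourself, here is a small sketch that hands each version of the loop to a child shell and records how it exits. The condition "false" is used only so that the loop body never actually runs; the variable names are made up for the demo.

```shell
# An empty loop body is a syntax error, so the child shell exits with a nonzero status.
if sh -c 'while false; do done' 2>/dev/null
then empty_status=0
else empty_status=$?
fi

# With the colon as a placeholder, the same loop parses and runs cleanly.
if sh -c 'while false; do :; done'
then colon_status=0
else colon_status=$?
fi

echo "empty body: $empty_status, colon body: $colon_status"
```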

For practice, create a text file with a few lines in it. Just type something, three or four lines. Then create a script file containing the code above, filling in whatever name you gave your text file (without the surrounding angle brackets). Give your script file executable permissions. Run it. If you get no error messages, you succeeded. You won't see any output because the colon (no operation, null action, nada de nada) is what you have written your program to do. Congratulations! Now replace the colon with a command that prints (echo) the contents of the LINE variable ($LINE) at each iteration. Run it again. Get it?

Note the way in which the file is made available to the loop: by a redirection operator attached to its "done" keyword. That's a little odd, as we are more familiar with redirection from a file to a simple command, not to a loop. For example, try:

sort      <        <name of your file here>

or

tr  e  x  <        <name of your file here>

The first sorts your lines alphabetically and outputs them; the second replaces all their e's (assuming they contain some) with x's and outputs them. A subtle point to appreciate is that in neither case did the command receive the file from its command line. In both, it received the file's content from its standard input. That's what the redirection operator (i.e., the less-than sign < ) does. Some commands can receive data either this way, spoon-fed through the feeding tube of their standard input, or alternatively by being given, on the command line, the name of a file in which the data has been pre-positioned. (Which way does the command have to be coded to do more work?) sort is such a command; tr is not. Try:

sort               <name of your file here>

or

tr  e  x           <name of your file here>

(Note the absence of any redirection operator here.)
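The four commands above can be compared side by side in a short sketch. The sample file and the variable names are hypothetical, made up for the demo. The exact complaint tr prints varies between systems (GNU tr, for example, reports an extra operand), so the sketch only records its exit status.

```shell
# Small sample file (hypothetical contents).
sample=$(mktemp)
printf 'pear\napple\nfig\n' > "$sample"

# sort accepts its data either way and produces the same result.
from_stdin=$(sort < "$sample")
from_arg=$(sort "$sample")

# tr reads only standard input, so redirection works...
swapped=$(tr e x < "$sample")

# ...but a filename on the command line is not something tr knows how to open.
if tr e x "$sample" < /dev/null > /dev/null 2>&1
then tr_status=0
else tr_status=$?
fi

echo "$from_stdin"
echo "$swapped"
echo "exit status of tr when given a filename: $tr_status"

rm -f "$sample"
```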


To do:

In place of your file I give you states.csv. It's a comma-separated file with one line for each of the 50 states in the USA. Each line contains a state's abbreviation, name, population, and capital. For example, the first few lines look like this:

AL,Alabama,3614000,Montgomery
AK,Alaska,352000,Juneau
AZ,Arizona,2224000,Tuscon
AR,Arkansas,2117000,Little Rock
CA,California,21185000,Sacramento
CO,Colorado,2534000,Denver
etc.

In a script file named filetraversal.sh, write a loop that cycles through my file, just as you already did for yours (simply change the filename). However, I want the loop's internal operation to differ. Instead of printing the raw line itself, I want it to extract the names of the state and its capital and print processed lines like this:

The capital of the great state of Alabama is Montgomery.
The capital of the great state of Alaska is Juneau.
The capital of the great state of Arizona is Tuscon.
The capital of the great state of Arkansas is Little Rock.
The capital of the great state of California is Sacramento.
The capital of the great state of Colorado is Denver.
etc.

To pull out the state or capital name when given a line from the file, you are to write functions. Refer to the slides about them in the "interactive bash" slide presentation. Write four functions, one for extracting each of the fields in the file's lines. Name your functions "abbreviation", "name", "population", and "capital". When calling any of these functions, supply it with the line from which it is supposed to extract (by putting that line on the function call line, as an argument). Within the function, have it obtain the line that was passed to it using $* (the whole argument list), and have the extraction performed by the cut command (conveniently, this is comma-separated data). Supply $* to cut (how?) and let it cut away. Once your functions are in place, your main loop, when it wants for example the capital of a state, will supply the state's line to the capital function and embed that call in command substitution syntax, $(...), within the line of code that produces the corresponding line of output, i.e.,
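To illustrate only the mechanics of functions, $*, and command substitution, here is a sketch built around a made-up function named shout. It is not one of the four functions the assignment asks for, and it uses tr rather than cut; the extraction itself is left to you.

```shell
# Hypothetical function for illustration only.
shout() {
    # $* is the whole argument list; tr receives it on standard input
    # and writes the upper-cased text to standard output.
    echo "$*" | tr '[:lower:]' '[:upper:]'
}

greeting="hello there"

# $(...) runs the function and substitutes whatever it printed.
result=$(shout $greeting)
echo "You said: $(shout $greeting)"
```

Whatever a function prints becomes the text that $(...) drops into the surrounding command, which is what lets a function call sit in the middle of an echo line.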

echo "The capital of the great state of ---- is ----."


To turn in:

Transfer your filetraversal.sh script file to your assignments directory. To grade it, I will supply my own file named states.csv and observe that your script produces the right output from it.