Categorize states

In this exercise you will write a script that traverses a text file; that is, it reads the file one line at a time in a loop and optionally does something with each line.

The file is states.csv, a comma-separated file with one line for each of the 50 states in the USA. Each line contains a state's abbreviation, name, population, and capital. The first few lines look like this:

AL,Alabama,3614000,Montgomery
AK,Alaska,352000,Juneau
AZ,Arizona,2224000,Tuscon
AR,Arkansas,2117000,Little Rock
CA,California,21185000,Sacramento
CO,Colorado,2534000,Denver
etc.
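
One way to traverse such a file, as a minimal sketch: a while loop whose read command splits each line on commas. The variable names abbr, name, population, capital, and nRecords are my own choices for illustration, and the sample data is fed from a here-document so that the snippet runs on its own. In your script you would redirect the loop's input from states.csv instead.

```shell
#!/bin/bash
# Sketch: read comma-separated records one line at a time.
# Variable names are illustrative assumptions, not prescribed by the exercise.
nRecords=0
while IFS=, read -r abbr name population capital
do
    echo "$name has capital $capital"
    nRecords=$((nRecords + 1))
done <<'EOF'
AL,Alabama,3614000,Montgomery
AK,Alaska,352000,Juneau
AR,Arkansas,2117000,Little Rock
EOF
echo "Read $nRecords records"
```

Setting IFS=, on the read command alone changes the field separator only for that command, so the rest of the script still splits on whitespace as usual.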

In the loop, the script will count the number of tokens (words) in a state's name and its capital's name combined, and group the state with other states that have the same count. Some states have 2-token names, such as Rhode Island; some have 2-token capitals, such as Baton Rouge; some have both; some have neither. Name and capital combined, each state has 2, 3, or 4 tokens in its name pair. The script will determine which it is for each state and add that state's name to an array holding the names of all states with that count. While doing so it will maintain a count of how many such states there are. After processing all 50 states it will output a report that looks like the one shown below as "Target report model."
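
A token count for a name pair can be obtained by letting the shell split the two names into words and counting the result. The sketch below does this with a scratch array. The names tokens and nTokens are assumptions of mine, and other approaches (piping to wc -w, for instance) would work as well.

```shell
#!/bin/bash
# Sketch: count the tokens in a state name and its capital together.
name="Arkansas"
capital="Little Rock"

# The unquoted expansion is deliberate: the shell splits on whitespace,
# so each word becomes one array element.
tokens=($name $capital)
nTokens=${#tokens[@]}

echo "$name / $capital: $nTokens tokens"
```

For this record the count is 3: one token from the state name and two from the capital.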

Write your script similarly to how I wrote mine. Here are some details sketching its essentials in pseudo-code. Use arrays named

twos
threes
fours

for, respectively, the states with two tokens in their name pair (like California), three tokens (like Missouri), or four (like New Mexico). For counting the number of states in each category, use variables named

nTwos
nThrees
nFours

with initial values of 0. At the top of your script you could declare the arrays and assign 0 to the variables, but you don't have to, because the shell handles variables dynamically: they are created upon first reference.
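
To illustrate creation upon first reference, here is a small sketch in which neither the counter nor the array exists before it is used. It relies on bash treating an unset variable as 0 inside arithmetic, and on += creating an array that does not yet exist.

```shell
#!/bin/bash
# Sketch: no prior declaration or initialization of nTwos or twos.
nTwos=$((nTwos + 1))    # unset nTwos evaluates as 0 in arithmetic, so this yields 1
twos+=("California")    # appending to a nonexistent array creates it

echo "$nTwos ${#twos[@]} ${twos[0]}"
```

Declaring and zeroing everything at the top anyway is harmless and can make the script easier to read.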

Run a loop traversing states.csv. It will read one state record at a time and process it. Processing consists of deriving, and assigning to variables, the state's name, its capital's name, and the number of tokens the two contain together. Then code a 3-way if/elif/fi branch, with one branch for each possible number of tokens (2, 3, or 4). In each branch, increment the appropriate count variable and append the state's name to the appropriate array. Here is how you can append an element to an array:

[root@instructor states]#
[root@instructor states]# directions=(north south east)
[root@instructor states]# echo ${directions[*]}
north south east
[root@instructor states]#
[root@instructor states]#
[root@instructor states]# directions+=(west)
[root@instructor states]# echo ${directions[*]}
north south east west
[root@instructor states]#
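
One thing to watch when appending state names: a multi-word name must be quoted inside the parentheses, or each word becomes a separate array element. The following is a rough sketch of the branch for a single record, under the assumption that name and nTokens (names of my choosing) were already set earlier in the loop body.

```shell
#!/bin/bash
# Sketch: classify one record by its token count and append its name.
name="New Mexico"
nTokens=4       # assumed to have been computed already

if [ "$nTokens" -eq 2 ]
then
    nTwos=$((nTwos + 1))
    twos+=("$name")
elif [ "$nTokens" -eq 3 ]
then
    nThrees=$((nThrees + 1))
    threes+=("$name")
elif [ "$nTokens" -eq 4 ]
then
    nFours=$((nFours + 1))
    fours+=("$name")    # the quotes keep "New Mexico" as a single element
fi

echo "nFours=$nFours, elements in fours: ${#fours[@]}"
```

Without the quotes, fours+=($name) would add two elements, New and Mexico, and the array length would no longer agree with the count.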

Now it's time to report. Print a header line as seen in the target report model, like "Two words are contained in..." Then run a loop for as many iterations as there are states in the category being reported, printing one array element (i.e., one state) per iteration. Then move on to the next category and loop over it.
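
The report for one category might be sketched as follows. The array contents and the count here are sample values only, filled in by hand so that the snippet runs by itself; in your script they come from the main loop. The index variable i is my own choice.

```shell
#!/bin/bash
# Sketch: report one category. Sample values stand in for the real results.
twos=("Alabama" "Alaska" "Arizona")
nTwos=3

echo "Two words are contained in the state-plus-capital name pair for these $nTwos states:"
echo
# One iteration per state in the category, one array element per line.
for ((i = 0; i < nTwos; i++))
do
    echo "${twos[i]}"
done
echo
```

Quoting "${twos[i]}" matters less here than in the append, but it keeps multi-word names such as Rhode Island intact on one line regardless of spacing.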

To turn in:

Upload your categorize-states.sh script file. To grade it, I will run it against states.csv, check that it produces the correct report, and glance at the source code to verify that it does so in the prescribed way.


Target report model:

[root@instructor states]# ./categorize-states.sh
Two words are contained in the state-plus-capital name pair for these 32 states:

Alabama
Alaska
Arizona
California
Colorado
Connecticut
Delaware
etc.
.
.
.

Three words are contained in the state-plus-capital name pair for these 16 states:

Arkansas
Iowa
Louisiana
Minnesota
etc.
.
.
.


Four words are contained in the state-plus-capital name pair for these 2 states:

New Mexico
etc.
.
.
.

[root@instructor states]#