Exercising the tftp protocol
Trivial File Transfer Protocol, as its name suggests, is a minimalistic
protocol for file transfer. It is an application layer protocol, and employs UDP
as its underlying transport protocol. It is not the protocol of choice for
transferring files. The reasons for studying it are 1) it is simple enough that
you can look at a capture of it and know what you're looking at, so it serves as a
good sample protocol representing them all, and 2) it has some features that
presage those we will see in TCP and other more complicated protocols.
The assignment for you to perform (some commands are to be executed on the
server, others on the client, as labeled below)
Operate on 2 machines in a LAN. The VirtualBox client and server machine pair from the "sniffing" exercise is suitable. Log in to the server as student, launch the graphical desktop ("startx" command), from which launch a terminal window (icon under "Activities" menu), in which become root ("sudo su -" command). Log in to the client as root; the gui is unnecessary on this machine.
One machine will run the tftp server, the other the tftp client. If not installed already, install those 2 pieces of software on both machines. Then either machine can play either role. (The dnf command installs software from remote mirrors, so to use it your VMs would need internet access. Help getting it, if needed, can be found here.)
dnf install tftp
dnf install tftp-server
To run the tftp server:
systemctl start tftp
netstat will show all the active udp ports (-u).
netstat -panu
Port 69 should appear, in use by the tftp service (depending on how your distribution runs it, the owning process may show as in.tftpd, systemd, or xinetd). 69 is the well-known (i.e., conventionally standard) udp port number utilized by tftp.
Identify the machines' IP addresses. We will transfer files to the client from the server, while running Wireshark on the server. The tftp server uses a particular directory, /var/lib/tftpboot by default, to house files it is willing to transfer to a requesting client. So, let's populate it with some files. We want some that are pure ascii, others that are binary. And we want them of assorted sizes. On the server please:
cd /var/lib/tftpboot
When moving a file, tftp transfers 512 bytes at a time. A tftp receiver knows the transfer is over when it gets any amount of data less than that in an incoming tftp data message. Let's create 3 files containing ascii characters only. Their sizes will be 500, 1024, and 1025 bytes. Let's also create a set of files with the same sizes, but containing binary data (that is, anything). Here's a slick way to make the ascii files:
tr -dc A-Za-z0-9 < /dev/urandom | head -c 500 | tee 500B.asc
tr -dc A-Za-z0-9 < /dev/urandom | head -c 1024 | tee 1024B.asc
tr -dc A-Za-z0-9 < /dev/urandom | head -c 1025 | tee 1025B.asc
tr -dc A-Za-z0-9 < /dev/urandom | head -c $[10*1024*1024] > 10MB.asc
/dev/urandom serves as a bottomless source of binary characters
tr is the translate command
A-Za-z0-9 means the set of all characters that are letters or numerals
-c means "the complement of"; in this case -c A-Za-z0-9 means all characters that are not letters or numerals
-d means delete the specified characters from tr's input
tr's output will contain whatever characters remain, namely purely those that are letters or numerals
that goes to head as input
head -c takes the specified number of characters from the input
that goes to tee as input
tee puts one copy on screen, another in a specified file
And the binary files:
head -c 500 < /dev/urandom > 500B.bin
head -c 1024 < /dev/urandom > 1024B.bin
head -c 1025 < /dev/urandom > 1025B.bin
head -c $[10*1024*1024] < /dev/urandom > 10MB.bin
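If you prefer, the three small file pairs above can be made in one loop. This is just a sketch of the same commands; the 10MB pair is still made separately so the 10MB.asc/10MB.bin names match the rest of the exercise:

```shell
# Make the three small ascii and binary file pairs in one loop (a sketch).
for size in 500 1024 1025; do
  tr -dc A-Za-z0-9 < /dev/urandom | head -c "$size" > "${size}B.asc"  # ascii only
  head -c "$size" /dev/urandom > "${size}B.bin"                       # raw bytes
done
ls -l *B.asc *B.bin
```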
Finally, since the above ascii files don't necessarily contain lines (that is,
regularly occurring embedded linefeed or 0x0a characters), let's make a file that
does:
for i in {1..3};do echo "line $i";done | tee 3lines.txt
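To see those embedded linefeeds directly, dump the file byte by byte with od, which renders each 0x0A as \n. This sketch rebuilds the file in /tmp so it doesn't disturb the tftpboot copy:

```shell
# Reproduce the 3-line file and show every byte; od -c prints 0x0A as \n.
for i in 1 2 3; do echo "line $i"; done > /tmp/3lines.txt
od -c /tmp/3lines.txt
```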
Ask what kind of files these are and examine their sizes:
file *
ls -l *
Note the difference in content between the two types of files we have created. Individual bytes in the binary ones can have any value among the 256 possible ones. Those in the ascii files are value-restricted. They can only assume 62 values-- the ones representing the 26 uppercase letters, plus the 26 lowercase ones, plus the 10 numerals-- and none of the other 194. For example the ascii files contain no control characters, which lie outside the letters-and-numerals subset we kept.
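That value restriction can be checked directly. This sketch builds a fresh 1024-byte sample of each kind in /tmp and counts how many distinct byte values each one uses; expect at most 62 for the ascii sample and something close to 256 for the binary one:

```shell
# Count distinct byte values in an ascii vs a binary random sample (a sketch).
tr -dc A-Za-z0-9 < /dev/urandom | head -c 1024 > /tmp/sample.asc
head -c 1024 /dev/urandom > /tmp/sample.bin
for f in /tmp/sample.asc /tmp/sample.bin; do
  # od -tu1 prints one unsigned decimal per byte; sort -u keeps each value once
  n=$(od -An -v -tu1 "$f" | tr -s ' ' '\n' | grep -v '^$' | sort -n -u | wc -l)
  echo "$f: $n distinct byte values"
done
```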
Now we have something for tftp to transfer.
On the server -- run Wireshark.
On the client -- run these 2 commands in succession:
tftp <IP of server> -v -c get 500B.asc
tftp <IP of server> -v -c get 500B.bin
On the client -- note the messages due to tftp's "-v" verbose option. In particular, note how many bytes it says were transferred.
On the server --
- stop the Wireshark capture. Apply Wireshark's display filter "tftp" to mask away any non-tftp frames your capture might have picked up. (If you had wanted to avoid capturing them in the first place, applying a capture filter for tftp then instead of a display filter for it now, you could not have done so. Why?)
- save the file under the name 500asc-and-bin.cap, opting to save only the "Displayed" packets (to convey the display filter's effect into the saved file). To do this you may need to use the File menu's "Export specified packets" rather than "Save" suboption.
Similarly, let's get saved capture files for all the other transfers. In each case you might catch non-tftp network background noise in your net. You don't care about it. Use the above filtering technique to exclude it from your capture file, saving into it only the tftp exchanges that you do care about.
tftp <IP of server> -v -c get 1024B.asc    (run Wireshark while doing it, and save as above to 1024Basc.cap)
tftp <IP of server> -v -c get 1024B.bin    (...and save to 1024Bbin.cap)
tftp <IP of server> -v -c get 1025B.asc    (...and save to 1025Basc.cap)
tftp <IP of server> -v -c get 1025B.bin    (...and save to 1025Bbin.cap)
tftp <IP of server> -v -c get 10MB.asc     (...and save to 10MBasc.cap)
tftp <IP of server> -v -c get 10MB.bin     (...and save to 10MBbin.cap)
tftp <IP of server> -v -c get 3lines.txt   (...and save to 3linestxt.cap)
In 500asc-and-bin.cap you should see 6 frames, 3 for the transfer of each file (500B.asc followed by 500B.bin). The first frame in the transfer of each (capture file's 1st and 4th) is a Read Request, naming the file whose transfer is requested. The transfer of the files begins, in each case, in the second frame for that file (capture file's 2nd and 5th). That's also where it ends, because the whole file fits into the one frame. tftp knows that's all there is because there are fewer than 512 bytes in the frame. A larger file would have had its first 512 bytes in that frame; since this frame holds fewer than 512, the file doesn't have 512 bytes, and what's in the frame is the whole thing.
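That termination rule determines exactly how many DATA packets any transfer takes. A sketch in shell arithmetic (blocks() is just an illustrative helper, not a tftp command):

```shell
# Number of tftp DATA packets for a file of a given size (a sketch).
# The transfer ends on the first short block, so a size that is an exact
# multiple of 512 costs one extra, empty DATA packet.
blocks() { echo $(( $1 / 512 + 1 )); }   # integer division discards remainder
blocks 500    # 1  (the whole file in one short block)
blocks 1024   # 3  (512 + 512 + 0)
blocks 1025   # 3  (512 + 512 + 1)
```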
Notice also the amount of data transferred (not gross but net-- highlight data in the tftp message). In the case of both files, the file size is 500 so that's the amount of data we expect to see in the tftp message. In the case of the ascii file, that's what we do see. In the case of the binary one you may well see a few bytes more. Maybe 503 or 506 or something, instead of 500. (If not, repeat the command on the server to produce a different 500B.bin, and trace its transfer again.) Does that mean the transferred file on disk is bigger than the original (therefore, wrong)? Go see, on the client:
ls -l 500B.*
No, it's 500. Looks like tftp got it right. To find out for sure, hash both files:
md5sum 500B.bin (on server)
md5sum 500B.bin (on client)
and observe that the copy and the original have the same hashes, so they are the same. Now try again, capturing in Wireshark on the server and transferring 500B.bin in octet/binary/raw mode with this command on the client:
tftp <IP of server> -m octet -c get 500B.bin
Wireshark will show in the tftp Read Request that the transfer mode is "octet" and the size of the data being transferred will be exactly 500 bytes in this mode. Whereas, in netascii mode it was a little bigger on the wire. While again we see the dichotomy between ascii and binary, please distinguish between file content and transfer method. Above we controlled what we put into the files we made, ascii in some and binary in others. Here tftp is making one assumption or the other about what kind of data it is handling, independent of what kind it actually is. What might be the reason tftp transferred more bytes across the wire than there actually were in the original file? and where did these extra bytes disappear to at the destination before the copy was written?
Accommodating lines
Both 1024B.asc and 1024B.bin are 1024 bytes in size. That's exactly 2 blocks. So a tftp transfer should have 3 data block frames of sizes 512+512+0. Open 1024Basc.cap in Wireshark and verify this. Show it in a Wireshark flow graph (under "Statistics" menu). Why does tftp bother sending the empty 0-size data block? Read the explanation in section 6 "Normal Termination" of the tftp rfc. Open 1024Bbin.cap in Wireshark. Again there are 3 data block frames but the last one is probably not empty. My data packets contain 512+512+13. That's more bytes transmitted than are in the file. Transfer 1024B.bin again, but explicitly specify octet mode for the transfer (instead of the default netascii mode). Run Wireshark while doing it. The command on the client would be:
tftp <IP of server> -m octet -v -c get 1024B.bin
What's going on? Can there be a hex 0A character in your asc files? Can there be one in your bin files? What does tftp do when it finds a hex 0A to transfer, if it's transferring in netascii mode? What does it do if transferring in octet mode? What if it doesn't ever encounter any 0A in the first place? What, in terms of the number of bytes transmitted versus the number in the file, are the implications in each case?
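One way to check your reasoning: in netascii mode each bare linefeed (0x0A) goes onto the wire as a two-byte pair, so the wire size should exceed the file size by roughly the file's 0x0A count. This sketch predicts that for a fresh random blob (it ignores 0x0D bytes, which netascii also escapes, so treat the prediction as approximate):

```shell
# Predict the netascii wire size of a random binary blob (a sketch).
head -c 1024 /dev/urandom > /tmp/blob.bin
lf=$(tr -dc '\n' < /tmp/blob.bin | wc -c)     # how many 0x0A bytes it contains
size=$(wc -c < /tmp/blob.bin)
echo "file: $size bytes, linefeeds: $lf, predicted wire size: $((size + lf))"
```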
Port numbers
Open any of the capture files you saved. Look at the port numbers used, from the UDP headers. Compare the numbers used in the first frame versus the second and third. What happened? Read the explanation in section 4 "Initial Connection Protocol" of the tftp rfc.
Performance effect of tftp's "stop-and-wait"
tftp is an example of a stop-and-wait protocol. That means a protocol with an acknowledgement feature in which one machine has a number of things to send to another, and whenever it sends one it awaits that one's acknowledgement before sending the next. It withholds the next till its predecessor's acknowledgement comes in the door from the other side.
But while the protocol is waiting the wire isn't. The wire stands ready to carry the next frame if the protocol wants, but the protocol holds it back. There's nothing "wrong" with that, it's done as a matter of deliberate design choice. In effect this "wastes" wire capacity. But for the good cause that by insisting on one-by-one acknowledgement as-we-go the acknowledgement algorithm is simplified. The sender insists on doing current accounting, never advancing "acknowledgement credit" to the receiver. If you give credit you then have to collect; not giving it means having no follow-up job to do. Simpler. Other protocols with an acknowledgement feature aren't always so fussy, but they are always more complicated. For its main purpose of pre-boot operating system loading over a network, just-get-the-job-done simplicity is the greater virtue.
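A back-of-the-envelope model shows the cost. With assumed numbers (512-byte blocks, a 100Mbps wire, a 200-microsecond round trip; these are illustrative, not measured values from this exercise), stop-and-wait keeps the wire busy only a small fraction of the time:

```shell
# Stop-and-wait wire utilization: transmit time / (transmit time + RTT).
# All three inputs are assumed, illustrative numbers.
awk -v bytes=512 -v mbps=100 -v rtt_us=200 'BEGIN {
  tx_us = bytes * 8 / mbps          # microseconds to put one block on the wire
  util  = tx_us / (tx_us + rtt_us)  # fraction of time the wire is carrying data
  printf "tx %.2f us per block, utilization %.1f%%\n", tx_us, util * 100
}'
# prints: tx 40.96 us per block, utilization 17.0%
```

Under those assumptions the wire sits idle more than 80% of the time, which is the flavor of the penalty the timings below will show.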
Let's observe the speed penalty. First we will try to get some approximation of the raw capacity of the wire using iperf. Then we will transfer one of our large 10MB files first with tftp, then with scp, timing both transfers. For scp, you need to know an account name on the server and its password.
Do this part only if operating on physical machines.
dnf install iperf (if necessary)
On one machine, that will act as server:
iperf -s -u -i 1
(-s means act as server; default udp port 5001)
On the other machine, acting as client:
iperf -c <IP of server> -u -i 1 -b 10M
(-c means act as client, connecting to port 5001)
The client will probably report a bottom-line bandwidth of 10Mbps. Repeat, raising the target bandwidth in iperf's -b option and observe that higher bandwidths are reported/attained. Keep increasing the target bandwidth until the reported bandwidth at the end of the run stops rising and gets stuck at some value. If you are on a so-called "fast ethernet" LAN, rated at 100Mbps, iperf will get stuck somewhere below but near that. (If you connected your machines directly with a cable, bypassing the switch you probably use, the bandwidth might reach around 1000Mbps because you may have a couple of gigabit-rated NICs. No chain is stronger than its weakest link and you may have a 100Mbps switch, so that's as fast as the data can get through even if the NICs are faster.) At 100Mbps a 10MB file, which consists of 80Mbit, could be expected to take 0.8 sec to go through (80Mbit / 100Mbit/sec). That would be, very approximately, the wire-imposed limit.
Now let's see how long a couple of protocols take to move a 10MB file. First, let tftp do it. On the client:
time tftp <IP of server> -c get 10MB.asc
Note the times reported. "real" time tells you how much actual elapsed time passed. "user" and "sys" tell you how long the cpu in your machine was devoting itself to your command. On my system, where iperf got udp to attain about 95Mbps, tftp took 11-12 seconds elapsed of which the cpu was busy on tftp for a little under a second.
Now do the transfer again, using scp this time. Note that, because scp makes you manually give a password before it starts transferring, which eats up a lot of time while the timer is running, the deck is stacked against scp. So run the following command, but be quick with the password when prompted:
time scp <user>@<IP of server>:/var/lib/tftpboot/10MB.asc .
You should be surprised to see that, even though handicapped by the passwording requirement, scp won. You should also note that the time the system devoted to scp was much less than it did to tftp. On my system I typed the password as fast as I could and got elapsed time results of 2-3 seconds. Of that, I'm the culprit who cost about a half a second. The cpu (user and sys components of time's report) spent about a second on the job. scp achieved fuller line utilization because it did not hesitate, as did tftp, to employ the wire. It pushed frames onto it more aggressively.
Neither tftp nor scp acted alone. tftp relied on udp while scp partnered with tcp. The acknowledgement handling discipline in the first case resides within the application layer's tftp. Its transport layer partner udp does no acknowledging. In the other case it resides in tcp, which very much engages in an acknowledgement scheme while scp does not. scp is like most application protocols that ride on tcp-- they abstain from deciding when and how to send their data, passing it all to tcp who then makes those decisions. Our other case is just the reverse, tftp did and udp did not make the decisions. So the "competition" is between tftp's and tcp's approaches to acknowledgement. tftp's is stupid and tcp's is smart, particularly in terms of line utilization. tcp employs a "sliding window" enabling it to race ahead emitting packets even when those already emitted have never yet got acknowledged. Opposite of stop-and-wait. tcp is not limitless. It does have other features to regulate how far ahead it should race and rein it in if it goes too far. But it tries to press hard, exploiting the wire's carrying capacity as fully as possible by feeding it more and more data till the wire starts to show signs of breakdown. Then tcp slows down to give the wire a break but, once the wire seems recovered, tcp ratchets up on it again.
Our result, in my environment, is that data that could be transferred in 0.8 seconds as limited by the wire takes 2 or 3 seconds as limited by tcp and 10 or 11 as limited by tftp. Any other stop-and-wait protocol would tend to resemble tftp's behavior.
See
http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/3-datalink/stop-and-wait-anal.html
from Shun Yan Cheung of Emory University for a quantitative (and well presented) discussion of stop-and-wait performance. His conclusion, "Stop-and-Wait has good channel utilization for low speed links. It is very inefficient on high speed links."
What to turn in:
- written-answers.txt - a text file with your short written answers to the following three questions.
1. As posed in the above text: "If you had wanted to avoid capturing [non-tftp
traffic] in the first place, applying a capture filter for tftp then instead of a display filter
for it now, you could not have done so. Why?" (It's about ports.)
2. As posed in the above text: "What might be the reason tftp transferred more bytes across the wire than there actually were in the original file? and where did these extra bytes disappear to at the destination before the copy was written?" Consider the distinction between my types of file content (asc vs bin) and tftp's types of transfer method (netascii vs octet). The slides touch on this question.
3. Maximum channel utilization depends on propagation distance. Is that dependence direct or inverse?
My related thinking:
Shun Yan Cheung explores the performance of stop-and-wait protocols in terms of their "channel utilization." What does the term mean?