backup exercise
Make a note of the IP address or domain name for the backup server given by
the instructor.
It is assumed that you have previously prepared your machine and the server.
Your machine has a user account and the server has one, between which you have
set up key-based authentication. That is, your local user produced an ssh key
and lodged it with the counterpart user on the remote machine. (The two users
are counterparts by this arrangement but don't need to have the same names.)
That way, when logged in "here" as the local user, you can cause
operations "there" as the remote one, with auto/unattended/passwordless
authentication.
Create working data, to be backed up, on local machine
Log in as the local user "student", assuming that is the local account described above.
To give us something to work with, let's produce a small file hierarchy. We will then perform practice backups with it. Obtain it in the form of a tar file as follows.
cd
scp public@unexgate.dmorgan.us:/home/public/classifications.tar . (don't forget the dot)
Give the password when asked (the instructor has supplied the password or will). The classifications.tar file flows down onto your disk in the current, home directory.
In your home directory create a directory named "taxonomy," then
unfurl the contents of classifications.tar into it, as follows:
cd
mkdir taxonomy
cd taxonomy
tar -xvf ../classifications.tar
tree
The taxonomy directory now has subdirectories named animal, vegetable, and
mineral with some content in each. See a graphical
view here.
Backing up data using tar, over an unencrypted session using netcat (nc)
We want to back up the taxonomy directory into the home directory of your
remote counterpart user. With the command sequence below, you will
1 - pack up your data ("tar -c" command itself does that)
2 - hand it over to nc (vertical-bar pipe operator)
3 - let nc hand it over to the remote nc (nc does that upon seeing the remote IP
address in the command line)
4 - let the remote nc give it to the remote tar (the remote command sequence
specifies a pipe from nc to tar)
5 - let remote tar unpack it onto the remote disk ("tar -x"
over there does that)
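Before involving two machines, the five steps above can be rehearsed on one. The following is a minimal sketch, not part of the exercise: an ordinary pipe stands in for the pair of nc processes, and the scratch directory with its "here" and "there" subdirectories and its stand-in file corn are assumptions made up for illustration.

```shell
# Local rehearsal of the pipeline: packing tar -> pipe -> unpacking tar.
# "here" and "there" are hypothetical stand-ins for the two machines.
scratch=$(mktemp -d)
mkdir -p "$scratch/here/taxonomy/vegetable" "$scratch/there"
echo corn > "$scratch/here/taxonomy/vegetable/corn"

# Steps 1-2: pack taxonomy and write the archive into the pipe.
# Steps 3-4: the pipe itself plays the role of the two nc processes.
# Step 5: the second tar, run in the destination directory, unpacks.
(cd "$scratch/here" && tar -cf - taxonomy) | (cd "$scratch/there" && tar -xf -)

# The copy should now exist on the "there" side.
ls -R "$scratch/there"
```

If this works locally, the only new ingredient in the real exercise is the network hop between the two tar processes.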
First we need to get a server program running on the server/remote machine. For that purpose, get logged in to a shell on that machine:
ssh studentXX@<remote IP>
There, set nc listening as a server process. Make it listen to the port number 5000 plus your particular student number XX. So for example if you are assigned to work as the server's student 20, use port 5020. Have nc hand off whatever it may get to an unpacking tar. (That's in anticipation of its getting the stuff we intend to send it, which will come from a packing tar):
nc -l <remote port> | tar -xf -
Note the hyphen in the command line for tar. It is important. What does it signify when unpacking ( -x )? Check tar man page, -f option.
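The port rule described above (5000 plus your student number) is simple arithmetic, sketched here as a small helper. The function name student_port is hypothetical. It strips leading zeros first, because shell arithmetic would otherwise try to read a number such as 08 as octal and fail.

```shell
# Hypothetical helper: print the nc port for a given student number XX.
student_port() {
    n=$1
    # Drop leading zeros ("08" -> "8") so the shell does not treat it as octal.
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    echo $((5000 + n))
}

student_port 20    # prints 5020
student_port 08    # prints 5008
```

The result is the value to substitute for <remote port> in both the server-side and client-side nc commands.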
Now we have to operate on the client machine. Leave open your session on the server machine. Gain a second local virtual terminal by pressing key combination ctrl-alt-F2. Log in again to your machine as "student". Below, you run a packing tar and have it hand off its output to an nc. This nc runs as a client, and will be told to send whatever it gets to the server tar we have running on the remote machine.
cd
tar -cf - taxonomy | nc <remote IP> <remote port>
Note the hyphen in the command line for tar. It is important. What does it signify when packing ( -c )? Now that you're done sending the backup data you can collapse this virtual terminal and go back to the original one.
exit
Press key combination ctrl-alt-F1. You are again facing the server on your screen. Verify that the transfer worked. Use the ls command to view that a "taxonomy" subdirectory exists and its contents appear the same as the taxonomy directory on your machine, which you transferred. The "tree" command might be useful (use it on the taxonomy directory to see its internal content and structure). We're almost done, so remove the remote (not local!) taxonomy directory:
cd
rm ./taxonomy/ -rf
ls
exit
then observe that it's gone, and finally get back to your local prompt by "exit"ing from the remote one.
Backing up data using tar, over an encrypted session using ssh
Now we will repeat much the same thing, but replacing nc with ssh as the "go-between carrier" shuttling the data between computers. The value added by ssh is datastream encryption. Whatever you remotely backed up with netcat traveled unencrypted. If it traversed a network you don't control, you made its content available to others. If you encrypt, that won't happen.
We want to back up the taxonomy directory into the home directory of your
remote counterpart user. With the command sequence below, you will
1 - pack up your data ("tar -c" command itself does that)
2 - hand it over to ssh (vertical-bar pipe operator)
3 - let ssh hand it over to the remote sshd (ssh does that upon seeing the remote IP
address in the command line)
4 - let the remote sshd give it to the remote tar (the command sequence
specifies a to-be-run remote command grouping, which includes tar)
5 - let remote tar unpack it onto the remote disk ("tar -x"
over there does that)
cd
tar -czf - taxonomy | ssh studentXX@<remote IP> "(cd; tar -xzf -)"
Note the presence of hyphens in the command line for both tar's. They are important. What do they signify (check tar man page, -f option)? Now verify the transfer. Log into the target machine:
ssh studentXX@<remote IP>
Once you have the prompt on the remote machine, you'll find you're in the home directory of studentXX. Use the ls command to view that a "taxonomy" subdirectory exists there and its contents appear the same as those of the taxonomy directory on your machine, which you transferred. The "tree" command might be useful (use it on the taxonomy directory to see its internal content and structure). We're almost done, so remove the remote (not local!) taxonomy directory:
cd
rm ./taxonomy/ -rf
ls
exit
then observe that it's gone, and finally get back to your local prompt by "exit"ing from the remote one.
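Comparing ls or tree output by eye works for a small hierarchy. A stricter check, to be run before the cleanup step, is to compare checksums of every file on both sides. The following is a sketch only: the function name tree_digest is hypothetical, and cksum is used simply because it is a standard utility. The demonstration compares two local scratch directories that stand in for the local and remote copies. For the real check, the same find pipeline would be run on the remote side (for example as the quoted command argument to ssh) and its output compared with the local one.

```shell
# Hypothetical helper: print a sorted checksum listing of every file under $1.
tree_digest() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort)
}

# Demonstration on two local scratch copies (stand-ins for local and remote).
scratch=$(mktemp -d)
mkdir -p "$scratch/local/taxonomy/vegetable" "$scratch/remote"
echo corn > "$scratch/local/taxonomy/vegetable/corn"
(cd "$scratch/local" && tar -cf - taxonomy) | (cd "$scratch/remote" && tar -xf -)

if [ "$(tree_digest "$scratch/local/taxonomy")" = "$(tree_digest "$scratch/remote/taxonomy")" ]; then
    echo "copies match"
else
    echo "copies differ"
fi
```

Because the listing uses paths relative to the directory given, the two sides can live under different parent directories and still compare equal.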
Note that we had to compose the operation from two separate local commands, tar and ssh, joining them with a pipe ( | ). Below we perform much the same operation using rsync, but rsync lets you use ssh as a built-in option so we do it all with a single command. Also, tar does a wholesale, non-incremental transfer of everything every time. rsync does an incremental transfer of only what needs to be transferred. That's everything the first time, but subsequently it's only what has changed.
Backing up data using rsync, over an encrypted session using ssh
We want to back up your taxonomy directory into the home directory of your remote counterpart user. Run rsync as follows:
cd
rsync -v -a --delete -e ssh taxonomy studentXX@<remote IP>:/home/studentXX
Now verify the transfer. Log into the target machine:
ssh studentXX@<remote IP>
Once you have the prompt on the remote machine, you'll find you're in the home directory of studentXX. Use the ls command to view that a "taxonomy" subdirectory exists there and its contents appear the same as those of the directory on your machine that you transferred, taxonomy. Get back to your local prompt by "exit"ing from the remote one. Then make local changes as follows:
cd
cd taxonomy/vegetable
ls
echo tulip > tulip
rm rose
ls
Note you have replaced the file "rose" with a file "tulip" (whose content is the word tulip). Repeat the rsync backup:
cd
rsync -v -a --delete -e ssh taxonomy studentXX@<server>:/home/studentXX
And again, log into the other, target machine to see what happened:
ssh studentXX@<server>
There, verify that 1) /home/studentXX/taxonomy/vegetable/tulip has appeared, and 2) /home/studentXX/taxonomy/vegetable/rose has disappeared. We're done, so remove the remote (not local!) taxonomy directory, observe that it's gone, then exit back to the local machine:
cd
rm ./taxonomy/ -rf
ls
exit
rsync operates incrementally. It figures out what files have changed then
amends the target, with appropriate copy and delete operations, narrowly
changing only those. If the volume of taxonomy's content were large enough, you
would observe that the original operation to transfer everything was lengthy,
while the secondary one to apply just changes was quick. With tar, the operation
would be equally lengthy on both occasions.
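The incremental behavior can also be observed locally, without the server. The following is a sketch under assumptions: the scratch directories src and dst stand in for the local and remote home directories, the files corn and rose are stand-ins created on the spot, and the --itemize-changes option is added (it is not part of the exercise's command) so that rsync reports what it actually touched. The demonstration is skipped if rsync is not installed.

```shell
# Scratch source and destination (stand-ins for the two home directories).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/taxonomy/vegetable"
echo rose > "$src/taxonomy/vegetable/rose"
echo corn > "$src/taxonomy/vegetable/corn"

if command -v rsync >/dev/null 2>&1; then
    # First pass: everything is new, so everything is copied.
    rsync -a --delete "$src/taxonomy" "$dst"

    # Same local changes as in the exercise: add tulip, remove rose.
    echo tulip > "$src/taxonomy/vegetable/tulip"
    rm "$src/taxonomy/vegetable/rose"

    # Second pass: report only what rsync changed on the destination.
    second_pass=$(rsync -a --delete --itemize-changes "$src/taxonomy" "$dst")
    printf '%s\n' "$second_pass"
else
    echo "rsync is not installed here; skipping the demonstration"
fi
```

The second report should mention tulip and rose, while the unchanged file corn should be absent from it.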
Using rsync to make multiple "snapshot style" backups
First we'll use rsync to make a local backup copy of your home directory. Then you'll extend it to making a backup copy of your home directory to a centralized directory on the remote machine, where many users' home directory backups might be consolidated. The difference between this and the exercise just completed is that in that one you have home directories on both machines, and backed up a select part of one home directory (taxonomy) into the other. Here, you only have one home directory, what you back up is that directory as a whole, and the target location is a quasi-public place (/backup) where others direct their home directory backups too. Beyond that, the main difference is that you'll make "snapshot style" backups that capture the evolution of your home directory's content, and do it in an efficient way utilizing hard links.
Login on your local machine as root. You will make a dedicated directory for holding home directory backups. You will create a group to which you can add select users, and will set up the backup directory so it's available to just the folks in that group.
mkdir /backup
chmod 770 /backup
groupadd backersup
chgrp backersup /backup/
gpasswd -a student backersup
Now log in as student. (You could log out and then back in, or log in on a different virtual terminal. Just make sure you execute a fresh login; till then, your new membership in the backersup group will not be recognized.) You will create a dedicated directory for receiving your backups, and make it your own by setting permissions appropriately.
mkdir /backup/student
chmod 700 /backup/student
Now to do some actual backing up! A script that incorporates rsync and provides the "multiple" and "snapshot" features is provided. It's named "backup-snapshot." Get it, and another diagnostic script named "shownodes-for-corn," from the server. Probably the commands you'll use are:
cd
scp public@unexgate.dmorgan.us:/home/public/backup-snapshot . (don't forget the dot)
scp public@unexgate.dmorgan.us:/home/public/shownodes-for-corn . (don't forget the dot)
Now make these executable and run a backup.
chmod +x backup-snapshot shownodes-for-corn
./backup-snapshot
What did you do? Well, take a look at what's in the script. And, the acid test, examine your results directly:
ls -l /backup/student
tree -a /backup/student | less
./shownodes-for-corn
Run backup-snapshot and examine as above a second time, then a third. On each occasion, how many distinct (physically different) copies of file corn (i.e., its data) exist on the disk? How many nominal copies (appearances of a name for it in directory listings)? So far, the fileset you backed up did not change between runs. Let's do it again but first make one deletion, one addition, and one change to the original fileset being backed up. That's the one in taxonomy under your home directory.
cd
rm taxonomy/vegetable/tulip
echo iron > taxonomy/mineral/iron
echo maize > taxonomy/vegetable/corn
Run another backup, then satisfy yourself that tulip is gone, iron appears,
and corn is changed in the "0" backup but not the "1" or
"2" versions. Run shownodes-for-corn, which will show you how many distinct copies
of corn's contents exist. Once more change corn (in any way), run a backup, and
again check the number of distinct copies. Then run 2 more backups without altering
corn, and observe the change in the number of distinct copies.
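The difference between distinct and nominal copies comes down to hard links: several names, one inode. The following sketch is independent of the provided shownodes-for-corn script, whose contents are not reproduced here. The function name distinct_copies and the scratch layout are assumptions. The sketch also imitates, with mv, how rsync by default replaces a changed file: it writes a new file and renames it into place, so names in older snapshots keep pointing at the old data.

```shell
# Hypothetical helper: count distinct inodes among the given paths.
distinct_copies() {
    ls -i -- "$@" | awk '{print $1}' | sort -u | wc -l | tr -d ' '
}

scratch=$(mktemp -d)
mkdir "$scratch/backup.0"
echo corn > "$scratch/backup.0/corn"

# cp -al copies the directory but hard-links the files inside it (GNU cp).
cp -al "$scratch/backup.0" "$scratch/backup.1"
cp -al "$scratch/backup.0" "$scratch/backup.2"
distinct_copies "$scratch"/backup.*/corn    # three names, one inode: prints 1

# Replace corn in backup.0 by writing a new file and renaming it into place.
echo maize > "$scratch/backup.0/corn.new"
mv "$scratch/backup.0/corn.new" "$scratch/backup.0/corn"
distinct_copies "$scratch"/backup.*/corn    # three names, two inodes: prints 2
cat "$scratch/backup.1/corn"                # still the old content: prints corn
```

The number of names never changed here; only the number of inodes behind them did, which is what keeps unchanged files cheap to carry across snapshots.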
Challenge project: make an adapted, trans-network version
This takes some thought and effort, and possibly more shell script fluency than students necessarily have in this class. That makes it a worthy challenge! You've got it working locally. What changes to your code will make it work across the network? In /backup on the server, I've provided pre-established directories that look like:
drwx------ 2 student10 student10 4096 Nov 16 21:31 student10
for all the studentXX accounts. One of them is yours. You want your local home directory to get backed up into that remote directory, just as it's now getting backed up into the local /backup/student directory. The code (backup-snapshot script) doing the current, strictly local job, is:
cd /backup/$USER
rm -rf backup.2
mv -f backup.1 backup.2
cp -al backup.0 backup.1
rsync -v -a --delete $HOME/ backup.0/
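To watch these five lines at work without touching /backup or your real home directory, they can be run in a sandbox. This is a sketch under assumptions: the sandbox paths replace /backup/$USER and $HOME, the function name snapshot is hypothetical, rsync's -v is dropped to keep the output short, and the demonstration is skipped if rsync is not installed. On the first runs mv and cp have nothing to act on yet, so their failures are silenced and ignored here.

```shell
# Sandbox stand-ins for $HOME and /backup/$USER.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/taxonomy/vegetable" "$sandbox/backup"
echo corn > "$sandbox/home/taxonomy/vegetable/corn"

# Hypothetical wrapper around the same rotation steps as the script above.
snapshot() {
    (
        cd "$sandbox/backup" || exit 1
        rm -rf backup.2                                   # drop the oldest
        mv -f backup.1 backup.2 2>/dev/null || true       # age 1 becomes 2
        cp -al backup.0 backup.1 2>/dev/null || true      # hard-link 0 into 1
        rsync -a --delete "$sandbox/home/" backup.0/      # refresh 0
    )
}

if command -v rsync >/dev/null 2>&1; then
    snapshot
    snapshot
    snapshot
    ls "$sandbox/backup"
    # Inode numbers of corn in each snapshot; equal numbers mean shared data.
    ls -i "$sandbox"/backup/backup.*/taxonomy/vegetable/corn
else
    echo "rsync is not installed here; skipping the demonstration"
fi
```

After three runs all of backup.0, backup.1, and backup.2 should exist, and because corn never changed, its three names should share one inode.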
What will it take to adapt it? I leave this project up to you. Here is some useful information.
You want to transfer the operation of these commands over to the other machine. But the commands themselves and the algorithm they represent do the job, so should be essentially preserved. You can get rsync to operate on the other machine by building a couple things into the rsync command line directly. First, the specification of ssh as was done in the earlier rsync example above. Second, augment the spec for the target directory to incorporate target user and machine identifiers. Model it as above (in "Backing up data using rsync, over an encrypted session using ssh").
For the other 4 commands, how to make them run on the remote machine? Well, that's one of the general purposes of ssh. You can have ssh run an arbitrary command on the remote machine (subject to permissions and things) with syntax like:
ssh user@remotemachine "command"
So you could have each of these commands run as part of such an ssh invocation. However, commands 2-4 depend on the current directory setting effected by command 1. And if you parcel these into 4 separate ssh invocations, each one starts a fresh shell on the remote machine, so any environmental effect one may have (such as a changed current directory) will not carry over to the others. So how about running all 4 of them as a group, and invoking ssh to take care of the group as a whole? The precedent for that is in the above tar exercise ("Backing up data using tar, over an encrypted session using ssh"). Look at the use of parentheses in the tar command there, and how it grouped together 2 commands in a semicolon-separated list, to be executed as one. This shell technique is called (surprise) command grouping.
Also, the code here made use of your local USER and HOME environment variables. Better avoid that now. Instead, hard-code the names of the remote directories you'll be working with.
When you have it working, show me. Till then, ask me and I'll try to assist. I want to see it working if you can get it working.
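The point about separate invocations can be seen without a network. In this sketch, sh -c stands in for ssh user@remotemachine: each call starts a fresh shell, much as each ssh invocation would. It illustrates the shell behavior only and is not a solution to the project. The variable names separate and grouped are made up for the demonstration.

```shell
# Start from a known scratch directory so the results are easy to read.
cd "$(mktemp -d)" || exit 1

# Two separate invocations: the cd happens in a shell that then exits,
# so the second shell still reports the starting directory.
separate=$(sh -c 'cd /'; sh -c 'pwd')

# One invocation running a parenthesized, semicolon-separated group:
# the pwd runs in the same shell as the cd, so it reports /.
grouped=$(sh -c '(cd /; pwd)')

echo "separate invocations: $separate"
echo "grouped invocation:   $grouped"
```

The same reasoning applies to the rm, mv, and cp steps: they only see the directory chosen by cd if they run in the same remote shell as that cd.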