Don’t forget this week’s reading.
In this lecture, we discuss the Unix shell and its commands. The ‘shell’ is a command-line interpreter and invokes kernel-level commands. It also can be used as a programming language to design your own commands. We’ll come to shell programming in a future lecture.
We do not recommend that you buy a book about Unix or the shell; there are some very good references and free-access online books – see the resources page – and we have selected some interesting and useful readings.
If you need help on the meaning or syntax of any Unix shell command, you can use the manual (man) pages on a Unix system (see below) or an online reference of Unix commands. Just keep in mind that some commands’ syntax varies a bit across Unix flavors, so when in doubt, check the man page on the system you’re using.
In their paper Program Design in the Unix Environment (1984), Rob Pike and Brian Kernighan put it this way:
``Much of the power of the Unix operating system comes from a style of program design that makes programs easy to use and, more important, easy to combine with other programs. This style has been called the use of software tools, and depends more on how the programs fit into the programming environment and how they can be used with other programs than on how they are designed internally. […] This style was based on the use of tools: using programs separately or in combination to get a job done, rather than doing it by hand, by monolithic self-sufficient subsystems, or by special-purpose, one-time programs.’’
Historical note - Unix
``This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.’’ — Doug McIlroy
We plan to cover the following topics in today’s lecture:
- The shell
- The file system
For today’s class:
- Sit with your group, and get to know your group mates.
- Get to know your Learning Fellow.
- Choose a group name.
Commands, switches, arguments
The shell is the Unix command-line interpreter.
It provides an interface between the user and the kernel and executes programs called ‘commands’.
For example, if a user enters ls, then the shell executes the ls command, which actually runs a program stored in the file /bin/ls.
The shell can also execute other programs including scripts (text files interpreted by a program like python or bash) and compiled programs (e.g., written in C).
Even your own programs – once marked ‘executable’ – become commands you can run from the shell!
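As a small illustration of that last point, the sketch below writes a two-line script, marks it executable with chmod, and runs it by pathname. The file name hello and the scratch directory are made up for this example.

```shell
# Create a scratch directory to hold the new command (hypothetical location).
bin_dir=$(mktemp -d)

# Write a tiny script; its first line names the interpreter that will run it.
cat > "$bin_dir/hello" <<'EOF'
#!/bin/sh
echo "hello from my own command"
EOF

# Mark the file executable, then run it like any other command.
chmod +x "$bin_dir/hello"
"$bin_dir/hello"
```

Until chmod +x is applied, the shell would refuse to run the file as a command, even though its contents are identical.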
You will get by in the course by becoming familiar with a subset of the Unix commands; don’t let yourself be overwhelmed by the presence of hundreds of commands. You will probably be regularly using 2-3 dozen of them by the end of the term.
Unix has often been criticized for being very terse (it’s rumored that its designers were bad typists). Many commands have short, cryptic names and vowels are a rarity:
awk, cat, cp, cd, chmod, echo, find, grep, ls, mv, rm, tr, sed, comm
We will learn to use all of these commands and more.
Unix command output is also very terse - the default action on success is silence. Only errors are reported, and error messages are often terse. Unix commands are often termed ‘tools’ or ‘utilities’, because they are meant to be simple tools that you can combine in novel ways.
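To see the "silence on success" convention in action, here is a minimal sketch that captures whatever cp prints when it succeeds and when it fails. The scratch directory and file names are invented for the illustration, and the exact wording of the error message varies between systems.

```shell
quiet_dir=$(mktemp -d)
echo "some data" > "$quiet_dir/original"

# A successful copy prints nothing at all.
success_output=$(cp "$quiet_dir/original" "$quiet_dir/copy" 2>&1)
echo "output on success: [$success_output]"

# Copying a file that does not exist produces a short error message.
# The '|| true' keeps a strict script from stopping at the expected failure.
failure_output=$(cp "$quiet_dir/missing" "$quiet_dir/copy2" 2>&1) || true
echo "output on failure: [$failure_output]"
```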
Instructions entered in response to the shell prompt are interpreted first by the shell - expanding any variable references, filename wildcards, or special syntax. Thus, the shell can rewrite the command line; then, it expects the command line to have the following syntax:
    command [arg1] [arg2] ... [argN]
The brackets [ ] indicate that the arguments are optional, and the notation above means that there are zero or more arguments.
Arguments are separated by white space.
Many commands can be executed with or without arguments.
Others require arguments, or a certain number of arguments (e.g., cp sort.c anothersort.c), to work correctly.
If none are supplied, they will provide some error message in return.
Another part of the Unix philosophy is to avoid an explosion in the number of commands by having most commands support various options (sometimes called flags or switches), which modify the actions of the commands.
For example, let’s use the ls command and the -l option switch to list the file README.md in long format:
    f00xxxx@plank:~$ ls
    cs50-dev/ test.txt
    f00xxxx@plank:~$ cd cs50-dev/
    f00xxxx@plank:~/cs50-dev$ ls
    class_activities/ dotfiles/ examples/ play/ README.md setup/
    f00xxxx@plank:~/cs50-dev$ ls -l README.md
    -rw-r--r-- 1 f00xxxx thayerusers 6941 Jan 9 2021 README.md
    f00xxxx@plank:~/cs50-dev$
Switches are often single characters preceded by a hyphen (e.g., -l).
Most commands accept switches in any order, though they generally must appear before all the real arguments (usually filenames).
In the case of the ls example above, the command arguments represent file or directory names.
The options modify the operation of the command and are usually operated on by the program invoked by the shell rather than the shell itself.
Unix programs always receive a list of arguments, containing at least one argument, which is always the command name itself.
For ls, that first argument would be “ls”.
The first argument is referred to as argument 0, “the zero-th argument”.
In our ls example, argument 1 is -l and argument 2 is README.md.
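The numbering of arguments can be made visible with a short sketch. The helper name show_args is hypothetical; it relies on the standard behavior of sh -c, where the first word after the script text becomes argument 0 and the remaining words become arguments 1, 2, and so on.

```shell
# show_args runs a tiny inline script with the given words as its argument list.
# With 'sh -c SCRIPT WORDS...', the first word becomes argument 0 ($0)
# and the remaining words become arguments 1, 2, ... ($1, $2, ...).
show_args() {
    sh -c '
        echo "argument 0: $0"
        n=1
        for arg in "$@"; do
            echo "argument $n: $arg"
            n=$((n + 1))
        done
    ' "$@"
}

show_args ls -l README.md
```

Nothing here actually runs ls; the words are only handed over as an argument list, which is exactly what a program receives when the shell starts it.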
Some commands also accept their switches grouped together.
For example, the following two ls commands are equivalent:
    ls -tla students
    ls -t -l -a students
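One way to convince yourself that grouped and separate switches mean the same thing is to compare the two listings directly. This is only a sketch; the scratch directory and the file names alice and bob are made up.

```shell
switch_dir=$(mktemp -d)
touch "$switch_dir/alice" "$switch_dir/bob"

# Run ls twice on the same directory: once with the switches grouped,
# once with each switch given separately.
grouped=$(ls -tla "$switch_dir")
separate=$(ls -t -l -a "$switch_dir")

if [ "$grouped" = "$separate" ]; then
    echo "grouped and separate switches gave the same listing"
fi
```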
The shell parses the words or tokens (command name and arguments) you type on the command line, and asks the kernel to execute the program corresponding to that command; the interpretation of the arguments (as switches, filenames, or something else) is determined by that program.
Typically, the shell processes the complete line after a carriage return is entered and then goes off to find the program that the command line specified.
If the command is a pathname, whether relative (e.g., ./mycommand) or absolute (e.g., ~cs50/mycommand), the shell simply executes the program in that file.
If the command is not a pathname, the shell searches through a list of directories in your “path”, which is defined by the shell variable called PATH.
Take a look at your PATH by asking the shell to substitute its value ($PATH) and pass it as an argument to the echo command:
    f00xxxx@plank:~$ echo $PATH
    /dartfs-hpc/admin/opt/el7/intel/compilers_and_libraries_2019.3.199/linux/bin/intel64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/.../thayerfs/apps/visit/bin:/thayerfs/apps/xilinx/bin
    f00xxxx@plank:~$
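A long PATH is easier to read one directory per line. The sketch below defines a hypothetical helper, split_path, that replaces each colon with a newline using tr.

```shell
# split_path prints each directory of a colon-separated list on its own line.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Show the directories of the current PATH, in the order the shell searches them.
split_path "$PATH"
```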
So where does the ls command executed above reside in the Unix directory hierarchy?
Let’s use another command to find out.
    f00xxxx@plank:~$ which ls
    ls is aliased to `ls -F --color=auto'
    ls is /bin/ls
    f00xxxx@plank:~$
The first line of response says that ls is “aliased”.
This is a shell feature; the shell allows us to define “aliases”, which act just like commands but are actually just a textual substitution of a command name (the alias) with some other string (in this case, ls -F --color=auto).
Thus, any time I type ls blah blah, it treats it as if I had typed ls -F --color=auto blah blah.
The -F option tells ls to add a trailing symbol to some names in its output; it adds a / to the names of directories, a @ to the names of symbolic links (um, that’s another conversation), and other symbols in some even more specialized cases.
Of course, the shell then still needs to resolve ls.
It then searches the PATH to find an executable file with that name; in this case, the output above shows that it finds ls at /bin/ls.
If ls existed in more than one directory on the PATH, the shell would execute the first one, because it is found first in the PATH.
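The search through PATH can be approximated in a few lines of shell. This is a simplified sketch, not how any particular shell implements the lookup: the function name find_in_path is hypothetical, and it ignores aliases, built-in commands, and the shell's cache of previously found commands.

```shell
# find_in_path prints the first executable file named $1 found in $PATH,
# checking the directories in order, and fails if there is none.
find_in_path() {
    name=$1
    saved_ifs=$IFS
    IFS=:                       # split $PATH at colons
    for dir in $PATH; do
        IFS=$saved_ifs
        [ -n "$dir" ] || dir=.  # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    IFS=$saved_ifs
    return 1
}

find_in_path ls
```

Because the loop stops at the first match, a directory listed earlier in PATH takes priority over one listed later.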
Below you can see the effect of running ls (the alias) and /bin/ls (the raw command, without the -F option):
    f00xxxx@plank:~$ ls
    cs50-dev/ test.txt
    f00xxxx@plank:~$ /bin/ls
    cs50-dev test.txt
    f00xxxx@plank:~$
You can see the contents of any file with the cat command, so named because it concatenates all the files listed as arguments, printing one after the other.
For very long files, though, the output will quickly scroll off your terminal.
Less Is More: The less and more commands are handy for quickly looking at files.
The syntax is less filename and more filename.
Take a look at the man pages to get the details of each.
The head and tail commands display a number of lines (selectable via switches, of course) at the beginning and end of a file, respectively.
    f00xxxx@plank:~/cs50-dev/demo$ ls
    students
    f00xxxx@plank:~/cs50-dev/demo$ head -3 students
    alice
    bob
    charles
    f00xxxx@plank:~/cs50-dev/demo$ tail -3 students
    charles
    deana
    eliza
    f00xxxx@plank:~/cs50-dev/demo$
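If you do not have the students file handy, the following sketch recreates a five-line file in a temporary location and repeats the experiment. It uses -n 3, the standard spelling of the -3 switch shown above; the temporary file stands in for the real students file.

```shell
# Build a small file with one name per line.
students_file=$(mktemp)
printf '%s\n' alice bob charles deana eliza > "$students_file"

# The first three and the last three lines overlap in the middle.
first_three=$(head -n 3 "$students_file")
last_three=$(tail -n 3 "$students_file")

echo "$first_three"
echo "$last_three"
```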
See what these do: more [textfile] and less [textfile], where [textfile] can be any text file with a lot of information.
Long before there were windows and graphical displays, or even screens, there were text editors.
Two are in common use on Unix systems today: emacs and vi.
Actually, there is an expanded/improved version of vi called vim, which is quite popular.
You should try both and become comfortable with at least one. Yes, it’s tempting to use an external graphical editor (like Sublime), but there are times when you must use a text-only editor and thus you should get used to it.
See https://www.cs.dartmouth.edu/~cs50/Resources/#editors for some resources that can help you learn either one.
Unix file system
The Unix file system is a hierarchical file system. The file system consists of a very small number of different file types. The two most common types are files and directories.
A directory (akin to a folder on a macOS or Windows computer) contains the names and locations of all files and directories below it.
A directory always contains two special entries, . (dot) and .. (dot dot); . represents the directory itself, and .. represents the directory’s parent.
In the following, I make a new directory, change my current working directory to be that new directory, create a new file in that directory, and use ls to explore the contents of the new directory and its parent.
    f00xxxx@plank:~/cs50-dev/demo$ mkdir test
    f00xxxx@plank:~/cs50-dev/demo$ ls
    students test/
    f00xxxx@plank:~/cs50-dev/demo$ cd test
    f00xxxx@plank:~/cs50-dev/demo/test$ echo hello > somefile
    f00xxxx@plank:~/cs50-dev/demo/test$ ls -a
    ./ ../ somefile
    f00xxxx@plank:~/cs50-dev/demo/test$ ls
    somefile
    f00xxxx@plank:~/cs50-dev/demo/test$ ls .
    somefile
    f00xxxx@plank:~/cs50-dev/demo/test$ ls ..
    students test/
    f00xxxx@plank:~/cs50-dev/demo/test$
Directory names are separated by a forward slash /, forming pathnames.
A pathname is a filename that includes some or all of the directories leading to the file; an absolute pathname is relative to the root (/) directory and begins with a /, as in the first ls example below, whereas a relative pathname is relative to the current working directory, as in the second example below.
Notice that a relative pathname can also use .., as in the third example below.
    f00xxxx@plank:~/cs50-dev/demo$ pwd
    /thayerfs/home/f00xxxx/cs50-dev/demo
    f00xxxx@plank:~/cs50-dev/demo$ ls /thayerfs/home/f00xxxx/cs50-dev/demo/test
    somefile
    f00xxxx@plank:~/cs50-dev/demo$ ls ./test
    somefile
    f00xxxx@plank:~/cs50-dev/demo$ ls ../demo/test
    somefile
    f00xxxx@plank:~/cs50-dev/demo$
As implied by the shell prompt, the current working directory is ~/cs50-dev/demo, which is shorthand for the home directory of the current user, followed by the directory cs50-dev, followed by the directory demo. This is also the directory at path /thayerfs/home/f00xxxx/cs50-dev/demo, as printed by pwd.
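The three ways of naming the same directory can be reproduced anywhere with a scratch copy of the layout. In this sketch the directory names demo and test mirror the transcript, while the temporary parent directory is an assumption standing in for ~/cs50-dev.

```shell
work=$(mktemp -d)
mkdir -p "$work/demo/test"
echo hello > "$work/demo/test/somefile"

# Absolute pathname: begins with / and works from any directory.
abs_listing=$(ls "$work/demo/test")

# Relative pathnames: resolved from the current working directory,
# which each subshell below first sets to the demo directory.
rel_listing=$(cd "$work/demo" && ls ./test)
dotdot_listing=$(cd "$work/demo" && ls ../demo/test)

echo "$abs_listing $rel_listing $dotdot_listing"
```

All three listings name the same file, because all three pathnames lead to the same directory.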
Moving around the file system
The “change directory” command (cd) allows us to move around the Unix directory hierarchy, that is, to change our “current working directory” from which all relative filenames and pathnames will be resolved.
Below we use cd to move around the directories rooted at the home directory; ~ refers to the home directory and .. refers to the parent directory.
    f00xxxx@plank:~/cs50-dev/demo$ pwd
    /thayerfs/home/f00xxxx/cs50-dev/demo
    f00xxxx@plank:~/cs50-dev/demo$ ls
    students test/
    f00xxxx@plank:~/cs50-dev/demo$ cd test
    f00xxxx@plank:~/cs50-dev/demo/test$ ls
    somefile
    f00xxxx@plank:~/cs50-dev/demo/test$ cd ..
    f00xxxx@plank:~/cs50-dev/demo$ ls
    students test/
    f00xxxx@plank:~/cs50-dev/demo$ cd ../..
    f00xxxx@plank:~$ ls
    cs50-dev/ test.txt
    f00xxxx@plank:~$
The shell prompt is helpfully tracking the current working directory as we move.
Listing and globbing files
Here is a popular set of switches you can use with ls:
    -l  list in long format (as we have been doing)
    -a  list all entries (including `dot` files, which are normally hidden)
    -t  sort by modification time (latest first)
    -r  list in reverse order (alphabetical or time)
    -R  list the directory and its subdirectories recursively
The shell also interprets certain special characters like *, ?, and [ ] as filename wildcards: * matches zero or more characters, ? matches exactly one character, and [ ] matches one character from the set (or range) of characters listed within the brackets:
    f00xxxx@plank:~/cs50-dev/examples$ ls
    args.sh*         freadline.h  madlib2b.txt   names6.c         pointer3.c
    arguments.c      guess1a.sh*  madlib2.txt    names7.c         readline.c
    atoi.c           guess1b.sh*  madlib.c       names8.c         readline.h
    bagA.c           guess1.c     madlibprint.c  names9.c         readlinep.c
    bagA.h           guess2.c     make/          namesA.c         readlinep.h
    bag-unit/        guess2.sh*   memory.c       namesA-makefile  README.md
    bag-unit-full/   guess3.c     memory.h       namesB.c         shifter.sh*
    bugsort.c        guess3.sh*   names0.c       namesB-makefile  sizeof.c
    comics.sh*       guess4.c     names1.c       names.txt        trees/
    crash.c          guess4.sh    names2.c       overflow.c
    files.c          guess5.c     names3.c       pointer0.c
    files-input.txt  guess6.c     names4.c       pointer1.c
    freadline.c      madlib1.txt  names5.c       pointer2.c
    f00xxxx@plank:~/cs50-dev/examples$ ls pointer*.c
    pointer0.c pointer1.c pointer2.c pointer3.c
    f00xxxx@plank:~/cs50-dev/examples$ ls p*.c
    pointer0.c pointer1.c pointer2.c pointer3.c
    f00xxxx@plank:~/cs50-dev/examples$ ls *-*.txt
    files-input.txt
    f00xxxx@plank:~/cs50-dev/examples$ ls madlib*.*
    madlib1.txt madlib2b.txt madlib2.txt madlib.c madlibprint.c
    f00xxxx@plank:~/cs50-dev/examples$ ls madlib?.*
    madlib1.txt madlib2.txt
    f00xxxx@plank:~/cs50-dev/examples$ ls madlib[0-2].txt
    madlib1.txt madlib2.txt
    f00xxxx@plank:~/cs50-dev/examples$
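It is the shell, not ls, that expands these patterns, so echo shows the expansion just as well. The sketch below recreates the five madlib files as empty files in a scratch directory (an assumption, so that it runs anywhere) and expands three of the patterns from the transcript.

```shell
glob_dir=$(mktemp -d)
( cd "$glob_dir" && touch madlib.c madlib1.txt madlib2.txt madlib2b.txt madlibprint.c )

# Each pattern is expanded by the shell before echo ever runs.
any_run=$(cd "$glob_dir" && echo madlib*.*)        # * : zero or more characters
one_char=$(cd "$glob_dir" && echo madlib?.*)       # ? : exactly one character
in_range=$(cd "$glob_dir" && echo madlib[0-2].txt) # [0-2] : one character in the range

echo "madlib*.*       -> $any_run"
echo "madlib?.*       -> $one_char"
echo "madlib[0-2].txt -> $in_range"
```

The order of names within each expansion depends on your locale, which is why the transcript above lists madlib2b.txt before madlib2.txt.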
The ls program normally does not list any files whose filename begins with a dot (.). There is nothing special about these files, except . and .., as far as Unix is concerned.
It’s simply a convention - files whose names begin with . are to be considered ‘hidden’, and thus not listed by ls or matched by the shell’s * globbing character.
Home directories, in particular, include many ‘hidden’ (but important!) files.
The -a switch tells ls to list “all” files, including those that begin with a dot (aka, the hidden files).
    f00xxxx@plank:~$ ls
    cs50-dev/ test.txt
    f00xxxx@plank:~$ ls -a
    ./             .bashrc          .gitconfig         .viminfo
    ../            .bashrc.default  .gitignore_global  .vimrc
    .bash_history  .cache/          .gnupg/            .vscode-server/
    .bash_logout   cs50-dev/        .notfsquota        .wget-hsts
    .bash_profile  .emacs           test.txt
    f00xxxx@plank:~$
To see just the dot files, let’s get clever with the shell’s glob characters:
    f00xxxx@plank:~$ ls -ad .*
    ./             .bashrc          .gitignore_global  .vscode-server/
    ../            .bashrc.default  .gnupg/            .wget-hsts
    .bash_history  .cache/          .notfsquota
    .bash_logout   .emacs           .viminfo
    .bash_profile  .gitconfig       .vimrc
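The hiding convention applies both to ls and to the shell's * pattern, which a two-file experiment makes clear. The file names .hidden and visible are invented for this sketch, and it assumes the shell's default globbing settings.

```shell
dot_dir=$(mktemp -d)
touch "$dot_dir/.hidden" "$dot_dir/visible"

# A bare * skips names that begin with a dot.
star_matches=$(cd "$dot_dir" && echo *)

# A pattern that itself starts with a dot can match them.
dot_matches=$(cd "$dot_dir" && echo .h*)

# ls -a lists everything, including . and ..
all_entries=$(ls -a "$dot_dir")

echo "*     -> $star_matches"
echo ".h*   -> $dot_matches"
echo "ls -a -> $(echo $all_entries)"
```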
All of these “dot files” (or “dot directories”) are important to one program or another, for example:
- .bash_history - used by bash to record a history of the commands you’ve typed
- .bash_logout - executed by bash when you log out
- .bash_profile - executed by bash when you log in
- .bashrc - executed by bash whenever you start a new interactive shell
- .emacs - used by emacs text editor
- .viminfo - used by vim text editor
- .vimrc - used by vim text editor
Bash shell startup files
The bash shell looks for several files in your home directory:
- .bash_profile - executed by bash when you log in
- .bashrc - executed by bash whenever you start a new interactive shell
- .bash_logout - executed by bash when you log out
- .bash_history - used by bash to record a history of the commands you’ve typed
The .bashrc file is especially important, because bash reads it every time you start a new interactive bash shell.
(.bash_profile is only read when you log in; on many systems it is set up to read .bashrc as well. A bash script does not read either file by default.) In each case, bash reads the files and executes the commands therein.
Thus, you can configure your bash experience by having it declare some variables, define some aliases, and set up some personal favorites.
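Since .bash_profile is read at login, a common convention is for it to read .bashrc too, so that login shells pick up the same settings. The sketch below shows that idea under stated assumptions: the helper name source_if_present and the variable CS50_GREETING are hypothetical, and the demonstration uses a scratch file rather than your real startup files.

```shell
# source_if_present reads a startup file into the current shell, if it exists.
source_if_present() {
    if [ -f "$1" ]; then
        . "$1"
    fi
}

# In a real ~/.bash_profile the call would typically be:
#     source_if_present "$HOME/.bashrc"
# Here we try it on a scratch file instead of touching real startup files.
scratch_rc=$(mktemp)
echo 'CS50_GREETING="settings loaded"' > "$scratch_rc"
source_if_present "$scratch_rc"
echo "$CS50_GREETING"
```

The dot command (.) runs the file's commands in the current shell, which is why the variable is still set after the function returns.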
For CS50 we strongly recommend the following customizations:
    # aliases used for cs50
    alias mygcc='gcc -Wall -pedantic -std=c11 -ggdb'
    alias myvalgrind='valgrind --leak-check=full --show-leak-kinds=all'
    # safety aliases
    alias rm='rm -i'
    alias cp='cp -i'
    alias mv='mv -i'
    # convenience aliases
    alias ls='ls -F --color=auto'
    alias mkdir='mkdir -p'
    alias which='type -all'
The mygcc alias adds some extra options to gcc, which we’ll cover next week.
The myvalgrind alias adds some extra options to valgrind, which we’ll cover later in the term.
The “safety” aliases protect you from accidentally deleting or overwriting files.
The “convenience” aliases give you some handy shortcuts and make the output of some commands more useful.
To make all these changes in your .bashrc, log into your Unix account and copy the above content to the .bashrc file in your home directory. You will need the ls -a command in your home directory to see the .bashrc file. If you do not have one, create it.
You can then edit ~/.bash_profile to your own taste.
Many times you want to find a file but do not know where it is in the directory tree (the Unix directory structure is a tree, rooted at the / directory).
The find command can walk a file hierarchy:
    f00xxxx@plank:~$ find . -name pointer0.c -print
    ./cs50-dev/examples/pointer0.c
    f00xxxx@plank:~$ find ./cs50-dev/demo -type d -print
    ./cs50-dev/demo
    ./cs50-dev/demo/test
    f00xxxx@plank:~$ find ./cs50-dev/examples -name \*.txt -print
    ./cs50-dev/examples/files-input.txt
    ./cs50-dev/examples/names.txt
    ./cs50-dev/examples/madlib2b.txt
    ./cs50-dev/examples/madlib2.txt
    ./cs50-dev/examples/madlib1.txt
    f00xxxx@plank:~$ ls
    cs50-dev/ test.txt
    f00xxxx@plank:~$ cd cs50-dev/
    f00xxxx@plank:~/cs50-dev$ ls
    class_activities/ demo/ dotfiles/ examples/ play/ README.md setup/
    f00xxxx@plank:~/cs50-dev$ ls demo/
    students test/
    f00xxxx@plank:~/cs50-dev$ cd ~
    f00xxxx@plank:~$ find ./cs50-dev/demo -name t\* -mtime -1 -print
    ./cs50-dev/demo/test
The first example searches the current directory, and everything below it, for a file with the given name (-name is for a case-sensitive search, while -iname is for a case-insensitive search). The second example searches ./cs50-dev/demo for any directories (-type d) and prints their pathnames. The third example uses a wildcard * to print pathnames of files whose name matches a pattern; the backslash \ is there to prevent the shell from interpreting the *, allowing it to be part of the argument to find, which interprets that character itself. The fourth find example combines two criteria, to print pathnames of files whose name matches t* and whose modification time (-mtime) is less than one day (-1) in the past.
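The same four searches can be tried on a throwaway directory tree. This sketch builds a much smaller tree than the one in the transcript (the layout is an assumption chosen to keep the results short), and it protects the wildcards with single quotes, which has the same effect as the backslash.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/examples" "$tree/demo/test"
touch "$tree/examples/pointer0.c" "$tree/examples/names.txt"

# 1. Search by exact name, starting from the top of the tree.
by_name=$(cd "$tree" && find . -name pointer0.c -print)

# 2. List only directories at or below ./demo.
dirs_found=$(cd "$tree" && find ./demo -type d -print)

# 3. Match a pattern; the quotes stop the shell from expanding the *.
txt_files=$(cd "$tree" && find ./examples -name '*.txt' -print)

# 4. Combine two tests: name matches t* AND modified less than a day ago.
recent=$(cd "$tree" && find ./demo -name 't*' -mtime -1 -print)

printf '%s\n' "$by_name" "$dirs_found" "$txt_files" "$recent"
```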
Our top commands
We’ve explored several shell commands; below is a list of many common commands. Those marked with an asterisk (*) will be introduced later. You’ll often need only about half of them.
    alias
    cat, cd, chmod, comm, cp, cut
    date
    echo, emacs, expr, exit
    file, find
    gcc, gdb*, git*, grep
    head
    less, logout, lpr, ls
    make*, man, mkdir, more, mv
    open (MacOS)
    pbpaste, pbcopy (MacOS)
    pwd
    rm, rmdir
    scp, sed, sort, ssh
    tail, tar, touch, tr
    uniq
    whereis, which
    vi, vim
Recall we mentioned “teletypes” in the first lecture. It was at Dartmouth that a teletype was actually used to interact with a remote computer - the first ever long-distance terminal, decades before the Internet. Scavenger hunt! Who can find the plaque below, on the Dartmouth campus? Share with the class via a post when you find it!
Other useful information
We don’t have time in lecture to cover everything you may want to know; read on.
Navigating within man pages
You may have found the man system to be a little challenging to navigate.
There is a message that is displayed at the very bottom of the screen when you first enter the command that you might have missed (most people do):
Manual page xxx(n) line 1 (press h for help or q to quit)
If you enter ‘h’ to see the help, you will find many more commands than you’re ever likely to use when reading man pages.
This is because the man-page reader is actually the less command of Unix.
I tend to use only a few:
- f or space (the spacebar) advances to the next screenful,
- b goes back to the previous screenful,
- e or down-arrow advances one more line,
- y or up-arrow goes back one line,
- / allows one to type a search phrase and hit return,
- q quits man, and returns to the shell prompt.
There are shells, shells, and more shells
There are a number of shells available to a Unix user – so which one do you select? The most common shells are:
sh: the original shell, known as the Bourne Shell,
tcsh: a well-known and widely used derivative of the C shell (csh),
ksh: the Korn shell, and
bash: the Bourne Again SHell, developed by GNU, is the most popular shell used for Linux.
bash is the default shell for Unix accounts in our department.
The basic shell operation is as follows.
The shell parses the command line; the first word on the line is the command name.
If the command is an alias, it substitutes the alias text and again identifies the command.
If the command is built into the shell (there are a few, like cd), it performs that command’s action.
Otherwise, the shell looks for the executable file that matches that program name by searching directories listed in the PATH variable.
The shell then starts that program as a new process and passes any options and arguments to the program.
A process is a running program.
You can see a list of your processes with the command ps.
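Two of these steps can be observed from the shell itself. In the sketch below, the standard type command reports how a name would be resolved, and the special parameter $$ (the process ID of the running shell) shows that starting a program creates a new process. The exact wording that type prints varies between shells.

```shell
# 'type' reports how the shell would resolve a command name.
type cd      # a command built into the shell itself
type ls      # typically an executable file found through PATH

# Every shell is itself a process, with a process ID available as $$.
parent_pid=$$

# Running 'sh' starts a new process, which reports a different ID.
child_pid=$(sh -c 'echo $$')
echo "this shell: $parent_pid, child shell: $child_pid"
```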
Unix itself imposes almost no constraints or interpretation on the contents of files - the only common case is that of a compiled, executable program: it has to be in a very specific binary format for the operating system (Unix) to execute it. All other files are used by some program or another, and it’s up to those programs to interpret the contents as they see fit. The great power of Unix, and the common shell commands, is that any file can be read by any program; the most common format is the plain-text (ASCII) file, formatted as a series of “lines” delimited by “newline” characters (\n, ASCII code 012 in octal).
If you are unsure about the contents of a file (text, binary, compressed, Unix executable, some format specific to a certain application, etc.), the file command is useful; it makes an attempt to judge the format of the file.
    f00xxxx@plank:~$ cd cs50-dev/demo
    f00xxxx@plank:~/cs50-dev/demo$ ls
    students test/
    f00xxxx@plank:~/cs50-dev/demo$ file students
    students: ASCII text
    f00xxxx@plank:~/cs50-dev/demo$ cd ~
    f00xxxx@plank:~$ file /bin/ls
    /bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9567f9a28e66f4d7ec4baf31cfbf68d0410f0ae6, stripped
    f00xxxx@plank:~$