The first four lectures have been a crash course in the shell and shell programming. Now we move to the C language. We will spend the rest of the course developing our C and systems programming skill set, first by understanding the basics of the language and then, through examples, by studying good code and writing our own.
This lecture serves as an introduction to the C language. Why is C an important language 30 years after its development?
We plan to learn the following from today’s lecture:
The notes used in this lecture are based on Chris McDonald’s CS23 notes on an introduction to C. Thanks Chris.
OK. Let’s get started.
We intend to use the textbook more as a reference than to work through it in a stepwise fashion; there is no time for that. We will refer to sections and use example code from the book from time to time. I would suggest that students start reading the book; it is very readable. Please read chapters 2 and 3 by the next lecture. If you already have some knowledge of C you can skip this reading assignment. This type of reading is different from the assigned reading of articles, which we typically discuss in class. Reading from the course book is more to back up what we discuss, or for you to fill in your knowledge of things we don’t have time to dive deeply into in class. Read as much as time permits.
C can be correctly described as a successful, general purpose programming language, a description also given to Java and C++. C is a procedural programming language, not an object-oriented language like Java or C++. Programs written in C can of course be described as “good” programs if they are written clearly, make use of high level programming practices, and are well documented with sufficient comments and meaningful variable names. Of course all of these properties are independent of C and are provided through many high level languages.
C has the high level programming features provided by most procedural programming languages - strongly typed variables, constants, standard (or base) datatypes, enumerated types, a mechanism for defining your own types, aggregate structures, control structures, recursion and program modularization.
C does not support sets of data, Java’s concept of a class or objects, nested functions, nor subrange types and their use as array subscripts, and has only recently added a Boolean datatype.
C does have, however, separate compilation, conditional compilation, bitwise operators, pointer arithmetic and language independent input and output. The decision about whether C, C++, or Java is the best general purpose programming language (if that can or needs to be decided) is not going to be an easy one.
C is the programming language of choice for most systems-level, engineering, and scientific programming. The world’s popular operating systems - Linux, Windows and Mac OS-X, their interfaces and file-systems, are written in C; the infrastructure of the Internet, including most of its networking protocols, web servers, and email systems, are written in C; software libraries providing graphical interfaces and tools, and efficient numerical, statistical, encryption, and compression algorithms, are written in C; and the software for most embedded devices, including those in cars, aircraft, robots, smart appliances, sensors, mobile phones, and game consoles, is written in C.
Nearly all operators in C are identical to those of Java. However, the role of C in system programming exposes us to much more use of the shift and bit-wise operators than in Java.
Assignment
=
Arithmetic
+, -, *, /, %, unary -
Priorities may be overridden with ( )s.
Relational
>, >=, <, <= (all have same precedence)
== (equality) and != (inequality)
Logical
&& (and), || (or), ! (not)
Pre- and post- decrement and increment
Any (integer, character or pointer) variable may be either incremented or decremented before or after its value is used in an expression.
For example :
--fred will decrement fred before value used.
++fred will increment fred before value used.
fred-- will get (old) value and then decrement.
fred++ will get (old) value and then increment.
Bitwise operators and masking
& (bitwise and), | (bitwise or), ~ (bitwise negation).
To check if certain bits are on (fred & MASK), etc.
Shift operators << (shift left), >> (shift right).
Combined operators and assignment
a += 2; a -= 2;
a *= 2;
Each combines an operator with assignment, so a += b; is equivalent to a = a + b;
Type coercion
C permits assignments and parameter passing between variables of different types using type casts or coercion. A cast is written explicitly, as in (int)value, whereas coercion is applied implicitly by the compiler; casts are used where some languages require a “transfer function”.
Operators of equal precedence are mostly evaluated from left to right (the unary operators, ? : and = group from right to left), and the default precedence may be overridden with brackets.
() coercion (highest)
++ -- !
* / %
+ -
<< >>
< <= > >=
!= ==
&
|
&&
||
? :
=
, (lowest)
Variable names (and type and function names as we shall see later) must commence with an alphabetic or the underscore character A-Za-z_ and be followed by zero or more alphabetic, underscore or digit characters A-Za-z_0-9. Most C compilers, such as gcc, accept and support variable, type and function names of up to 256 characters in length. Some older C compilers only supported variable names with up to 8 unique leading characters, and keeping to this limit may be preferred to maintain portable code. It is also preferred that you do not use variable names consisting entirely of uppercase characters; uppercase names are best reserved for #define-ed constants, such as MAXSIZE. Importantly, C variable names are case sensitive, and MYLIMIT, mylimit, Mylimit and MyLimit are four different variable names.
Variables are declared to be of a certain type; this type may be either a base type supported by the C language itself, or a user-defined type consisting of elements drawn from C’s set of base types. C’s base types and their representation on our lab’s Pentium PCs are:
bool an enumerated type, either true or false
char the character type, 8 bits long
short the short integer type, 16 bits long
int the standard integer type, 32 bits long
long the longer integer type, also 32 bits long
float the standard floating point (real) type, 32 bits long (about 7 decimal digits of precision)
double the extra precision floating point type, 64 bits long (about 15 decimal digits of precision)
enum the enumerated type, monotonically increasing from 0
Very shortly, we will see the emergence of Intel’s IA64 architecture where, like the Power-PC already, long integers occupy 64 bits.
We can determine the number of bytes required for datatypes with the sizeof operator. In contrast, Java defines how long each datatype may be. C’s only guarantee is that:
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
Let’s write some C code to look at these base data types. We will use the sizeof operator and the printf function. We will also define variables of each of the base types and print the initialized values as part of the data-types.c code.
C code: data-types.c
The contents of data-types.c look like this:
Once we have the C code we have to compile it with gcc with the various compiler switches we discussed in Lecture 1. Let’s compile the code using:
mygcc -o filename filename.c as the convention. The compiler produces an executable called filename. You do not have to use chmod to make it executable; the compiler takes care of that when it creates the file filename with the correct permissions.
Check it out: Save the file in your directory cs23/code/. Compile and run the code. Check the output.
Base types may be preceded with one or more storage modifiers:
auto the variable is placed on the stack (default, deprecated)
extern the variable is defined outside of the current file
register request that the variable be placed in a register (ignored)
static the variable is placed in global storage with limited visibility
typedef introduce a user-defined type
unsigned storage and arithmetic is only of/on non-negative integers
All scalar auto and static variables may be initialized immediately after their definition, typically with constants or simple expressions that the compiler can evaluate at compile time. The C99 language defines that all uninitialized global variables, and all uninitialized static local variables will have the starting values resulting from their memory locations being filled with zeroes - conveniently the value of 0 for an integer, and 0.0 for a floating point number.
Scope is defined as the section (e.g., function, block) of the program where the variable is valid and known.
In Java, a variable is simply used as a name by which we refer to an object. A newly created object is given a name for later reference, and that name may be re-used to refer to another object later in the program. In C, a variable more strictly refers to a memory address (or contiguous memory addresses starting from the indicated point) and the type of the variable declares how that memory’s contents should be interpreted and modified.
C only has two true lexical levels, global and function, though sub-blocks of variables and statements may be introduced in many places, seemingly creating new lexical levels. As such, variables are typically defined globally (at lexical level 0), or at the start of a statement block, where a function’s body is understood to be a statement block.
Variables defined globally in a file are visible until the end of that file. They need not be declared at the top of a file, but typically are. If a global variable has a storage modifier of static, it means that the variable is only available from within that file. If the static modifier is missing, that variable may be accessed from another file if part of a program compiled and linked from multiple source files.
The extern modifier is used (within “our” file) to declare the existence of the indicated variable in another file. The variable may be declared as extern in all files, but must be defined (and not as a static!) in only a single file.
Variables may also be declared within a statement block. C89 required all such declarations to appear at the top of the block, before any statements; C99 relaxes this, although declaring variables at the top of the block remains a common convention. Such variables are visible until the end of that block, typically until the end of the current function. A local variable’s name may shadow that of a global variable, making that global variable inaccessible. Blocks do not have names, and so shadowed variables cannot be named. Local variables are accessible until the end of the block in which they are defined.
Local variables are implicitly preceded by the auto modifier: as control flow enters the block, memory for the variable is allocated on the run-time stack. The memory is automatically deallocated (or simply becomes inaccessible) as control flow leaves the block. The implicit auto modifier facilitates recursion in C: each entry to a new block allocates memory for new local variables, and these unique instances are accessible only while in that block.
If a local variable is preceded by the static modifier, its memory is not allocated on the run-time stack, but in the same memory as for global variables. When control flow leaves the block, the memory is not deallocated, and remains for the exclusive use by that local variable. The result is that a static local variable retains its value between entries to its block. Whereas the starting value of an auto local variable (sitting on the stack) cannot be assumed (or more correctly, should be considered to contain a totally random value), the starting value of a static local variable is as it was when the variable was last used.
Control flow within C programs is almost identical to the equivalent constructs in Java. However, C provides no exception mechanism, and so C has no try, catch, and finally constructs.
Conditional execution
Of significance, and a very common cause of errors in C programs, is that C before C99 has no Boolean datatype. Instead, any expression that evaluates to the integer value of 0 is considered false, and any non-zero value true. A conditional statement’s controlling expression is evaluated and, if non-zero (i.e. true), the following statement is executed. Most errors are introduced when programmers (accidentally) use embedded assignment statements in conditional expressions:
A good habit to get into is to place constants on the left of (potential) assignments:
When compiling with gcc -std=c99 -Wall -pedantic ... the only way to “shut the compiler up” is to use extra parentheses:
C’s other control flow statements are very unsurprising:
Any of the four components of a for loop may be missing. If the conditional-expression is missing, it is always true. Infinite loops may be requested in C with for( ; ; ) ... or with while(1) ...
As in Java’s switch statement, C permits control to drop down to following case constructs unless there is an explicit break statement.