C syntax

From Free net encyclopedia

The syntax of the C programming language is a set of rules that defines how a C program will be written and interpreted.

1 Data structures
2 Operators
3 Control structures
4 Functions
5 Input/Output
- 5.1 Standard I/O
- 5.2 File I/O
6 Miscellaneous
7 See also
8 References
9 External links

Data structures

Primitive data types

Many programming languages, including C, represent numbers in two forms: integral and real (or non-integral). The distinction exists because integers and non-integral numbers are stored in memory using different representations.

The integral specifier is int; it is used to denote the representation of integers. The integral type comes in different sizes, denoting the memory usage and highest magnitude^[1]. Modifiers are used to modify the default size: short, long and long long^[2]. The character type, whose specifier is char, represents the smallest addressable unit, which is normally an 8-bit byte.

The real (non-integral) form is used to represent numbers with a fractional part. Most such values cannot be represented exactly and are approximated instead. There are three real floating types, denoted by their specifiers: single precision (float), double precision (double) and extended precision (long double). Each represents values with a different range and precision; on some implementations long double is no more precise than double.

Integral types can be either signed (which is implied when not specified) or unsigned. However char, signed char and unsigned char are all different types, and plain char may be signed or unsigned.

When signed, one bit of the representation of the integer in memory may instead be used to represent the sign (positive or negative). For example, a signed 16-bit integer may use one bit for its sign and the remaining 15 bits for its value. In the corresponding unsigned type, that bit contributes to the value instead, so the maximum representable value roughly doubles and no negative values can be stored.

^[1] = In terms of integral values, magnitude represents the maximum value (independent of sign) that can be represented. Signedness shifts this range, as is illustrated in the tables below.

^[2] = The long long modifier was introduced in the C99 standard.

Constants that define boundaries of primitive data types

The standard header file limits.h defines the minimum and maximum values of the integral primitive data types, amongst other limits. The standard header file float.h describes the range and precision of float, double, and long double: for example, FLT_MAX and DBL_MAX give the largest finite values, FLT_MIN and DBL_MIN the smallest positive normalized values, and FLT_EPSILON and DBL_EPSILON the difference between 1 and the next larger representable value. Most implementations use the IEEE 754 single- and double-precision formats for float and double, although the C standard does not require this.

Constants that define boundaries of common integral types
Implicit Specifier	Explicit Specifier	Minimum Value	Maximum Value
`char`	same	`CHAR_MIN`	`CHAR_MAX`
`signed char`	same	`SCHAR_MIN`	`SCHAR_MAX`
`unsigned char`	same	0	`UCHAR_MAX`
`short`	`signed short int`	`SHRT_MIN`	`SHRT_MAX`
`unsigned short`	`unsigned short int`	0	`USHRT_MAX`
none, `signed`, or `int`	`signed int`	`INT_MIN`	`INT_MAX`
`unsigned`	`unsigned int`	0	`UINT_MAX`
`long`	`signed long int`	`LONG_MIN`	`LONG_MAX`
`unsigned long`	`unsigned long int`	0	`ULONG_MAX`
`long long`^[1]	`signed long long int`^[1]	`LLONG_MIN`^[2]	`LLONG_MAX`^[2]
`unsigned long long`^[1]	`unsigned long long int`^[1]	0	`ULLONG_MAX`^[2]

^[1] = The long long modifier was standardized in C99; earlier compilers may support it only as an extension.

^[2] = The LLONG_MIN, LLONG_MAX, and ULLONG_MAX constants are only guaranteed to be defined in limits.h by C99-conforming implementations.

Typical boundaries of primitive integral types

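The following is a list of the common integral types and their typical sizes and boundaries. These may vary from one implementation to another. C99 provides the stdint.h header (also included by inttypes.h), which defines signed and unsigned integral types of exact widths such as int8_t, uint16_t, int32_t and uint64_t where the implementation supports them, together with their limits (for example INT32_MAX).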

Typical sizes and boundaries of common integral types
Implicit Specifier	Explicit Specifier	Bits	Bytes	Minimum Value	Maximum Value
`char`	same	8	1	-128 or 0	127 or 255
`signed char`	same	8	1	-128	127
`unsigned char`	same	8	1	0	255
`short`	`signed short int`	16	2	-32 768	32 767
`unsigned short`	`unsigned short int`	16	2	0	65 535
`long`	`signed long int`	32	4	-2 147 483 648	2 147 483 647
`unsigned long`	`unsigned long int`	32	4	0	4 294 967 295
`long long`^[1]	`signed long long int`^[1]	64	8	-9 223 372 036 854 775 808	9 223 372 036 854 775 807
`unsigned long long`^[1]	`unsigned long long int`^[1]	64	8	0	18 446 744 073 709 551 615

^[1]—The long long modifier is only supported by C99-compliant compilers.

The size and limits of the plain int type (without the short, long, or long long modifiers) vary more between implementations than those of the other integral types. The Single UNIX Specification requires int to be at least 32 bits, whereas the ISO C standard only requires 16 bits. Similarly, long is 64 bits rather than 32 on many 64-bit Unix-like systems.

References

The asterisk modifier (*) specifies a pointer type, which this article also calls a reference type. Where the specifier int refers to the primitive integral type, the specifier int * refers to the type "pointer to int". A pointer value associates two pieces of information: a memory address and the type of the data stored there. The following line of code declares a variable called ptr of type pointer to int:

int *ptr;

Referencing

A local pointer variable declared without an initializer has an indeterminate value: it does not point anywhere meaningful, and it must be assigned a valid address before it is dereferenced. In the following example, ptr will be set so that it points to the same data as the primitive integral variable a:

int *ptr;
int a;

ptr = &a;

To accomplish this, the reference operator (unary &) was used. It yields the memory address of its operand, which is why it is often called the "address-of" operator.

Dereferencing

By the same token, the value can be retrieved from a reference value. In the following example, the primitive integral variable b will be set to the data that is referenced by ptr:

int *ptr;
int a, b;

a = 10;
ptr = &a;
b = *ptr;

In order to accomplish that task, the dereference operator (unary *) was used. It returns the data to which its operand—which must be of pointer type—points. Thus, the expression *ptr has the same value as a.

The overloading of the asterisk character with two related behaviors can be confusing at first. It is important to understand the differences between its use as a modifier in a declaration and as a unary operator in an expression.

Equivalent reference and primitive statements

The following table lists equivalent expressions with both primitive and reference types, using both the reference and dereference operators. It assumes a primitive variable d of type int and a reference variable ptr of type int * that points to it (ptr = &d):

Equivalent reference and primitive statements
	To a primitive value	To a reference value
From a primitive value	`d`	`&d`
From a reference value	`*ptr`	`ptr`

Arrays

Static array declaration

Arrays are used in C to represent structures of consecutive values of the same type. Unlike other declarations, the declaration of a static (fixed-size) array places its size in square brackets after the name; consider the following syntax:

int array[n];

This defines an array named array that holds n values of the primitive type int; memory for n integral values is reserved for it. In C89, n must be a constant expression; C99 also allows variable-length arrays whose size is computed at run time. In most expressions, the name array is converted ("decays") to a value of the reference integral type (int *) that points to the first element.

Accessing elements

The primary facility for accessing the component values of an array, which are often called elements, is the array subscript operator. The element with index i of array is accessed as array[i], which yields the value stored there. This looks very similar to the declaration syntax but means something entirely different: in a declaration the bracketed number gives the size of the array, whereas in an expression it selects an element.

Array subscripts begin numbering at 0. The largest valid subscript is therefore equal to the number of elements in the array minus 1. To illustrate this, consider an array of 10 elements; the first element would be [0] and the last element would be [9].

It is also possible to use pointer arithmetic to obtain a reference to each of the array elements: adding i to a pointer to an element moves it i elements (not i bytes) forward. The following table illustrates both methods for the array declared above:

Array subscripts v. Pointer arithmetic
Element	0	1	2	i
Subscript	`array[0]`	`array[1]`	`array[2]`	`array[i]`
Pointer arithmetic	`*array`	`*(array + 1)`	`*(array + 2)`	`*(array + i)`

Dynamic arrays

C provides no facility for bounds checking with arrays. Although the last valid subscript in an array of 10 elements is 9, subscripts 10, 11, and so forth can still be written, and C does not require them to be detected; accessing such elements is undefined behavior. Because arrays are homogeneous (that is, they consist of only one type of data), only two pieces of information need be known to access them: the address of the first element and the type of data.

Recall the declaration of a static array, which allocates the appropriate amount of memory to hold an array and associates a name with it:

int array[n];

This behavior can be imitated with the help of the C standard library. The malloc function, declared in stdlib.h, provides a simple method for allocating memory. It takes one parameter: the amount of memory to allocate in bytes. Upon successful allocation, malloc returns a pointer to the first byte. If the allocation could not be completed, malloc returns a null pointer, so the result should be checked before use. The following segment is therefore similar in function to the static declaration:

int *ptr;
ptr = malloc(n * sizeof(int));

The result is a 'pointer to int' variable (ptr) that points to the first of n 'ints'. The advantage of dynamic allocation is that n can be computed at run time, and the allocated block can later be resized with the realloc function.

When the dynamically allocated memory is no longer needed, it should be released with a call to the free function, which returns it to the allocator (not necessarily to the operating system). free takes a single parameter: a pointer to previously allocated memory, typically the value that was returned by malloc. It is considered good practice to then set the pointer to NULL: the memory must not be accessed after it has been freed, and a null pointer makes accidental later use easier to detect (calling free on a null pointer does nothing, and dereferencing one typically crashes the program).

free(ptr);
ptr = NULL;

Multidimensional arrays

In addition, C supports arrays of multiple dimensions. A multidimensional array is an array of arrays, stored contiguously in row-major order (all elements of the first row, then all elements of the second row, and so on). Consider the following syntax:

int array2d[rows][columns];

This defines an array of two dimensions; its first dimension is of size rows. Its second is of size columns, for a total of rows * columns elements: a set of columns elements for each first-dimension element.

Each first-dimension element is itself an array. For example, array2d[1] (if rows ≥ 2) is an array of columns int elements, and in most expressions it decays to a reference integral value (int *) that points to the first of them. A two-dimensional array is nevertheless not an array of pointers: no pointers are stored in it, and it cannot be passed where an int ** is expected.

Strings

Library functions

In C, a string is an array of char terminated by a null character ('\0'). Strings may be manipulated without using the standard library. However, the library (declared in string.h) contains many useful functions for working with both null-terminated strings and unterminated arrays of char.

The most commonly used string functions are:

strcat(dest, source) - appends the string source to the end of string dest; dest must have room for the result
strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null pointer if c is not found
strcmp(a, b) - compares strings a and b (lexicographical ordering); returns negative if a is less than b, 0 if equal, positive if greater.
strcpy(dest, source) - copies the string source to the string dest; dest must have room for it, including the terminator
strlen(st) - returns the length of string st, not counting the terminating null character
strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest, followed by a null character; characters after the null terminator of source are not copied.
strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexicographical ordering); returns negative if a is less than b, 0 if equal, positive if greater.
strncpy(dest, source, n) - copies a maximum of n characters from the string source to dest; if source is shorter than n, the rest of dest is padded with null characters, but if it is n characters or longer, dest is not null-terminated
strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null pointer if c is not found

The less common string functions are:

strcoll(s1, s2) - compare two strings according to a locale-specific collating sequence
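strcspn(s1, s2) - returns the length of the initial segment of s1 that contains no character from s2 (that is, the index of the first character in s1 that matches any character in s2, or the length of s1 if there is none)
strerror(err) - returns a string with an error message corresponding to the code in err
strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a null pointer if not found
strspn(s1, s2) - returns the length of the initial segment of s1 that consists only of characters from s2 (that is, the index of the first character in s1 that matches no character in s2)
strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if no such substring exists.
strtok(s1, s2) - returns a pointer to the next token within s1 delimited by the characters in s2, or a null pointer if there are no more tokens; later calls pass a null pointer as s1 to continue with the same string, and s1 is modified because delimiters are overwritten with null characters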
strxfrm(s1, s2, n) - transforms s2 into s1 using locale-specific rules

Operators

Main article: Operators in C and C++

Control structures

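C is a free-form language: whitespace and line breaks between tokens are generally insignificant, so the layout of control structures is left to the programmer.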

Note: bracing style varies from programmer to programmer and can be the subject of great debate ("flame wars"). See Indent style for more details.

Compound statements

Compound statements in C have the form

  { <optional-declaration-list> <optional-statement-list> }

and are used as the body of a function or anywhere that a single statement is expected.

Expression statements

A statement of the form

  <optional-expression> ;

is an expression statement. If the expression is missing, the statement is called a null statement.

Selection statements

C has three types of selection statements: two kinds of if and the switch statement.

The two kinds of if statement are

  if (<expression>)
     <statement>

and

  if (<expression>)
     <statement>
  else
     <statement>

In the if statement, if the expression in parentheses is nonzero (true), control passes to the statement following the if. If the else clause is present, control passes to the statement following the else keyword when the expression is zero (false). When if statements are nested, each else is matched with the nearest preceding if in the same block that does not already have an else. Braces may be used to override this or for clarity.

The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:). No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch; control passes to the default label if none of the case labels are equal to the expression in the parentheses following switch, and if there is no default label, none of the substatement is executed. Switches may be nested; a case or default label is associated with the smallest switch that contains it.

Switch statements can "fall through": when one case section has completed its execution, statements continue to be executed downward until a break statement (or the end of the switch) is reached. This is useful in many circumstances, but some later programming languages forbid implicit fall-through. In the example below, if the expression matches <label2>, the statements <statements 2> are executed and nothing more inside the braces. However, if it matches <label1>, both <statements 1> and <statements 2> are executed, since there is no break separating the two case sections.

  switch (<expression>) {
     case <label1> :
        <statements 1>
     case <label2> :
        <statements 2>
        break;
     default :
        <statements>
  }

Iteration statements

C has three forms of iteration statement:

  do
     <statement>
  while ( <expression> ) ;

  while ( <expression> )
     <statement>

C89's for loop

  for ( <expression> ; <expression> ; <expression> )
     <statement>

was generalized in C99 to

  for ( <clause> ; <expression> ; <expression> )
     <statement>

with <clause> being the initialization part of the loop (an expression or a declaration).

In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero or true. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.

If all three expressions are present in a for, the statement

  for (e1; e2; e3)
     s;

is equivalent to

  e1;
  while (e2) {
     s;
     e3;
  }

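Any of the three expressions in the for loop may be omitted. A missing second expression makes the test always true, creating an infinite loop that can only be left with a jump statement such as break or return. The equivalence above also assumes that s contains no continue statement: in a for loop, continue still evaluates e3 before the next test, whereas in the while version it would skip e3.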

Jump statements

Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.

The goto statement looks like this:

  goto <identifier>;

The identifier must be a label located in the current function. Control transfers to the labeled statement.

A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the smallest enclosing such statement. That is, within each of the statements

  while (expression) {
     /* ... */
     cont: ;
  }

  do {
     /* ... */
     cont: ;
  } while (expression);

  for (optional-expr; optexp2; optexp3) {
     /* ... */
     cont: ;
  }

a continue not contained within a nested iteration statement is the same as goto cont.

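The break statement is used to get out of the innermost enclosing for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.

A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller of the function. Flowing off the end of the function is equivalent to a return with no expression. In either case no value is returned, and if the caller uses the value of the call, the behavior is undefined (since C99, a return with no expression is allowed only in a function whose return type is void). As a special case, C99 specifies that reaching the closing brace of main returns 0.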

Functions

Syntax

A C function definition consists of a return type (void if no value is returned), a unique name, a list of parameters in parentheses (void if there are none), and a body enclosed in braces containing statements, usually ending with a return statement (which may be omitted if the return type is void):

 <return-type> functionName( <parameter-list> )
 {
    <statements>

    return <expression of return-type>;
 }

where <parameter-list> of n parameters is a comma-separated list, each entry giving a data type and a parameter name:

 <data-type> var1, <data-type> var2, ... <data-type> varN

Example

Example of a program that adds two integers and prints: 1 + 1 = 2

 #include <stdio.h>
 
 int add(int x, int y)
 {
    return x + y;
 }
 
 int main(void)
 {
    int foo = 1, bar = 1;
    printf("%d + %d = %d\n", foo, bar, add(foo, bar));
    return 0;
 }

The function main must be declared as having an int return type according to the C standard. It returns to its caller, typically the underlying operating system.

Description

After preprocessing, at the highest level a C program consists of a sequence of declarations at file scope.

The declarations introduce functions, variables and types. C functions are akin to the subroutines of Fortran or the procedures of Pascal.

A definition is a special type of declaration: a variable definition sets aside storage and possibly initializes it, and a function definition provides the function's body.

An implementation of C providing all of the standard library functions is called a hosted implementation. Programs written for hosted implementations are required to define a special function called main, which is the first function called when execution of the program begins. Here is a minimal C program:

 int main(void)
 {
    return 0;
 }

The main function will usually call other functions to help it perform its job.

Some implementations are not hosted, usually because they are not intended to be used with an operating system. Such implementations are called free-standing in the C standard. A free-standing implementation is free to specify how it handles program startup; in particular it need not require a program to define a main function.

Functions may be written by the programmer or provided by existing libraries. The latter are declared by including header files—with the #include preprocessing directive—and then linked into the final executable image. Certain library functions, such as printf, are defined by the C standard; these are referred to as the standard library functions.

A function may return a value to the environment that called it. This is usually another C function, however the calling environment of the main function is the higher-level process in Unix-like systems or the operating system itself in other cases. By definition, the return value zero of main signifies successful completion when the program terminates. The printf function mentioned above returns how many characters were printed, but this value is usually ignored.

Passing variables

Arguments in C are passed by value, whereas some other languages can pass them by reference. This means that the receiving function gets copies of the values and has no way of altering the original variables. To have a function alter variables belonging to another function, such as main, you pass their addresses (pointers) and dereference them in the receiving function (see References above):

 void incInt(int *y)
 {
    (*y)++;  // Increase the value of 'x', in main, by one
 }

 int main(void)
 {
    int x = 0;
    incInt(&x);  // pass a reference to the var 'x'
    return 0;
 }

To let a function change a pointer variable itself, you pass the address of the pointer, that is, a pointer to a pointer:

 #include <stdlib.h>

 void setInt(int **p, int n)
 {
    *p = malloc(sizeof(int));   // allocate memory for one int and store its address
                                // in the pointer whose address was given as a parameter
    if (*p != NULL)
       **p = n;                 // dereference twice to set the int itself to n (42 here)
 }

 int main(void)
 {
    int *p;           // create a pointer to an integer
    setInt(&p, 42);   // pass the address of 'p'
    free(p);          // release the memory again (free(NULL) does nothing)
    return 0;
 }

In setInt, the parameter int **p is a pointer to a pointer to int; here it holds the address of the pointer p declared in main.

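The standard library function scanf relies on the same technique: it stores each value it reads through a pointer, so it must be given the address of the variable to fill: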

 int x;
 scanf("%d", &x);

Input/Output

In C, input and output are performed via a group of functions in the standard library. In ANSI/ISO C, those functions are declared in the <stdio.h> header.

Standard I/O

Three standard I/O streams are predefined:

stdin - standard input
stdout - standard output
stderr - standard error

These streams are opened automatically before main is called and closed automatically when the program terminates normally, so they need not and should not be opened explicitly.

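The following example demonstrates how a filter program, which copies standard input to standard output while processing each character, is typically structured:

#include <stdio.h>

int main(void)
{
   int c;   /* int rather than char, so that EOF can be told apart from every valid character */

   while ((c = getchar()) != EOF)
   {
      /* do various things
         to the characters */

      /* putchar returns EOF if the character could not be written */
      if (putchar(c) == EOF) {
         fputs("An error occurred\n", stderr);
         return 1;
      }
   }
   return 0;
}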

File I/O

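Files are opened with the fopen function, which takes a file name and a mode string (such as "r" for reading or "w" for writing) and returns a FILE * stream, or a null pointer if the file cannot be opened. The stream is then used with functions such as fgetc, fputs and fprintf, and closed with fclose, which also flushes any buffered output.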

Miscellaneous

Case sensitivity

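C is case sensitive: count, Count and COUNT are three different identifiers, and keywords such as int and while must be written in lowercase.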

Comments

Text starting with /* (outside a character constant or string literal) is treated as a comment and ignored. The comment ends at the next */ and can span multiple lines. Because comments do not nest, accidentally omitting the comment terminator is problematic: the terminator of the next properly constructed comment will end the first comment, and all code in between will be treated as part of the comment.

The C99 standard introduced C++-style line comments. These start with // and extend to the end of the line.

 // this line will be ignored by the compiler

 /* these lines
    will be ignored
    by the compiler */

Command-line arguments

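The parameters given on a command line are passed to a C program through two parameters of main, conventionally named argc and argv: argc holds the count of the command-line arguments, and argv is an array of pointers to the individual arguments, each a null-terminated string. So the command

 myFilt p1 p2 p3

results in argc being 4, with argv laid out as follows (argv[0] is normally the program name, and argv[argc] is always a null pointer):

 argv[0] -> "myFilt"
 argv[1] -> "p1"
 argv[2] -> "p2"
 argv[3] -> "p3"
 argv[4] -> NULL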

(Note: While individual strings are contiguous, there is no guarantee that the strings are stored as a contiguous group.)

The individual values of the parameters may be accessed with argv[1], argv[2], and argv[3], as shown in the following program:

#include <stdio.h>

int main(int argc, char *argv[])
{
  int i;
  printf ("argc\t= %i\n", argc);
  for (i = 0; i < argc; i++)
    printf ("argv[%i]\t= %s\n", i, argv[i]);
  return 0;
}

Evaluation order

A conforming C compiler can evaluate the parts of an expression in any order between sequence points. Sequence points are defined by, among others:

The end of a full expression, such as the semicolon that ends an expression statement.
The comma operator (but not the commas that separate function arguments).
The short-circuit operators: logical and (&&) and logical or (||).
The conditional operator (?:): This operator evaluates its first sub-expression first, and then its second or third (never both of them) based on the value of the first.
A function call: all arguments are evaluated, in no particular order, before the function is entered.

Expressions before a sequence point are always evaluated before those after a sequence point. In the case of short-circuit evaluation, the second expression may not be evaluated depending on the result of the first expression. For example, in the expression (a() || b()), if a() returns a nonzero (true) value, the result of the entire expression is already known to be true, so b() is not evaluated.

Undefined behavior

An interesting (though certainly not unique) aspect of the C standards is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from working as intended, to crashing every time it is run.

For example, the following code produces undefined behavior, because the variable b is modified more than once between two sequence points in the statement a = b++ + b++;:

#include <stdio.h>

int main(void)
{
  int a, b = 1;
  a = b++ + b++;
  printf("%d\n", a);
  return 0;
}

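Because there is no sequence point between the two modifications of b in b++ + b++, the increments and reads could be performed in more than one order, so the result would be ambiguous. However, to allow the compiler to make certain optimizations, the standard is even more pessimistic than this and declares the behavior undefined rather than merely unpredictable. In general, modifying an object more than once between two sequence points, or modifying it and also reading its value for any purpose other than computing the value to be stored (so a statement such as b = b + 1 is fine), invokes undefined behavior.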

