Revision as of 23:35, 6 June 2008 by Vquex (Nominate for deletion per Misplaced Pages:Criticism). Revision as of 20:00, 11 June 2008 by Shereth (per Misplaced Pages:Articles for deletion/Criticism of the C programming language).
#REDIRECT ]
<!-- Please do not remove or change this AfD message until the issue is settled -->
{{AfDM|page=Criticism of the C programming language|date=2008 June 6|substed=yes}}
<!-- For administrator use only: {{oldafdfull|page=Criticism of the C programming language|date=6 June 2008|result='''keep'''}} -->
<!-- End of AfD message, feel free to edit beyond this point -->
{{Refimprove|date=December 2007}}
'''Criticism of the C programming language''' refers to critical commentary directed at the ]. This widely used language first appeared on ] but, like ], soon found its way to ] and ] based ], where it achieved rapid acceptance in the industry. Despite (or because of) its popularity, C's characteristics have drawn much criticism.
==Minimalist design==
A popular saying, repeated by such notable language designers as ], is that "C makes it easy to shoot yourself in the foot."<ref>http://www.research.att.com/~bs/bs_faq.html#really-say-that Stroustrup: FAQ</ref> In other words, C permits some operations that are sometimes not desirable, and thus many simple programming errors are not detected by the compiler and may not be readily apparent at runtime. If sufficient care and discipline are not used in programming and maintenance, this may lead to programs with unpredictable behavior and security holes. (Although this is not unique to C, C provides less protection than do many other programming languages.)
The designers wanted to avoid compile- and ] checks that were too expensive when C was first implemented. With time, external tools were developed to perform some of these checks. Nothing prevents an implementation from providing such checks, but nothing requires it, either.
In their response to criticism of C not being a strongly-typed language, Kernighan and Ritchie made reference to the basic ] of C: "Nevertheless, C retains the basic philosophy that programmers know what they are doing; it only requires that they state their intentions explicitly."<ref>{{cite web | author=Dennis Ritchie | url=http://cm.bell-labs.com/cm/cs/who/dmr/chist.html | title=The Development of the C Language | accessdate=2006-07-26}}</ref><ref>Brian W. Kernighan and Dennis M. Ritchie: ''The C Programming Language,'' 2<sup>nd</sup> ed., ], 1988, p. 3.</ref>
==Absent features==
C was designed to be a small, simple language, which has contributed significantly to its acceptance, as new C ] can be developed quickly for new platforms. The relatively low-level nature of the language affords the programmer close control over what the computer does, while allowing special tailoring and aggressive optimization for a particular platform. This allows the code to run efficiently on very limited hardware, such as ].
C does not have some features that are available in some other programming languages:
* No assignment of arrays or strings (copying can be done via standard functions; assignment of objects having <code>struct</code> or <code>union</code> type is supported)
* No ]
* No requirement for ] of arrays
* No ]
* No syntax for ], such as the <code>A..B</code> notation used in several languages
* No separate ] type: zero/nonzero is used instead<ref>The 1999 revision of the C standard added a type <code>_Bool</code>, but it was not retrofitted into the language's existing Boolean contexts.</ref>
* No ] definitions
* No formal ]s or functions as parameters (only function and variable pointers)
* No ]s or ]s; intra-thread control flow consists of nested function calls, except for the use of the ] or ] library functions
* No ]; standard library functions signify error conditions with the global <code>]</code> variable and/or special return values
* Only rudimentary support for modular programming
* No compile-time polymorphism in the form of ] or ] ]
* Only rudimentary support for ]
* Very limited support for ] with regard to ] and ]
* Limited support for ]
* No native support for ] and ]
* No standard libraries for ] and several other application programming needs
A number of these features are available as extensions in some compilers, or can be supplied by third-party libraries, or can be simulated by adopting certain coding disciplines. For example, in most object-oriented languages, method functions include a special "this" pointer which refers to the current object. By passing this pointer as an explicit function argument, similar functionality can be achieved in C. Whereas in C++ one might write:
 stack.push(val);
one would write in C:
 push(&stack, val);
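The explicit-pointer convention can be sketched a little more fully. This is only an illustration: the names <code>struct stack</code>, <code>stack_init</code>, <code>stack_push</code>, <code>stack_pop</code> and the fixed capacity are hypothetical and are not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 16   /* hypothetical fixed capacity for this sketch */

/* A hypothetical fixed-size stack of ints. */
struct stack {
    int    items[STACK_CAPACITY];
    size_t count;
};

/* Plays the role of a constructor. */
void stack_init(struct stack *self)
{
    assert(self != NULL);
    self->count = 0;
}

/* The explicit 'self' argument stands in for the implicit 'this' of C++.
   Returns 1 on success, 0 if the stack is full. */
int stack_push(struct stack *self, int val)
{
    assert(self != NULL);
    if (self->count == STACK_CAPACITY)
        return 0;
    self->items[self->count++] = val;
    return 1;
}

/* Stores the top element in *out and returns 1, or returns 0 if empty. */
int stack_pop(struct stack *self, int *out)
{
    assert(self != NULL && out != NULL);
    if (self->count == 0)
        return 0;
    *out = self->items[--self->count];
    return 1;
}
```

A caller declares a <code>struct stack</code>, passes its address to <code>stack_init</code>, and then passes the same address to every subsequent operation. Nothing in the language ties the functions to the type, so the discipline rests entirely on naming and convention.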
==Undefined behaviour==
Many operations in C that have ] are not required to be diagnosed at ]. In the case of C, "undefined behavior" means that the exact behavior which arises is not specified by the standard, and exactly what will happen does not have to be documented by the C implementation. A famous, although misleading, expression in the ]s and is that the program could cause "]".<ref>{{cite web | url=http://www.catb.org/jargon/html/N/nasal-demons.html | title=Jargon File entry for ''nasal demons'' }}</ref> Sometimes in practice what happens for an instance of undefined behavior is a ] that is hard to track down and which may corrupt the contents of memory. Sometimes a particular compiler generates well-behaved actions that are not the same as would be obtained using a different C compiler. The reason some behavior has been left undefined is to allow the compiler to generate more efficient executable code for well-defined behavior, which was deemed important for C's primary role as a systems implementation language; it is the programmer's responsibility to avoid undefined behavior. Examples of undefined behavior are:
* accessing outside the bounds of an array
* overflowing a signed integer
* reaching the end of a function without executing a <code>return</code> statement, when the return value is used
* reading the value of a variable before initializing it
These operations are all programming errors that could occur using many programming languages; C draws criticism because its standard explicitly identifies numerous cases of undefined behavior, including some where the behavior could have been made well defined, and does not specify any run-time error handling mechanism.
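Because the language supplies no run-time check, a program that wants to avoid signed integer overflow has to test for it before performing the operation. The following is a minimal sketch of one common approach; the function name <code>checked_add</code> and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stores a + b in *result and returns 1 when the sum fits in an int.
   Returns 0 and leaves *result untouched when the sum would overflow.
   The comparisons are arranged so that they cannot overflow themselves. */
int checked_add(int a, int b, int *result)
{
    assert(result != NULL);
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;
    *result = a + b;
    return 1;
}
```

The test has to be phrased in terms of <code>INT_MAX - b</code> and <code>INT_MIN - b</code> rather than by inspecting <code>a + b</code> afterwards, since by then the undefined behavior would already have occurred.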
Invoking <code>]()</code> on a stream opened for input is an example of a different kind of undefined behavior, not necessarily a programming error but a case for which some conforming implementations may provide well-defined, useful semantics (in this example, presumably discarding input through the next new-line) as an allowed ''extension''.
==Memory allocation==
Automatically and dynamically allocated objects are not necessarily initialized; they initially have indeterminate values (typically, whatever ] happens to be present in the ], which might not even represent a valid value for that type). If the program attempts to use such an uninitialized value, the results are undefined. Many modern compilers try to detect and warn about this problem, but both ] occur.
Another common problem is that the allocation and release of heap memory must be managed manually to match the program's actual usage if the memory is to be reused as much as possible. For example, if the only pointer to a memory allocation goes out of scope or has its value overwritten before <code>]</code> has been called, then that memory cannot be recovered for later reuse and is essentially lost to the program, a phenomenon known as a ''].'' Conversely, it is possible to release memory too soon and continue to access it; however, since the allocation system can re-allocate or itself use the freed memory, unpredictable behavior is likely to occur when the multiple users corrupt each other's data. Typically, the symptoms will appear in a portion of the program far removed from the actual error. Such issues are ameliorated in languages with ] or ].
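One coding discipline sometimes used to reduce both kinds of error is to pair every allocating function with a releasing function that also clears the caller's pointer. The sketch below assumes two hypothetical helpers, <code>duplicate_string</code> and <code>release_string</code>; it illustrates the idea and does not remove the need for the caller to release each allocation exactly when it is no longer needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of text, or NULL if allocation fails.
   The caller owns the result and must pass it to release_string(). */
char *duplicate_string(const char *text)
{
    size_t size;
    char  *copy;

    assert(text != NULL);
    size = strlen(text) + 1;          /* include the terminating null */
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}

/* Frees *handle and resets it to NULL. A second call is then harmless,
   because free(NULL) does nothing, and the caller's pointer no longer
   refers to the released memory. */
void release_string(char **handle)
{
    assert(handle != NULL);
    free(*handle);
    *handle = NULL;
}
```

Clearing the pointer only protects the one variable passed in; any other copies of the pointer held elsewhere in the program still dangle after the release.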
==Pointers==
Pointers are a primary source of potential danger. Because they are typically unchecked, a pointer can be made to point to any arbitrary location, causing undesirable effects. Although properly-used pointers point to safe places, they can be moved to unsafe places using invalid ]; the memory they point to may be deallocated and reused (]s); they may be uninitialized (]s); or they may be directly assigned a value using a cast, union, or through another corrupt pointer. In general, C is permissive in allowing manipulation of and conversion between pointer types, although compilers typically provide options for various levels of checking. Other languages address these problems by using more restrictive ] types.
==Arrays==
Although C supports static arrays, it is not required that array indices be validated (]). For example, one can try to write to the sixth element of an array with five elements, yielding generally undesirable results. This type of bug, called a ''],'' has been notorious as the source of a number of security problems. On the other hand, since ] technology was largely nonexistent when C was defined, bounds checking came with a severe performance penalty, particularly in numerical computation. A few years earlier, some ] compilers had a switch to toggle bounds checking on or off; however, this would have been much less useful for C, where array arguments are passed as simple pointers.
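Since an array argument arrives as a bare pointer, code that wants index validation has to carry the length alongside the pointer and check it by hand. A minimal sketch follows; the type <code>struct int_span</code> and the functions <code>span_get</code> and <code>span_set</code> are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pairing of a pointer with the number of elements behind it. */
struct int_span {
    int   *data;
    size_t length;
};

/* Copies element 'index' into *out and returns 1, or returns 0 and
   leaves *out untouched when the index is out of range. */
int span_get(struct int_span span, size_t index, int *out)
{
    assert(span.data != NULL && out != NULL);
    if (index >= span.length)
        return 0;
    *out = span.data[index];
    return 1;
}

/* Writes value to element 'index' and returns 1, or returns 0 and
   writes nothing when the index is out of range. */
int span_set(struct int_span span, size_t index, int value)
{
    assert(span.data != NULL);
    if (index >= span.length)
        return 0;
    span.data[index] = value;
    return 1;
}
```

With a five-element array wrapped this way, an attempt to write to index 5 (the sixth element) is rejected instead of overwriting adjacent memory, at the cost of one comparison per access.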
Multidimensional arrays are commonly used in numerical algorithms (mainly from applied ]) to store matrices. The structure of the C array is particularly well suited to this particular task. However, since arrays are passed merely as pointers, the bounds of the array must be known fixed values or else explicitly passed to any subroutine that requires them, and dynamically sized arrays of arrays cannot be accessed using double indexing. (A workaround for this is to allocate the array with an additional vector of pointers to the rows.) These issues are discussed in the book '']'', chapter 1.2, page 20''ff''.<ref>http://www.nrbook.com/a/bookcpdf/c1-2.pdf</ref>
C99 introduced "variable-length arrays" which address some, but not all, of the issues with ordinary C arrays.
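The pointer-vector workaround for dynamically sized matrices can be sketched as follows. This is an illustration written for this article rather than code from the cited book; the names <code>matrix_create</code> and <code>matrix_destroy</code> are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates a rows x cols matrix of doubles, set to 0.0, that can be
   accessed as m[i][j]. All elements live in one contiguous block and
   m[i] points at the start of row i. Returns NULL on failure. */
double **matrix_create(size_t rows, size_t cols)
{
    double **m;
    double  *cells;
    size_t   i;

    assert(rows > 0 && cols > 0);
    /* Refuse sizes whose byte count would not fit in a size_t. */
    if (cols > ((size_t)-1) / sizeof(double) / rows)
        return NULL;

    m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    cells = malloc(rows * cols * sizeof *cells);
    if (cells == NULL) {
        free(m);
        return NULL;
    }
    for (i = 0; i < rows * cols; i++)
        cells[i] = 0.0;
    for (i = 0; i < rows; i++)
        m[i] = cells + i * cols;      /* row i starts i*cols elements in */
    return m;
}

/* Releases a matrix obtained from matrix_create(). */
void matrix_destroy(double **m)
{
    if (m != NULL) {
        free(m[0]);                   /* the element block */
        free(m);                      /* the vector of row pointers */
    }
}
```

The extra vector costs one pointer per row and one additional memory access per element reference, and the dimensions still have to be passed separately to any function that needs them.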
==Variadic functions==
Another potential source of bugs is ]s, which take a variable number of arguments. Unlike other prototyped C functions, checking the types of arguments to variadic functions at ] is, in general, impossible without additional information. If the wrong type of data is passed, the effect is unpredictable, and often fatal. Variadic functions also handle null pointer constants in a way which is often surprising to those unfamiliar with the language semantics. For example, NULL must be cast to the desired pointer type when passed to a variadic function. The <code>]</code> family of functions supplied by the standard library, used to generate ] output, has been noted for its error-prone variadic interface, which relies on a format string to specify the number and types of trailing arguments.
However, ] of variadic functions from the standard library is a quality-of-implementation issue; many modern compilers do type-check calls to functions in the <code>printf</code> family, producing warnings if the argument list is inconsistent with the format string. Even so, not all <code>printf</code> calls can be checked statically since the format string can be built at runtime, and other variadic functions typically remain unchecked.
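The need to cast <code>NULL</code> can be illustrated with a small user-defined variadic function that uses a null pointer as the end-of-list marker. The function <code>total_length</code> is a hypothetical example written for this purpose.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Returns the combined length of the strings passed as arguments.
   Every argument after 'first' must have type char *, and the list
   must be terminated by (char *)NULL. The compiler cannot verify
   either requirement. */
size_t total_length(const char *first, ...)
{
    va_list     args;
    size_t      total = 0;
    const char *s;

    assert(first != NULL);            /* at least one string is required */
    va_start(args, first);
    for (s = first; s != NULL; s = va_arg(args, char *))
        total += strlen(s);
    va_end(args);
    return total;
}
```

A call looks like <code>total_length("ab", "cde", (char *)NULL)</code>. Writing a bare <code>NULL</code> or <code>0</code> as the terminator is not portable, because the macro may expand to an integer constant, which need not have the same size or representation as a <code>char *</code> when passed through the variable argument list.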
==Syntax==
Although mimicked by many languages because of its widespread familiarity, C's syntax has often been criticized. For example, Kernighan and Ritchie say in the second edition of ''The C Programming Language'', "C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better."
Some specific problems worth noting are:
* Not checking number and types of arguments when the function declaration has an empty parameter list. (This provides ] with ], which lacked prototypes.)
* Some questionable choices of operator precedence, as mentioned by Kernighan and Ritchie above, such as <code>==</code> binding more tightly than <code>&</code> and <code>|</code> in expressions like <code>x & 1 == 0</code>.
* The use of the <code>=</code> operator, used in mathematics for equality, to indicate assignment, following the precedent of ], ], and ], but unlike ] and its derivatives. Ritchie made this syntax design decision consciously, based primarily on the argument that assignment occurs more often than comparison.
* Similarity of the assignment and equality operators (<code>=</code> and <code>==</code>), making it easy to substitute one for the other. C's weak type system permits each to be used in the context of the other without a compilation error (although some compilers produce warnings). For example, the conditional expression in <code>if (a=b)</code> is true only if <code>a</code> is not zero after the assignment.<ref>http://www.cs.ucr.edu/~nxiao/cs10/errors.htm 10 Common Programming Mistakes in C</ref>
* A lack of ] operators for complex objects, particularly for string operations, making programs which rely heavily on these operations difficult to read.
* A declaration syntax that some find unintuitive, particularly for ]s. (Ritchie's idea was to declare identifiers in contexts resembling their use: "]".)
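The precedence problem noted in the list above can be made concrete. In the sketch below, the function names are hypothetical, and the mis-parsed form is written with explicit parentheses so that the grouping chosen by the compiler is visible.

```c
#include <assert.h>

/* The intended test: is the lowest bit of x clear? */
int is_even(unsigned x)
{
    return (x & 1u) == 0;
}

/* How the unparenthesised expression x & 1 == 0 is grouped: == binds
   more tightly than &, so it means x & (1 == 0), which is x & 0 and
   therefore always 0. */
int is_even_misparsed(unsigned x)
{
    return (int)(x & (unsigned)(1 == 0));
}

/* Documents the two groupings side by side for an even argument. */
void check_parses(void)
{
    assert(is_even(4u) == 1);
    assert(is_even_misparsed(4u) == 0);
}
```

Because both forms compile, the mistake typically shows up only as a condition that never holds, unless the compiler is asked to warn about suspicious grouping.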
==Economy of expression==
One occasional criticism of C is that it can be concise to the point of being cryptic. A classic example that appears in K&R<ref>Brian W. Kernighan and Dennis M. Ritchie: ''The C Programming Language,'' 2<sup>nd</sup> ed., p. 106.</ref> is the following function to copy the contents of string <code>t</code> to string <code>s</code>:
 void strcpy(char *s, char *t)
 {
     while (*s++ = *t++);
 }
In this example, <code>t</code> points to the first element of a null-terminated array of characters, and <code>s</code> points to the first element of an array of characters to be written. Each iteration of the <code>while</code> statement does the following:
* Copies the character pointed to by <code>t</code> (initially set to point to the first character of the string to be copied) to the corresponding character position pointed to by <code>s</code> (initially set to point to the first character of the character array to be copied to)
* Advances the pointers <code>s</code> and <code>t</code> to point to the next character. Note that the values of <code>s</code> and <code>t</code> can safely be changed, because they are local ''copies'' of the pointers to the corresponding arrays
* Tests whether the character copied (the result of the assignment expression) is a ] signifying the end of the string. Note that the test could have been written "<code>((*s++ = *t++) != '\0')</code>" (where <code>'\0'</code> is the null character); however, in C, a Boolean test is actually a test for any non-zero value; consequently the test is true as long as the character is any character other than a string-terminating null
* As long as the character is not a null, the condition is true, causing the <code>while</code> loop to repeat. (In particular, because the character copy occurs before the condition is evaluated, the final terminating null is guaranteed to be copied as well)
* The repeatedly executed body of the <code>while</code> loop is an empty statement, signified by the semicolon (which despite appearances is not part of the <code>while</code> syntax). (It is not uncommon for the body of <code>while</code> or <code>for</code> loops to be empty.)
In more verbose languages such as ], a similar iteration would require several statements. The above code is functionally equivalent to:
 void strcpy(char *s, char *t)
 {
     char aux;
     do {
         *s = *t;
         aux = *s;
         s++;
         t++;
     } while (aux != '\0');
 }
With a modern optimising compiler, these two pieces of source code typically produce identical machine instruction sequences, so the smaller source code does not produce smaller output. For C programmers, the economy of style is idiomatic and leads to shorter expressions; for critics, being able to do too much with a single line of C code can lead to problems in comprehension.
==Internal consistency==
Some features of C, its preprocessor, and/or implementation are inconsistent. One of C's features is three distinct classes of non-wide string literals. One is for run-time data, another is for <code>#include</code> files with quotation marks around the filename, and the third is for <code>#include</code> filenames in angle brackets. The allowed symbol set, and its interpretation, is not consistent among the three. To some extent this arose from the need to accommodate a wide variety of file naming conventions, such as ]'s use of backslash as a path separator.
Another consistency problem stems from shortcomings in C's preprocessor, which was originally implemented as a separate, relatively simple process only loosely connected with the semantics of the rest of the language. The following code is not legal Standard C:
 int sixteen = 0x3e-0x2e;
The reason is that <code>0x3e-0x2e</code> matches the form of a "preprocessing number" ("<code>e-</code>" could be part of a number in ]), and, since token-matching is greedy, is converted to a single preprocessing token. The subsequent conversion of that to a token in a later phase of translation is ill-defined, so the compiler will not obtain the intended tokenization of
 int sixteen = 0x3e - 0x2e ;
even though spaces around the minus sign would not otherwise be required.
==Standardization==
The C programming language was standardized by ] in 1989 and adopted as an ] standard in 1990; the standard has subsequently been extended twice. Some features of the C standard, such as ] and ], have been challenged on the ground of questionable user demand. Some major C compilers have not yet become fully conformant to later versions of the C standard.
The C standards have been accompanied by Rationale documents which describe the considerations behind many of the choices made by the standards committee. Frequently there were trade-offs among competing requirements, and not everybody weighs the factors the same as did the C standards committee.
As well, more than most other language standards, the C standard leaves some behavior unspecified, such as the order of evaluation of arguments to a function, to allow compilers to have them evaluated in whatever way they believe will be optimal for their target platforms. This can result in code fragments which behave differently when compiled by different compilers, by different versions of the same compiler, or on different architectures; such differences can be avoided by careful programming.
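One form that such careful programming can take is to evaluate side-effecting argument expressions into named variables before the call, since separate statements are executed in order. The sketch below uses hypothetical helper names (<code>next_id</code>, <code>first_minus_second</code>, <code>ordered_difference</code>) chosen for illustration.

```c
#include <assert.h>

static int counter = 0;

/* A function with a side effect: each call returns the next integer. */
static int next_id(void)
{
    return ++counter;
}

static int first_minus_second(int first, int second)
{
    return first - second;
}

/* Writing first_minus_second(next_id(), next_id()) could yield -1 or 1,
   depending on which argument the compiler evaluates first. Evaluating
   the calls in separate statements fixes the order. */
int ordered_difference(void)
{
    int a = next_id();                /* always evaluated first */
    int b = next_id();                /* always evaluated second */

    assert(b == a + 1);
    return first_minus_second(a, b);
}
```

With the temporaries in place, <code>ordered_difference</code> returns the same value regardless of compiler or platform, whereas the single-expression form may not.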
==Maintenance==
There are other problems in C that don't directly result in bugs or errors, but make it harder for programmers to build a robust, maintainable, large-scale system. Examples of these include:
* A fragile system for importing definitions (<code>#include</code>) that relies on literal text inclusion and redundantly keeping prototypes and function definitions in sync.
* A cumbersome compilation model that complicates dependency tracking and ]s between modules.
* A weak type system that lets many erroneous programs compile without diagnostic messages.
==Tools for mitigating issues with C==
There are many C programmers who have learned to cope with C's quirks. However, some programmers may wish to use tools that have been created to help them overcome such problems.
Automated ] checking and auditing are beneficial in any language, and for C many such tools exist, such as ]. A common practice is to use Lint to detect questionable code when a program is first written. Once a program passes Lint, it is then compiled using the C compiler. Also, many compilers can optionally warn about syntactically valid constructs that are likely to actually be errors.
There are also compilers, libraries and ] level mechanisms for performing array bounds checking, ] detection, and ], that are not a standard part of C.
There are dialects of C, such as ] and ], that address some of these concerns.
Many compilers, notably ] and ], reduce the long compilation times caused by very large ]s by using '']s'', a system where the contents of a header are stored in a form designed to be much quicker to process than source text. The one-time cost of building a precompiled header file is offset by the savings from multiple uses of the faster version.
It should be recognized that these tools are not a ]. Because of C's flexibility, some types of errors involving misuse of variadic functions, out-of-bounds array indexing, and incorrect ] cannot be detected on some architectures without incurring a significant performance penalty. However, some common cases can be recognized and accounted for.
== See also ==
* ]
* ]
== References ==
{{reflist}}
{{CProLang}}
]
]