tag:blogger.com,1999:blog-67070053815625014362024-03-13T06:56:03.714-07:00The Flat Trantor SocietyRants and musings on softwareKeith Thompsonhttp://www.blogger.com/profile/06676710315024892313noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-6707005381562501436.post-71122380327124680232014-02-09T15:15:00.002-08:002014-03-19T14:42:58.794-07:00C standard quibbles<!-- Title: C standard quibbles -->
<!-- URL: http://the-flat-trantor-society.blogspot.com/2014/02/c-standard-quibbles.html -->
<p>This article is a collection of personal quibbles regarding the ISO
C standard. Expect it to be updated sporadically.</p>
<p>There have been three major editions of the ISO C standard:</p>
<ul>
<li><strong>C89/C90:</strong> The original ISO C standard was published in 1990, and was closely
based on the 1989 ANSI C standard. The 1989 ANSI and 1990 ISO
standards describe exactly the same language; ISO added some
introductory material and renumbered the sections.
<a href="http://www.bsb.me.uk/ansi-c/ansi-c">This web page</a> appears to be
a draft of the ANSI version of the standard. The 1995 amendment
added digraphs and wide character support.</li>
<li><strong>C99:</strong> The second version was published in 1999.
<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf">N1256</a>
includes the full 1999 standard with the three Technical Corrigenda
merged into it.</li>
<li><strong>C11:</strong> The third version was published in 2011. The
<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf">N1570</a>
draft is freely available, and is very nearly identical to the
released standard. (There has been one minor Technical Corrigendum.)</li>
</ul>
<!-- This does not yet reflect new content -->
<p>Subtopics (these links work on
<a href="https://github.com/Keith-S-Thompson/the-flat-trantor-society/blob/master/005-c-standard-quibbles.md">my GitHub page</a>
but not on the blog. <strong>TODO:</strong> Figure out how to fix that or give up and delete the links):</p>
<ul>
<li><a href="#is-int-main-necessarily-valid-should-it-be">Is <code>int main()</code> necessarily valid? Should it be?</a></li>
<li><a href="#what-is-an-lvalue">What is an lvalue?</a></li>
<li><a href="#what-is-an-expression">What is an expression?</a></li>
<li><a href="#infinite-loops">Infinite loops</a></li>
<li><a href="#fgetc-when-sizeof-int--1"><code>fgetc()</code> when <code>sizeof (int) == 1</code></a></li>
<li><a href="#more-stuff-">More stuff ...</a></li>
</ul>
<h3>Is <code>int main()</code> necessarily valid? Should it be?</h3>
<h5>ISO C 5.1.2.2.1 Program startup</h5>
<p>5.1.2.2.1 defines two permitted definitions for <code>main</code>:</p>
<ul>
<li><code>int main(void) { /* ... */}</code></li>
<li><code>int main(int argc, char *argv[]) { /* ... */ }</code></li>
</ul>
<p>followed by:</p>
<blockquote>
<p>or equivalent; or in some other implementation-defined manner.</p>
</blockquote>
<p>Which means that compilers <em>may</em> accept <code>void main(void)</code>, but are
not required to do so (more on that later and elsewhere).</p>
<p>This is a very commonly used definition:</p>
<ul>
<li><code>int main() { /* ... */ }</code></li>
</ul>
<p>As a <em>definition</em>, it says that <code>main</code> has no parameters. As a
<em>declaration</em>, though, it <em>doesn't</em> say that <code>main</code> takes no arguments;
rather, it says that <code>main</code> takes an unspecified but fixed number
and type(s) of parameters -- and if you call it with arguments that
are incompatible with the definition, the behavior is undefined.</p>
<p>I argue that <code>int main()</code> is <em>not</em> equivalent to <code>int main(void)</code>,
and therefore is not a valid definition <em>unless</em> it's covered by the
"or in some other implementation-defined manner" clause (i.e., unless
the implementation explicitly documents that it supports it).</p>
<p><code>int main() { /* ... */ }</code> is an old-style non-prototype definition.
Support for such definitions is an obsolescent feature (C11 6.11.7).</p>
<p>Furthermore, this program:</p>
<pre><code>int main(void) {
    if (0) {
        main(42);
    }
}
</code></pre>
<p>violates a constraint, whereas this program:</p>
<pre><code>int main() {
    if (0) {
        main(42);
    }
}
</code></pre>
<p>does not, which implies that the two forms are <em>not</em> equivalent.</p>
<p>I wonder whether those who argue that <code>int main()</code> is valid because
it's "equivalent" to <code>int main(void)</code> would make the same argument for:</p>
<pre><code>int main(argc, argv)
int argc;
char *argv[];
{
    /* ... */
}
</code></pre>
<p>On the other hand, as long as non-prototype function declarations
and definitions are part of the standard, <code>int main() { /* ... */ }</code>
probably <em>should</em> be valid. The entire point of continuing to support
such definitions and declarations is to avoid breaking pre-ANSI code,
written before prototypes were added to the language (it's not as
if non-prototype declarations are useful other than for backward
compatibility). If <code>int main()</code> is invalid, then <em>no</em> pre-ANSI program
is a valid C90, C99, or C11 program, which was surely not the intent.</p>
<h3>What is an lvalue?</h3>
<h5>ISO C 6.3.2.1p1 Lvalues, arrays, and function designators</h5>
<p>The definition of the term <em>lvalue</em> (sometimes written <em>l-value</em>)
has changed several times over the years. The "L" part of the
name was originally an abbreviation of the word "left"; an <em>lvalue</em>
can appear on the left hand side of an assignment, and an <em>rvalue</em>
can appear on the right hand side.</p>
<p>My (somewhat vague) recollection is that the term "l-value"
originally referred to a kind of <em>value</em>, not (as it does now
in C) to a kind of <em>expression</em>. Given that <code>n</code> is an integer
variable, the expression <code>n</code> could be "evaluated for its l-value"
(which identifies the object that it designates, ignoring any value
stored in that object), or it could be "evaluated for its r-value"
(which retrieves the value stored in the object). The expression
<code>n</code> would be evaluated for its l-value if it appeared on the left
side of an assignment, or for its r-value in most other contexts.
Apparently the terms "l-value" and "r-value" originated in
<a href="http://en.wikipedia.org/wiki/CPL_(programming_language)">CPL</a>,
the ancestor of BCPL, which led to B, which led to C.</p>
<p>Note carefully that, under this definition, an "l-value" is <em>not</em> a
pointer value. An "l-value" was the <em>identity</em> of an object, not its
address. (Evaluating an expression for its l-value might well involve
computing an address internally.)</p>
<p>I've tried and failed to find a reference for these definitions,
but a footnote in section 6.3.2.1 of the C standard:</p>
<blockquote>
<p>The name "lvalue" comes originally from the assignment expression
<strong><code>E1 = E2</code></strong>, in which the left operand <strong><code>E1</code></strong> is required
to be a (modifiable) lvalue. It is perhaps better considered as
representing an object "locator value". What is sometimes called
"rvalue" is in this International Standard described as the
"value of an expression".</p>
</blockquote>
<p>at least strongly suggests that an <em>rvalue</em> is a value, not an
expression that yields a value -- though an lvalue is a kind of
expression. The term "rvalue" does not appear anywhere else in the
C standard.</p>
<p>But that's all pre-C history.</p>
<ul>
<li><p>Kernighan & Ritchie, "The C Programming Language", 1st edition, 1978:</p>
<blockquote>
<p>An <em>object</em> is a manipulatable region of storage; an <em>lvalue</em>
is an expression referring to an object.</p>
</blockquote>
<p>This suffers from the same problem as the later ISO C90 definition;
see below.</p></li>
<li><p>C90 6.2.2.1:</p>
<blockquote>
<p>An <em>lvalue</em> is an expression (with an object type or an
incomplete type other than <strong><code>void</code></strong>) that designates an object.</p>
</blockquote>
<p>Problem: Though this conveys the intent, it implies that a
dereferenced null pointer is <em>not</em> an lvalue, which makes lvalue-ness
an execution time property. This is clearly not the intent.</p></li>
<li><p>C99 6.3.2.1p1:</p>
<blockquote>
<p>An <em>lvalue</em> is an expression with an object type or an incomplete
type other than <strong><code>void</code></strong>; if an lvalue does not designate an
object when it is evaluated, the behavior is undefined.</p>
</blockquote>
<p>Problem: This says that any expression of an appropriate type is an
lvalue, which is certainly not the intent. For example, it says that
<code>42</code> (which is an expression of an object type) is an lvalue, and that
since it doesn't designate an object, the behavior of any program
that evaluates <code>42</code> is undefined. What a mess. The intent is that
the evaluation of an lvalue <em>that's in a context that requires an
lvalue</em> has undefined behavior if it doesn't designate an object.</p></li>
<li><p>C11 6.3.2.1p1:</p>
<blockquote>
<p>An <em>lvalue</em> is an expression (with an object type other than
<strong><code>void</code></strong>) that potentially designates an object; if an lvalue
does not designate an object when it is evaluated, the behavior
is undefined.</p>
</blockquote>
<p>This goes back to the C90 definition and adds the word "potentially"
(<a href="https://groups.google.com/forum/message/raw?msg=comp.std.c/KXjLg24jzVU/sAQRa0kjbE4J">which was my idea</a>,
BTW). This clarifies that a dereferenced null pointer is an lvalue,
but if it's evaluated in a context that requires an lvalue it has
undefined behavior.</p>
<p>I'm still not entirely happy with this, because it's not clear just
what "potentially designates" means.</p></li>
</ul>
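<p>To make the point about lvalue-ness being a compile-time property concrete, here is a minimal sketch. The function name <code>pointee_size</code> is made up for illustration. The operand of <code>sizeof</code> is not evaluated unless it has a variable length array type, so <code>*p</code> appears here purely as an lvalue <em>expression</em> and the null pointer is never actually dereferenced:</p>

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, NULL */

/* Hypothetical helper: returns the size of the type that p points to. */
size_t pointee_size(void) {
    int *p = NULL;
    /* *p is an lvalue of type int regardless of what p holds at run time.
       sizeof does not evaluate its operand here, so no object has to
       exist and the behavior is well defined. */
    static_assert(sizeof *p == sizeof(int), "*p has type int");
    return sizeof *p;
}
```

<p>By contrast, something like <code>*p = 0</code> with the same null <code>p</code> evaluates the lvalue in a context that needs an object, which is where the undefined behavior comes in.</p>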
<p>Ultimately, I think that the term <em>lvalue</em> can be defined
<em>syntactically</em>. I think you can go through section 6.5 of the
standard and determine that certain kinds of expressions are always
lvalues, other kinds of expressions never are, and others are an
lvalue or not based on criteria that are easy to specify.</p>
<p>An expression is an <em>lvalue</em> if and only if it is one of the following:</p>
<ul>
<li>An identifier that is not a function name or enumeration constant;</li>
<li>A string literal;</li>
<li>A parenthesized expression, if and only if the unparenthesized
expression is an lvalue;</li>
<li>An indirection expression <code>*x</code>;</li>
<li>A subscript expression (<code>x[y]</code>) (this follows from the definition of
the subscript operator and the fact that <code>*x</code> is an lvalue);</li>
<li>A reference to a struct or union member (<code>x.y</code>, <code>x->y</code>); or</li>
<li>A compound literal.</li>
</ul>
<p>(I don't guarantee this is 100% correct.)</p>
<p>The standard's definition of <em>lvalue</em> should, IMHO, use a list similar
to the above. The description of the <em>intent</em> can still use the
wording of the current definition, perhaps as a footnote.</p>
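<p>As a rough illustration of the list above, the following sketch applies unary <code>&</code> to one example of each syntactic form. The names (<code>struct pair</code>, <code>lvalue_forms</code>) are invented for the example. Since <code>&</code> requires an lvalue operand (or a function designator, which none of these are), the fact that the code compiles is some evidence that each form is treated as an lvalue:</p>

```c
#include <assert.h>
#include <stddef.h>

struct pair { int first, second; };

/* Hypothetical demo: takes the address of one expression of each form
   and returns the sum of the int objects those addresses designate. */
int lvalue_forms(void) {
    int n = 1;
    int arr[2] = {2, 3};
    struct pair s = {4, 5};
    struct pair *sp = &s;
    int *p = &n;

    int *addrs[] = {
        &n,           /* identifier naming an object */
        &(n),         /* parenthesized lvalue */
        &*p,          /* indirection expression */
        &arr[1],      /* subscript expression */
        &s.first,     /* member access with . */
        &sp->second,  /* member access with -> */
        &(int){6},    /* compound literal */
    };

    /* A string literal is an lvalue of array type, here char[3]. */
    char (*str)[3] = &"hi";
    assert((*str)[0] == 'h');

    int sum = 0;
    for (size_t i = 0; i < sizeof addrs / sizeof addrs[0]; i++) {
        sum += *addrs[i];
    }
    return sum;   /* 1 + 1 + 1 + 3 + 4 + 5 + 6 */
}
```

<p>Replacing any entry with, say, <code>&42</code> or <code>&(n + 1)</code> should produce a diagnostic, since neither operand is an lvalue.</p>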
<h3>Null pointer constants and parenthesized expressions</h3>
<h5>ISO C 6.3.2.3 Pointers (under 6.3 Conversions)</h5>
<p>Paragraph 3:</p>
<blockquote>
<p>An integer constant expression with the value 0, or such an
expression cast to type <strong><code>void *</code></strong>, is called a <em>null pointer
constant</em>.</p>
</blockquote>
<p>The problem: 6.5.1 (Primary expressions) says that a parenthesized
expression</p>
<pre><code>is an lvalue, a function designator, or a void expression if
the unparenthesized expression is, respectively, an lvalue,
a function designator, or a void expression.
</code></pre>
<p>It <em>doesn't</em> say that a parenthesized null pointer constant is a null
pointer constant.</p>
<p>Which implies, strictly speaking, that <code>(void*)0</code> is a null pointer constant,
but <code>((void*)0)</code> is not.</p>
<p>And since 7.1.2 "Standard headers" requires:</p>
<pre><code>Any definition of an object-like macro described in this clause shall
expand to code that is fully protected by parentheses where necessary,
so that it groups in an arbitrary expression as if it were a single
identifier.
</code></pre>
<p>this implies that the <code>NULL</code> macro may not be defined as <code>(void*)0</code>,
since, for example, that would cause <code>sizeof NULL</code> to be a syntax
error.</p>
<p>I'm sure that most C implementations do treat a parenthesized null
pointer constant as a null pointer constant, and define <code>NULL</code> either
as <code>0</code>, <code>((void*)0)</code>, or in some other manner.</p>
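<p>The parenthesization issue can be seen with a pair of stand-in macros. This is only a sketch: <code>UNPROTECTED_NULL</code> and <code>PROTECTED_NULL</code> are hypothetical names, not anything a real implementation defines.</p>

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical stand-ins for two possible definitions of NULL. */
#define UNPROTECTED_NULL (void*)0
#define PROTECTED_NULL   ((void*)0)

size_t protected_null_size(void) {
    /* "sizeof UNPROTECTED_NULL" would expand to "sizeof (void*)0", which
       parses as sizeof(void*) followed by a stray 0: a syntax error.
       The parenthesized form groups as a single operand. */
    static_assert(sizeof PROTECTED_NULL == sizeof(void *),
                  "parenthesized form groups as one operand");
    return sizeof PROTECTED_NULL;
}
```

<p>Whether <code>PROTECTED_NULL</code> is formally a null pointer constant is exactly the question raised above; the sketch only shows why 7.1.2 forces the parentheses.</p>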
<h3>What is an expression?</h3>
<h5>ISO C 6.5 Expressions</h5>
<p>The syntax and semantics of expressions are described in section
6.5 of the ISO C standard (which covers 30 pages). But the formal
<em>definition</em> of the word "expression" is in 6.5p1:</p>
<blockquote>
<p>An <em>expression</em> is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.</p>
</blockquote>
<p>That sounds reasonable -- except that a strict reading of that
definition implies that <code>42</code> is not an expression. Why not?
It contains no operators, and <code>42</code> can't be an operand if there is
no operator, so it's not "a sequence of operators and operands".</p>
<p>The real definition of <em>expression</em> is syntactic; anything that
satisfies the syntactic definition of <em>expression</em> (in 6.5.17, and
referring to definitions in the rest of section 6.5) is an expression.</p>
<p>The definition in 6.5p1 either needs to be re-worded so that it
includes primary expressions, or it needs to refer to the grammar.
A more reader-friendly (but perhaps less precise) English description
of what an expression is should still be included.</p>
<h3>Integer constant expressions</h3>
<h5>ISO C 6.6 Constant expressions, paragraph 6</h5>
<p>Credit for this goes to <a href="http://www.stackoverflow.com">Stack Overflow</a> user
<a href="http://stackoverflow.com/users/2698605/pablo1977">pablo1977</a> who posted
<a href="http://stackoverflow.com/q/21972815/827263">this question</a>.</p>
<p>6.6p6 says:</p>
<blockquote>
<p>An <em>integer constant expression</em> shall have integer type and
shall only have operands that are integer constants, enumeration
constants, character constants, <strong><code>sizeof</code></strong> expressions whose
results are integer constants, <strong><code>_Alignof</code></strong> expressions, and
floating constants that are the immediate operands of casts. Cast
operators in an integer constant expression shall only convert
arithmetic types to integer types, except as part of an operand
to the <strong><code>sizeof</code></strong> or <strong><code>_Alignof</code></strong> operator.</p>
</blockquote>
<p>The problem: There's no indication that a parenthesized constant is a
constant. So <code>(int)3.14</code> is an integer constant expression, but <code>(int)(3.14)</code>,
strictly speaking, is not, because <code>3.14</code> is a floating constant but
<code>(3.14)</code> is not.</p>
<p>It seems obvious that if <code>(int)3.14</code> is an integer constant expression,
then there's no reason that <code>(int)(3.14)</code> shouldn't be one as well,
and though I haven't checked I suspect that all existing compilers
treat it as one. If the wording of the standard is to be corrected,
some care will have to be taken so that both <code>(int)(3.14)</code> and
<code>(int)((3.14))</code> are integer constant expressions.</p>
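<p>One way to probe how a given compiler treats these forms is to use them where the language requires an integer constant expression, such as enumerator values. The sketch below uses invented enumerator names and assumes a compiler that accepts all three forms, which is the expected behavior in practice but has not been checked against every implementation:</p>

```c
#include <assert.h>

/* The value of an enumeration constant must be an integer constant
   expression, so a strictly literal compiler could reject the
   parenthesized forms. */
enum {
    PLAIN  = (int)3.14,
    PAREN  = (int)(3.14),
    NESTED = (int)((3.14))
};

/* Hypothetical check: returns nonzero if all three forms agree. */
int ice_values_agree(void) {
    assert(PLAIN == 3);   /* conversion truncates toward zero */
    return PLAIN == PAREN && PAREN == NESTED;
}
```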
<h3>Infinite loops</h3>
<h5>ISO C 6.8.5 Iteration statements, paragraph 6</h5>
<p>This was a change made in ISO C 2011.</p>
<p>6.8.5p6 says:</p>
<blockquote>
<p>An iteration statement whose controlling expression is not a constant
expression, that performs no input/output operations, does not access
volatile objects, and performs no synchronization or atomic operations
in its body, controlling expression, or (in the case of a <strong><code>for</code></strong>
statement) its <em>expression-3</em>, may be assumed by the implementation
to terminate.</p>
</blockquote>
<p>with a footnote:</p>
<blockquote>
<p>This is intended to allow compiler transformations such as removal of
empty loops even when termination cannot be proven.</p>
</blockquote>
<p>So this clause is all about enabling optimizations, and I'm guessing
that it was influenced by the C compiler implementers on the committee.</p>
<p>I presume that they had good reasons for adding this, and that it
makes a significant difference in the performance of real-world code.
And if you want to write an infinite loop deliberately, you can still
do so because of the "constant expression" exception.</p>
<p>But it means that I can write code whose behavior is well defined
in terms of pre-2011 C, and that can behave differently in C11.
For example:</p>
<pre><code>#include &lt;stdio.h&gt;

int main(void) {
    const int keep_going = 1;
    while (keep_going) {
        ;
    }
    puts("This should never appear");
}
</code></pre>
<p>In C90 and C99, the message "This should never appear" will never be
printed. In C11, because <code>keep_going</code> is not a constant expression,
the compiler can legally <em>assume</em> that the loop terminates, and the
message may or may not be printed.</p>
<p>I'd be interested in seeing cases where this additional permission
is actually helpful.</p>
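<p>One plausible candidate, offered only as a sketch, is a side-effect-free loop whose termination nobody knows how to prove. The function names here are hypothetical, and whether any particular compiler actually exploits the permission in this case is an assumption:</p>

```c
#include <assert.h>

/* Counts Collatz steps until n reaches 1 (precondition: n > 0).
   No proof is known that this loop terminates for every n. */
unsigned collatz_steps(unsigned long n) {
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

void ignores_result(unsigned long n) {
    /* The result is unused, and the loop performs no I/O, volatile
       accesses, or atomic operations.  Under 6.8.5p6 a C11 compiler may
       assume the loop terminates and remove the call, without having
       to prove anything about the Collatz conjecture. */
    (void)collatz_steps(n);
}

/* Small self-check: 6, 3, 10, 5, 16, 8, 4, 2, 1 is eight steps. */
void collatz_self_check(void) {
    assert(collatz_steps(6) == 8);
}
```

<p>Without the permission, a compiler that cannot prove termination would have to keep the loop in <code>ignores_result</code> just in case it runs forever.</p>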
<p>Furthermore, I find the way this permission is worded to be clumsy.
It's a statement about what the implementation is permitted to
<em>assume</em>. What really matters is what the implementation is permitted
to <em>do</em>. A better and more consistent way of expressing this, I think,
would have been something like:</p>
<blockquote>
<p>If an iteration statement whose controlling expression is not a
constant expression, that performs no input/output operations,
does not access volatile objects, and performs no synchronization
or atomic operations in its body, controlling expression, or
(in the case of a <strong><code>for</code></strong> statement) its <em>expression-3</em> does
not explicitly terminate, it is unspecified whether it terminates
or not.</p>
</blockquote>
<p>Or it could say that if such a loop does not terminate, the behavior
is undefined -- but that would give compilers much more latitude than
the current wording.</p>
<h3><code>fgetc()</code> when <code>sizeof (int) == 1</code></h3>
<h5>ISO C 7.21.7.1 The <code>fgetc</code> function</h5>
<p>The standard makes some implicit assumptions about how character
input works. If <code>sizeof (int) == 1</code> (which requires <code>CHAR_BIT >= 16</code>),
<code>EOF</code> isn't distinct from any valid <code>char</code> value. I think there are
also some assumptions about how unsigned-to-signed conversion works;
the result is implementation-defined, but some possible implementation
definitions would break stdio character input. I need to study
this further.</p>
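<p>For what it's worth, a defensive reading loop can avoid relying on <code>EOF</code> being distinct from every character value. The sketch below is an illustration rather than anything the standard prescribes, and <code>count_chars</code> is a made-up name. It treats a return value of <code>EOF</code> as end of input only if <code>feof()</code> or <code>ferror()</code> confirms it:</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining in a stream.
   A value equal to EOF is counted as data unless the stream's
   end-of-file or error indicator is set. */
long count_chars(FILE *stream) {
    assert(stream != NULL);
    long count = 0;
    for (;;) {
        int c = fgetc(stream);
        if (c == EOF && (feof(stream) || ferror(stream))) {
            break;
        }
        count++;
    }
    return count;
}
```

<p>On ordinary systems, where <code>int</code> is wider than <code>char</code>, the extra check never changes the result; it only matters on the exotic implementations discussed above.</p>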
<h3>More stuff ...</h3>
<p>... as I think of it.</p>
<p><em>Last updated Wed Mar 19 14:42:43 2014 -0700</em></p>
Keith Thompsonhttp://www.blogger.com/profile/06676710315024892313noreply@blogger.com2tag:blogger.com,1999:blog-6707005381562501436.post-4624981073182503702013-12-28T15:03:00.002-08:002014-07-30T18:03:12.386-07:00Where should the control key be?<!-- Title: Where should the control key be? -->
<!-- URL: http://the-flat-trantor-society.blogspot.com/2013/12/where-should-control-key-be.html -->
<p>Almost all modern computer keyboards place the Caps Lock key
immediately to the left of <strong>A</strong>, with the Shift key below it (next
to <strong>Z</strong>) and the Control key below that, in the lower left corner.</p>
<p>It wasn't always this way.</p>
<p>For example, many of Sun's keyboards (<a href="http://xahlee.info/kbd/sun_microsystems_keyboard.html">images
here</a>) put the
Control key immediately to the left of <strong>A</strong>, and the Caps Lock key
in the lower left corner.</p>
<p>If you happen to like the "modern" layout, that's great; I'm not going
to try to change your mind, and you can feel free to stop reading now.</p>
<p>But personally, I find it <em>much</em> easier to type when the Control
key is immediately to the left of the <strong>A</strong> key, and the Caps Lock
(which I hardly ever use) is either safely out of easy reach or
disabled altogether. I use control sequences extensively. I'm a
heavy user of vim, I occasionally use Emacs, and I use Emacs-style
key bindings in the bash shell. Reaching my left pinky finger down
below the shift key every few seconds is quite awkward, but if the
control key is on the home row I don't even have to think about it.
Yes, I've tried using keyboards with Control below Shift; no, I've
never been able to get used to it.</p>
<p>Fortunately, there are ways to remap your keyboard in software so that
the key labeled "Caps Lock" acts as a Control key. Unfortunately,
those ways vary considerably from one operating system to another.</p>
<ul>
<li><p><strong>Microsoft Windows</strong>:</p>
<p>Microsoft Windows does let you do some limited keyboard remapping
through the Control Panel (in Windows 7 at least, it's under "Region
and Language", not under "Keyboard") -- but for some unfathomable
reason there's no option to remap the Caps Lock and Control keys.</p>
<p>You can swap the Control and Caps Lock keys, or make
Caps Lock an additional Control key, by modifying the
system registry. I provide instructions for doing so
<a href="https://github.com/Keith-S-Thompson/no-caps-lock">here</a>.
Unfortunately, this is a system-wide setting; it doesn't let you
change the layout for an individual user. I advise <em>not</em> applying
this registry patch to a shared Windows system unless you're sure that
all users of the system are ok with a "non-standard" keyboard layout.</p></li>
<li><p><strong>Linux</strong> (or GNU/Linux if you prefer):</p>
<p>Fortunately, Linux-based systems generally <em>do</em> let you modify
keyboard layouts on a per-user basis. The specific method can vary
depending on which distribution and desktop environment you use.
One of the following methods is likely to work.</p>
<p>See also <a href="http://unix.stackexchange.com/questions/114022/map-caps-lock-to-control-in-linux-mint">this question</a>
and <a href="http://unix.stackexchange.com/questions/114022/map-caps-lock-to-control-in-linux-mint/114023#114023">this answer</a>
on <a href="http://unix.stackexchange.com">unix.stackexchange.com</a>.</p>
<p><strong>UNIX-like command-line solutions</strong>:</p>
<p>Either of the following commands should work to map Caps Lock to
Control (making both keys act like a Control key) for the duration
of the current X session:</p>
<pre><code>xmodmap -e 'clear Lock' \
        -e 'keycode 0x42 = Control_L' \
        -e 'add Control = Control_L'
</code></pre>
<p>or:</p>
<pre><code>setxkbmap -option ctrl:nocaps
</code></pre>
<p>I think the <code>setxkbmap</code> command is newer; you might have to resort to
<code>xmodmap</code> for some older systems.</p>
<p>I think that</p>
<pre><code>setxkbmap -option ctrl:swapcaps
</code></pre>
<p>will swap the Control and Caps Lock keys, but I haven't tried it.</p>
<p>Both of these have the drawback that the behavior will revert to
the default when the current X session terminates (typically when
you log out or reboot). You can either re-execute the command on
startup, or arrange for the system to do it for you.</p>
<p>I find it more convenient, where possible, to do this through the
desktop GUI, so the setting is persistent across reboots.</p>
<p><strong>Debian 6, Gnome desktop</strong>:</p>
<ul>
<li>"System" > "Preferences" > "Keyboard"</li>
<li>Select the "Layouts" tab</li>
<li>Highlight the layout you use (mine is "USA")</li>
<li>Click the "Options" button</li>
<li>Under "Ctrl key position", select "Make CapsLock an additional
Ctrl", or whichever option you prefer.</li>
</ul>
<p><strong>Linux Mint 14, Cinnamon desktop</strong>:</p>
<ul>
<li>From the "System Tools" menu, select "System Settings", then
open "Keyboard Layout"</li>
<li>Select the "Layouts" tab</li>
<li>Click the "Options..." button.</li>
<li>Open "Caps Lock key behavior" and select the
option you prefer. I use "Make Caps Lock an additional Control but
keep the Caps_Lock keysym", which makes both Caps Lock and Control
act as a Control key.</li>
</ul>
<p><strong>Linux Mint 15, Cinnamon desktop</strong>:</p>
<ul>
<li>From the "System Tools" menu, select "System Settings", then
open "Regional Settings"</li>
<li>Select the "Layouts" tab</li>
<li>Click the "Options..." button.</li>
<li>Open "Caps Lock key behavior" and select the
option you prefer. I use "Make Caps Lock an additional Control but
keep the Caps_Lock keysym", which makes both Caps Lock and Control
act as a Control key.</li>
</ul>
<p><strong>Linux Mint 16, KDE desktop</strong>:</p>
<ul>
<li>From the main menu, select "Applications", then "Settings", then "System Settings".</li>
<li>Under "Hardware", open "Input Devices"</li>
<li>Keyboard settings are shown by default; open the "Advanced" tab.</li>
<li>Click the "Control keyboard options" checkbox.</li>
<li>Open "Ctrl Key Position"</li>
<li>Enable and select "Caps lock as Ctrl" or "Swap Ctrl and Caps Lock"</li>
</ul>
<p><strong>Linux Mint 17, Xfce desktop</strong>:
Oddly, the Xfce settings GUI doesn't seem to have an option to change
the behavior of the Caps Lock key. See "<strong>UNIX-like command-line
solutions</strong>" above.</p>
<p>Modifying <code>/etc/default/keyboard</code> will affect all users on the system.</p>
<p><strong>Linux virtual console</strong>:
<a href="http://www.emacswiki.org/emacs/MovingTheCtrlKey#toc7">This web page</a>
discusses various ways to remap the control key in the Linux
virtual console. (This is the text-only console reachable by typing
Ctrl-Alt-F1, Ctrl-Alt-F2, etc.). The most straightforward method
seems to be:</p>
<ul>
<li>Add the line <code>XKBOPTIONS="ctrl:nocaps"</code> to <code>/etc/default/keyboard</code></li>
<li><code>$ sudo dpkg-reconfigure -phigh console-setup</code></li>
</ul>
<p>Replace <code>nocaps</code> by <code>swapcaps</code> if you prefer to swap Control and
Caps-Lock rather than making both keys act like Control keys.</p>
<p>I've tried this on Debian 6, and it works after a reboot.</p></li>
<li><p><strong>Mac OS X 10.5.8</strong>:</p>
<ul>
<li>System Preferences</li>
<li>Keyboard & Mouse</li>
<li>Keyboard tab > Modifier Keys ...</li>
<li>Change Caps Lock to act as Control</li>
<li>Optional: Change Control to act as Caps Lock</li>
</ul></li>
</ul>
<p><em>Last updated 2014-07-30 17:46:23 -0700</em></p>Keith Thompsonhttp://www.blogger.com/profile/06676710315024892313noreply@blogger.com2tag:blogger.com,1999:blog-6707005381562501436.post-59635805116499031012012-11-05T16:29:00.002-08:002014-02-09T15:23:26.558-08:00Markdown<!-- Title: Markdown -->
<!-- URL: http://the-flat-trantor-society.blogspot.com/2012/11/markdown.html -->
<p>I've decided to start composing and maintaining this blog using
<a href="http://daringfireball.net/projects/markdown/">Markdown</a>.</p>
<p>If you're not familiar with it (or even if you are), Markdown is a
text-to-HTML conversion tool for web writers. Raw Markdown is much
more readable and easier to work with than raw HTML. It doesn't
<em>directly</em> provide the full power of HTML, though you can include raw
HTML in a Markdown document -- and you can do <em>italics</em>, <strong>bold</strong>,
and <strong><em>bold italics</em></strong> directly in Markdown.</p>
<p>It's used (in slightly different flavors) on <a href="https://github.com/">GitHub</a>
and on the <a href="http://stackexchange.com/">StackExchange</a> network of sites,
among other places.</p>
<p>All posts on this blog are maintained as <a href="https://github.com/Keith-S-Thompson/the-flat-trantor-society">a GitHub
project</a>.
If you're sufficiently curious, you can see the Markdown form of all
the articles, and how I've revised them over time.</p>
<p>One thing I've noticed with the composition software used by
blogspot.com is that switching between the "HTML" and "Compose"
views <em>changes the HTML</em>; in particular, it removes <code>&lt;p&gt;</code> paragraph
markup, replacing it by <code>&lt;br /&gt;</code> line breaks. Because of this I need
to copy the Markdown-generated HTML into the HTML window and click
the <strong>Update</strong> button <em>without</em> looking at the preview. Annoying,
but not fatal.</p>
<p>Markdown is converted to HTML by the
<a href="http://daringfireball.net/projects/downloads/Markdown_1.0.1.zip"><code>markdown</code></a>
command, which is also available as a .deb package on Debian and
Debian-derived systems such as Ubuntu and Linux Mint:</p>
<pre><code>sudo apt-get install markdown
</code></pre>
<p>It should be available for other systems as well. I run a simple
<code>gen-html</code> script (included in the GitHub project for this blog), and
then manually copy-and-paste the generated HTML into blogspot.com's
web interface. The manual step is annoying, but overall it should
make it easier to write and maintain this blog.</p>
<p><a href="http://johnmacfarlane.net/pandoc/">Pandoc</a> is another good conversion
tool that handles numerous other formats as well. It should be available
for most systems.</p>
<p>Who knows, I might even get around to posting more articles!</p>
<p><em>Last updated Sat 2013-12-28 16:29:29 PST</em></p>
Keith Thompsonhttp://www.blogger.com/profile/06676710315024892313noreply@blogger.com0tag:blogger.com,1999:blog-6707005381562501436.post-37219936263505085002012-03-05T16:35:00.000-08:002014-02-17T08:34:37.042-08:00No, strncpy() is not a "safer" strcpy()<!-- Title: No, strncpy() is not a "safer" strcpy() -->
<!-- URL: http://the-flat-trantor-society.blogspot.com/2012/03/no-strncpy-is-not-safer-strcpy.html -->
<p>The C standard library declares a number of string functions in the
standard header <code>&lt;string.h&gt;</code>.</p>
<p>By the standards of some other languages, C's string handling is
fairly primitive. Strings are simply arrays of characters terminated
by a null character <code>'\0'</code>, and are manipulated via <code>char*</code> pointers.
C has no string type. Instead, a "string" is a data <em>layout</em>, not a data <em>type</em>.
Quoting the ISO C standard:</p>
<blockquote>
<p>A <em>string</em> is a contiguous sequence of characters terminated by
and including the first null character.</p>
</blockquote>
<p>So what happens if you call a C string function with a pointer into
a char array that isn't properly terminated by a null character?
Such an array does not contain a "string" in the sense that C
defines the term, and the behavior of most of C's string functions
on such arrays is <em>undefined</em>. That doesn't mean the function will
fail cleanly, or even that your program will crash; it means that
as far as the standard is concerned, literally anything can happen.
In practice, what typically happens is that the function will keep
looking for that terminating null character either until it finds it
in some chunk of memory it really shouldn't be looking at, or until
it crashes because it looked in some chunk of memory that it really
shouldn't be looking at.</p>
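<p>When the size of the array is known, it is possible to check whether it holds a string at all without running off the end. The following is a minimal sketch; <code>holds_string</code> is a hypothetical helper, not a standard function:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports whether the first n bytes at buf contain
   a null character, i.e. whether buf holds a string that fits in n bytes.
   Unlike strlen, memchr never examines more than n bytes. */
int holds_string(const char *buf, size_t n) {
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

<p>For example, a <code>char[4]</code> initialized with <code>"abc"</code> passes the check, while a <code>char[3]</code> holding just <code>'a'</code>, <code>'b'</code>, <code>'c'</code> does not.</p>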
<p>To <em>partially</em> address this, C provides "safer" versions of some
string functions, versions that let you specify the maximum size of
an array. For example, the <code>strcmp()</code> function compares two strings,
but can fail badly if either of the arguments points to something that
isn't a string. The <code>strncmp()</code> function is a bit safer; it requires
a third argument that specifies the maximum number of characters to
examine in each array:</p>
<ul>
<li><code>int strcmp (const char *s1, const char *s2);</code></li>
<li><code>int strncmp(const char *s1, const char *s2, size_t n);</code></li>
</ul>
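<p>As a small sketch of where the length limit helps, consider comparing fixed-size tags that carry no terminating null character. The function name and the 4-byte tag size are assumptions made for the example:</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: compares two 4-byte tags that are not
   null-terminated.  strncmp examines at most 4 characters of each
   array (fewer if a null character appears earlier), so it never
   reads beyond either one.  strcmp would have no such limit. */
int tags_equal(const char a[4], const char b[4]) {
    assert(a != NULL && b != NULL);
    return strncmp(a, b, 4) == 0;
}
```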
<p>Which brings us (finally!) to the topic of this article: the
<code>strncpy()</code> function.</p>
<p><code>strcpy()</code> is a fairly straightforward string function. Given two
pointers, it copies the string pointed to by the second pointer into
the array pointed to by the first. (The order of the arguments mimics
the order of the operands in an assignment statement.) It's up to
the caller to ensure that there's enough room in the target array to
hold the copied contents.</p>
<p>So you'd <em>think</em> that <code>strncpy()</code> would be a "safer" version of
<code>strcpy()</code>. And given their respective declarations, that's exactly
what it looks like:</p>
<ul>
<li><code>char *strcpy (char *dest, const char *src);</code></li>
<li><code>char *strncpy(char *dest, const char *src, size_t n);</code></li>
</ul>
<p>But no, that's not what the <code>strncpy()</code> function does at all.</p>
<p>Here's the description of <code>strcpy()</code> from the <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf">latest draft of the
C standard</a>:</p>
<blockquote>
<p>The <strong>strcpy</strong> function copies the string pointed to by <strong>s2</strong> (including the
terminating null character) into the array pointed to by <strong>s1</strong>. If copying
takes place between objects that overlap, the behavior is undefined.</p>
</blockquote>
<p>And here's the corresponding description of <code>strncpy()</code>:</p>
<blockquote>
<p>The <strong>strncpy</strong> function copies not more than <strong>n</strong> characters
(characters that follow a null character are not copied) from the array
pointed to by <strong>s2</strong> to the array pointed to by <strong>s1</strong>. If copying
takes place between objects that overlap, the behavior is undefined.</p>
</blockquote>
<p>So far, so good, right? Almost -- but there's more:</p>
<blockquote>
<p>If the array pointed to by <strong>s2</strong> is a string that is shorter than
<strong>n</strong> characters, null characters are appended to the copy in the array
pointed to by <strong>s1</strong>, until <strong>n</strong> characters in all have been written.</p>
</blockquote>
<p>That second paragraph means that if the string pointed to by <code>s2</code> is
shorter than <code>n</code> characters, <code>strncpy()</code> doesn't just copy the string and
its terminating null character, which is what you'd expect. It keeps
writing null characters until it has written a total of <code>n</code> characters.
If the source string is 5 characters long, and the target is a 1024-byte
buffer, and you set <code>n</code> to the size of the target, <code>strncpy()</code> will copy
those 5 characters and then fill all 1019 remaining bytes in the target
with null characters. Since all it takes to terminate a string is
a single null character, this is almost always a waste of time.</p>
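<p>The padding is easy to observe. The following sketch fills the target with
<code>'x'</code> first, so any null characters found afterwards must have been written
by <code>strncpy()</code>. The helper name <code>nulls_written()</code> is hypothetical:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills dest with 'x', copies src into it with
   strncpy(), then counts how many of the n bytes ended up as null
   characters. Assumes dest has room for at least n bytes. */
size_t nulls_written(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    memset(dest, 'x', n);
    strncpy(dest, src, n);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (dest[i] == '\0') {
            count++;
        }
    }
    return count;
}
```

<p>With a 1024-byte target and the source <code>"hello"</code>, this should report 1019
null characters, matching the arithmetic above.</p>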
<p>Ok, so that's not so bad. CPUs are fast these days, and filling a
buffer with zeros is not an expensive operation, right? Unless you're
doing it a few billion times, but let's not worry about premature
optimization.</p>
<p>The trap is in that first paragraph. If the target buffer is 5
characters long, you'd quite reasonably set <code>n</code> to 5. But if the
source string is 5 or more characters long, there's no room left for
a terminating null character, and you'll end up without one in the
target array. In other words,
the target array <em>won't contain a string</em>. Try to treat it as if it
does (say, by calling <code>strlen()</code> on it or passing it to <code>printf()</code>
with a <code>"%s"</code> format), and Bad Things Can Happen.</p>
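<p>Here's a small sketch of the trap. It calls <code>strncpy()</code> and then uses
<code>memchr()</code>, which respects the bound, to check whether the target ended up
null-terminated. The helper name <code>strncpy_terminated()</code> is my own invention:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: calls strncpy(dest, src, n) and reports whether
   the result has a null character within its first n bytes. Assumes
   dest has room for at least n bytes. */
bool strncpy_terminated(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    strncpy(dest, src, n);
    return memchr(dest, '\0', n) != NULL;
}
```

<p>With a 5-byte target, this returns false for both <code>"hello world"</code> and
<code>"hello"</code>, and true for <code>"hi"</code>.</p>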
<p>The description of the <code>strcpy()</code> and <code>strncpy()</code> functions is
identical in the 1990, 1999, and 2011 versions of the ISO C standard --
except that C99 and C11 add a footnote to the <code>strncpy()</code> description:</p>
<blockquote>
<p>Thus, if there is no null character in the first <strong>n</strong> characters
of the array pointed to by <strong>s2</strong>, the result will not be
null-terminated.</p>
</blockquote>
<p>The bottom line is this: in spite of its frankly misleading name,
<code>strncpy()</code> isn't really a string function.</p>
<p>[<strong>TODO</strong>: Discuss <code>dest[0]='\0'; strncat(dest, src, size);</code> as a
better-behaved alternative, something that does what most people assume
<code>strncpy()</code> does.]</p>
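<p>As a rough sketch of that idea: <code>strncat()</code> appends at most the given
number of characters and then always adds a terminating null character, so if
<code>size</code> is the full size of the target array, the count to pass is
<code>size - 1</code>. The wrapper name <code>copy_string()</code> is hypothetical:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most size-1 characters of src into
   dest and always null-terminates the result. `size` is assumed to be
   the full size of the dest array, so it must be at least 1. */
char *copy_string(char *dest, const char *src, size_t size)
{
    assert(dest != NULL && src != NULL && size > 0);
    dest[0] = '\0';
    /* strncat() appends up to size-1 characters and then a null
       character, so it writes at most size bytes in total. */
    return strncat(dest, src, size - 1);
}
```

<p>Unlike <code>strncpy()</code>, this silently truncates long input but always leaves
a string in the target, and it doesn't pad the rest of the array.</p>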
<p>Now having a function like this in the standard library isn't such
a bad thing in itself. It's designed to deal with a specialized
data structure, a fixed-size character array of <strong>N</strong> characters
that can contain up to <strong>N</strong> characters of actual data, with the
rest of the array (if any) padded with 0 or more null characters.
Early Unix systems used such a structure to hold file names in
directories, for example (though it's not clear that <code>strncpy()</code>
was invented for that specific purpose).</p>
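<p>To make that data structure concrete, here's a sketch of a null-padded
fixed-size name field. The struct, the field width of 14, and both function
names are assumptions for illustration, not taken from any real system
header:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_SIZE 14  /* illustrative field width */

struct entry {
    char name[NAME_SIZE];  /* null-padded, not necessarily null-terminated */
};

/* Stores a name in the fixed-size field: truncates names that are too
   long and pads short ones with null characters. This is the job
   strncpy() is actually suited for. */
void set_name(struct entry *e, const char *name)
{
    assert(e != NULL && name != NULL);
    strncpy(e->name, name, NAME_SIZE);
}

/* Returns the length of the stored name without reading past the field. */
size_t name_length(const struct entry *e)
{
    assert(e != NULL);
    const char *end = memchr(e->name, '\0', NAME_SIZE);
    return end != NULL ? (size_t)(end - e->name) : NAME_SIZE;
}
```

<p>Note that <code>e->name</code> must never be passed to <code>strlen()</code> or printed with
a plain <code>"%s"</code>, since a full-width name has no terminating null character.</p>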
<p>The problem is that the name <code>strncpy()</code> strongly implies that it's a
"safer" version of <code>strcpy()</code>. It isn't.</p>
<p>Most of the other <code>strn*()</code> functions are safer versions of their
unbounded counterparts: <code>strcat()</code> vs. <code>strncat()</code> and <code>strcmp()</code>
vs. <code>strncmp()</code>. [<strong>TODO</strong>: Discuss the bounds-checking versions added
in Annex K of the 2011 ISO C standard.]</p>
<p>It's because <code>strncpy()</code>'s name implies something that it isn't that
it's such a trap for the unwary. It's not a useless function, but I
see far more incorrect uses of it than correct uses. This article
is my modest attempt to spread the word that <code>strncpy()</code> isn't what
you probably think it is.</p>
<p>I've put together a
<a href="https://github.com/Keith-S-Thompson/strncpy_demo">small demo</a>
as a GitHub project.</p>
<p><em>Last updated Mon Feb 17 08:33:27 2014 -0800</em></p>
<!-- Title: First post -->
<!-- URL: http://the-flat-trantor-society.blogspot.com/2012/01/first-post.html -->
<p>Greetings to my vast army of followers.</p>
<p>This is my new blog, in which I will sporadically post rants and
musings on software development, programming language standards,
and whatever else strikes my fancy at the moment.</p>
<p>Welcome.</p>
<p><em>Last updated Mon 2012-11-05 16:48:00 PST</em></p>