Monday, March 5, 2012

No, strncpy() is not a "safer" strcpy()

The C standard library declares a number of string functions in the standard header <string.h>.

By the standards of some other languages, C's string handling is fairly primitive. Strings are simply arrays of characters terminated by a null character '\0', and are manipulated via char* pointers. C has no string type. Instead, a "string" is a data layout, not a data type. Quoting the ISO C standard:

A string is a contiguous sequence of characters terminated by and including the first null character.

So what happens if you call a C string function with a pointer into a char array that isn't properly terminated by a null character? Such an array does not contain a "string" in the sense that C defines the term, and the behavior of most of C's string functions on such arrays is undefined. That doesn't mean the function will fail cleanly, or even that your program will crash; it means that as far as the standard is concerned, literally anything can happen. In practice, what typically happens is that the function will keep looking for that terminating null character either until it finds it in some chunk of memory it really shouldn't be looking at, or until it crashes because it looked in some chunk of memory that it really shouldn't be looking at.
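
If you're not sure whether an array holds a string, you can check before handing it to a string function. The following is a minimal sketch: the helper name contains_string is hypothetical (it is not a standard function), and it assumes you know the size of the array. The standard memchr() function does the actual work, and it never reads past the size you give it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the standard library): returns true
   if the first `size` bytes of `buf` include a null character, i.e. if
   the array holds a string in the standard's sense. */
bool contains_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Given char a[3] = { 'a', 'b', 'c' }, contains_string(a, sizeof a) is false; given char b[4] = "abc", contains_string(b, sizeof b) is true.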

To partially address this, C provides "safer" versions of some string functions, versions that let you specify the maximum size of an array. For example, the strcmp() function compares two strings, but can fail badly if either of the arguments points to something that isn't a string. The strncmp() function is a bit safer; it requires a third argument that specifies the maximum number of characters to examine in each array:

  • int strcmp (const char *s1, const char *s2);
  • int strncmp(const char *s1, const char *s2, size_t n);
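
As a small illustration of the bounded comparison, here is a sketch built on strncmp(). The wrapper name same_prefix is hypothetical and exists only for this example. Because the comparison never looks past n characters, it is well defined even for arrays with no terminating null character, as long as each array is at least n characters long.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the first n characters of the two arrays
   compare equal. strncmp() stops early if it reaches a null character. */
bool same_prefix(const char *s1, const char *s2, size_t n)
{
    return strncmp(s1, s2, n) == 0;
}
```

For example, same_prefix("hello", "help", 3) is true and same_prefix("hello", "help", 4) is false.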

Which brings us (finally!) to the topic of this article: the strncpy() function.

strcpy() is a fairly straightforward string function. Given two pointers, it copies the string pointed to by the second pointer into the array pointed to by the first. (The order of the arguments mimics the order of the operands in an assignment statement.) It's up to the caller to ensure that there's enough room in the target array to hold the copied contents.

So you'd think that strncpy() would be a "safer" version of strcpy(). And given their respective declarations, that's exactly what it looks like:

  • char *strcpy (char *dest, const char *src);
  • char *strncpy(char *dest, const char *src, size_t n);

But no, that's not what the strncpy() function does at all.

Here's the description of strcpy() from the latest draft of the C standard:

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

And here's the corresponding description of strncpy():

The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

So far, so good, right? Almost -- but there's more:

If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written.

That second paragraph means that if the string pointed to by s2 is shorter than n characters, strncpy() doesn't just copy the string and add a single terminating null character, which is what you'd expect. It adds null characters until it has written a total of n characters. If the source string is 5 characters long, and the target is a 1024-byte buffer, and you set n to the size of the target, strncpy() will copy those 5 characters and then fill all 1019 remaining bytes in the target with null characters. Since all it takes to terminate a string is a single null character, this is almost always a waste of time.
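
The padding is easy to observe. The sketch below uses a hypothetical helper, count_nulls_written, that pre-fills a scratch buffer with 'x' characters, calls strncpy(), and counts how many null characters ended up in the first n bytes. The 1024-byte buffer size is taken from the example above; everything else is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the null characters strncpy() writes when
   copying `src` into a scratch buffer with n set to `size`.
   `size` is clamped to the 1024-byte scratch buffer. */
size_t count_nulls_written(const char *src, size_t size)
{
    char buf[1024];
    size_t i, count = 0;

    if (size > sizeof buf)
        size = sizeof buf;
    memset(buf, 'x', sizeof buf);   /* pre-fill so the zeros stand out */
    strncpy(buf, src, size);
    for (i = 0; i < size; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

With a 5-character source, count_nulls_written("hello", 1024) returns 1019: one null character would have been enough, but strncpy() wrote 1018 more.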

Ok, so that's not so bad. CPUs are fast these days, and filling a buffer with zeros is not an expensive operation, right? Unless you're doing it a few billion times, but let's not worry about premature optimization.

The trap is in that first paragraph. If the target buffer is 5 characters long, you'd quite reasonably set n to 5. But if the source string is longer than 5 characters, then you'll end up without a terminating null character in the target array. In other words, the target array won't contain a string. Try to treat it as if it does (say, by calling strlen() on it or passing it to printf()), and Bad Things Can Happen.
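
Here is a sketch of the trap. The helper name copy_is_terminated is hypothetical; it copies into a 5-character array with n set to 5, then uses memchr() (which respects the array bound) to report whether a null character made it into the target.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copies `src` into a 5-byte array with n set to
   the array size, then reports whether the result is null-terminated. */
bool copy_is_terminated(const char *src)
{
    char target[5];

    strncpy(target, src, sizeof target);
    return memchr(target, '\0', sizeof target) != NULL;
}
```

copy_is_terminated("abc") is true, but copy_is_terminated("hello, world") is false, and so is copy_is_terminated("hello"): a source of exactly 5 characters fills the array and leaves no room for the terminator.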

The descriptions of the strcpy() and strncpy() functions are identical in the 1990, 1999, and 2011 versions of the ISO C standard -- except that C99 and C11 add a footnote to the strncpy() description:

Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated.

The bottom line is this: in spite of its frankly misleading name, strncpy() isn't really a string function.

[TODO: Discuss dest[0]='\0'; strncat(dest, src, size); as a better-behaved alternative, something that does what most people assume strncpy() does.]
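
Until that discussion is written, here is a rough sketch of the idiom. The wrapper name bounded_copy is hypothetical, and the sketch assumes the target array has room for at least one character. One detail matters: strncat() appends up to n characters and then a terminating null character, so n has to be one less than the size of the target array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dest_size - 1 characters of `src`
   into `dest` and always null-terminates the result.
   Assumes dest_size is at least 1. */
char *bounded_copy(char *dest, const char *src, size_t dest_size)
{
    dest[0] = '\0';
    /* strncat() writes up to n characters plus a terminating null
       character, so n must leave room for that terminator. */
    return strncat(dest, src, dest_size - 1);
}
```

With char dest[5], bounded_copy(dest, "hello, world", sizeof dest) leaves the string "hell" in dest. The copy is silently truncated, but the result is always a string, and nothing beyond the terminator is written.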

Now having a function like this in the standard library isn't such a bad thing in itself. It's designed to deal with a specialized data structure, a fixed-size character array of N characters that can contain up to N characters of actual data, with the rest of the array (if any) padded with 0 or more null characters. Early Unix systems used such a structure to hold file names in directories, for example (though it's not clear that strncpy() was invented for that specific purpose).
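
For a structure like that, strncpy() does exactly the right thing. The sketch below is illustrative only: the struct, the function names, and the 14-character field width are all assumptions, not taken from any particular system. Note that reading the field back requires a bound too, here supplied through the precision in the "%.*s" format.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 14   /* assumed field width, for illustration only */

/* Hypothetical record with a fixed-width, null-padded name field.
   The field is not necessarily a string: a 14-character name fills
   it completely and leaves no terminator. */
struct entry {
    char name[NAME_LEN];
};

/* Stores up to NAME_LEN characters and null-pads the rest of the field. */
void set_name(struct entry *e, const char *name)
{
    strncpy(e->name, name, NAME_LEN);
}

/* Formats the field into `out`. The precision limits how many
   characters are read from the (possibly unterminated) field. */
int format_name(const struct entry *e, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%.*s", NAME_LEN, e->name);
}
```

After set_name(&e, "README"), the field holds "README" followed by 8 null characters. After set_name(&e, "a_long_file_name.txt"), it holds the 14 characters "a_long_file_na" with no terminator, and format_name() still reads it safely.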

The problem is that the name strncpy() strongly implies that it's a "safer" version of strcpy(). It isn't.

Most of the other strn*() functions are safer versions of their unbounded counterparts: strcat() vs. strncat() and strcmp() vs. strncmp(). [TODO: Discuss the bounds-checking versions added in Annex K of the 2011 ISO C standard.]

It's because strncpy()'s name implies something that it isn't that it's such a trap for the unwary. It's not a useless function, but I see far more incorrect uses of it than correct uses. This article is my modest attempt to spread the word that strncpy() isn't what you probably think it is.

I've put together a small demo as a GitHub project.

Last updated Mon Feb 17 08:33:27 2014 -0800

11 comments:

  1. This comment has been removed by the author.

  2. Very Informative. Thanks for this good article.

  3. "If the target buffer is 5 characters long, you'd quite reasonably set n to 5"

    I'd say that if the target buffer is 5 char long, you should set n to 4.

    char target[5];
    ...
    strncpy(target, source, sizeof(target)-1);

  4. This is actually the rationale for OpenBSD's strlcpy()

  5. Great post, very informative... thanks. Bookmarked. Also a good interview question.

  6. Thank you, this post really illuminated me.

  7. I'm not sure that's a "problem" as you kinda put it. I mean, for me it's just something you have to think about when you're using it, but no more difficult, or different, than a simple malloc. When you want to allocate memory for a "string" with malloc, you're gonna do something like this:
    str = malloc(sizeof(char) * (strlen(str) + 1))
    That "+ 1" is exactly the same one you'd use for strncpy (strncpy(s1, s2, strlen(s2) + 1)) and it's something every C programmer does every day. It's not about whether something is "safe" or not, it's about usefulness, and I'm pretty sure strncpy was designed like that for a reason (copying some parts of headers for example, I'm sure). The fact is, as you point out at the start of your argument, C does not have strings. It has arrays, and they have to be treated as such, '\0' at the end or not (and yeah, it can be useful, or simply intended, not to have '\0' at the end of your string. You just have to be extra careful about what you do with it ^^).

    Replies
    1. But then, strncpy does exactly the same thing as strcpy, so what's the point?

    2. I never said that C doesn't have strings. I said that C doesn't have a string *type*. Something that doesn't have a '\0' character at the end of it is, by definition, not a string.

  8. On principle I always place a '\0' at the end of the array after every call to strncpy(). It is the only way to guarantee the copied array is still a valid string.

  9. I always felt like something was fishy with `strncpy` and the ugly kludges I had to do to make sure that the result was a string. Thanks for setting me straight. What do you think of `stpecpy`, as seen in https://man7.org/linux/man-pages/man7/string_copying.7.html?
