Should I use Signed or Unsigned Ints In C? (Part 2)
2015-08-16 - By Robert Elder
Introduction
In the first part of this discussion I compared the characteristics of using signed versus unsigned integers, and asked the question 'If you were stranded on a desert island with only signed or unsigned integers, which one would you pick?'. I concluded that I would pick unsigned integers, although there are cases where you will have to use signed integers. For practical programming you cannot pick only one of these choices. In this follow up, I will review more differences (gotchas) between signed and unsigned ints, and attempt to summarize the situations where one may be preferable. Many people disagreed with my conclusion that unsigned integers are better, and I will provide more justification to my conclusion that unsigned integers are preferable in a world where you could use one. Also note that this article considers only the C (and to a lesser extent the C++) programming language.
Where did this question come from?
In my case, I was designing a spec for a CPU that was intended to be as minimal as possible. The CPU would physically implement either signed or unsigned atomic math operations, not both. The compiler would support the missing one (signed or unsigned) using emulation in terms of the one that was physically implemented. I started out by choosing signed int since it naturally handles negatives, but I regretted my decision when I learned about all the undefined (and implementation-defined) behaviour of overflow, and specifically of shifting. The lack of any guarantee of two's complement representation was also on my mind. It is an interesting exercise for the reader to attempt to implement unsigned math with only signed int containers. Then try emulating signed math with unsigned int containers. The second case is far easier, since you can be much more confident that your solution doesn't rely on undefined behaviour. Try it out for yourself, and tell me which one you like best :)
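To give a taste of why the second exercise is easier, here is a minimal sketch (my own, assuming a 32-bit two's complement interpretation of the bit patterns) of signed operations emulated with only unsigned arithmetic. Every operation below is fully defined by the standard, because unsigned arithmetic is performed modulo 2^32:

```c
#include <stdint.h>

/* Two's complement negation: flip the bits and add one.  The addition
   wraps modulo 2^32, which is well defined for unsigned types. */
static uint32_t emulated_neg(uint32_t a){
    return ~a + 1u;
}

/* Signed and unsigned addition produce the same bit pattern in two's
   complement, so "signed" addition is just unsigned addition. */
static uint32_t emulated_add(uint32_t a, uint32_t b){
    return a + b;
}

/* The sign of the represented value is just the top bit. */
static int is_negative(uint32_t a){
    return (a >> 31) & 1u;
}
```

Doing the reverse, emulating unsigned math in signed containers, forces you to dodge undefined overflow at every step.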
When To Use Signed Vs. Unsigned
Most people are not engaging in the same exercise in minimalism that I am, so I have attempted to collect a summary of opinions that should give you a good idea of when to use signed or unsigned ints:
Prefer Signed | Prefer Unsigned |
---|---|
If you have never heard of, or don't understand, C99's section 6.3.1.8 'Usual arithmetic conversions' or C99's 'integer promotion' rules. See 'Mixing Signed and Unsigned' below. | When using bitwise operators (>> << | & ^ ~). This is reflected in Rule 12.7 of MISRA C: 'Bitwise operators shall not be applied to operands whose underlying type is signed.' MISRA C is a standard that aims to produce very reliable software in embedded systems such as aerospace or medical devices. |
Having undefined behaviour in your program is not a big deal, or you've been very careful to prove that any signed arithmetic you do will never overflow. | Undefined behaviour scares you and you want to be able to demonstrate at compile time that arithmetic overflow results in well defined behaviour. |
The interface to the standard library requires that you use signed integers: int putchar(int); | If you need to represent sizes that are on the order of the largest memory objects ex: Count the number of bytes available in a 32 bit address space. |
If you think that the value of i is -1 in the result of this assignment: unsigned int i = -1; (The value becomes UINT_MAX, not -1) | When modular arithmetic is not a surprise or is desirable. The SHA-1 hash algorithm is an example of a situation where math modulo 2^32 is desired. |
ptrdiff_t is the type returned when subtracting two pointers. It is defined in the C99 standard section 7.17 Common definitions as a signed integer type. Therefore, if you subtract pointers, you're using a signed type whether you like it or not. | size_t is the type returned by the sizeof operator. It is defined in the C99 standard section 7.17 to be an unsigned integer type. It is claimed by some that it was a mistake to standardize size_t as unsigned. |
On the topic of signed versus unsigned, Bjarne Stroustrup (the creator of C++) says: "Use [signed] int until you have a reason not to. Use unsigned if you are fiddling with bit patterns, and never mix signed and unsigned." I provide more context on his statement below, and an example of where you can get into trouble mixing signed and unsigned. Source: (This talk around 12:56) | |
If you think that this loop will terminate: for(unsigned int i = 10; i >= 0; --i){ } (it does not, because i >= 0 is always true for an unsigned i) | |
Scott Meyers recommends not using unsigned types in interfaces. I talk about this below. | |
On Preference For Signed Interfaces
In Scott Meyers' article he recommends not using unsigned types in interfaces. His argument is that if an unsuspecting user calls an interface with a negative signed number by mistake, that negative value will be converted to a huge positive value by the well-defined conversion rules from signed to unsigned. I think his argument does put forward a good case for using signed integers in interfaces. It is worth noting that you still get problems if you use signed ints in interfaces and call them with unsigned values that are not representable in an int: in that case the conversion is implementation-defined (C99 6.3.1.3) rather than undefined, which is still a portability hazard. In general you can't control what your users will call your API with, and the conservative assumption is that they aren't very good programmers and don't enable any warnings (or ignore them). You then have to choose between an API whose users experience relatively frequent bugs with well-defined behaviour (unsigned ints) and rare bugs with implementation-defined behaviour (signed ints). If I were using the API, I would prefer the former; most people would prefer the latter.
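To make Meyers' concern concrete, here is a hypothetical interface sketch (buffer_length is an illustrative name, not from any real library). A caller who passes -1 by mistake never sees -1 inside the function: the well-defined signed-to-unsigned conversion turns it into UINT_MAX before the function body runs.

```c
#include <limits.h>

/* Hypothetical API taking an unsigned size.  If a caller passes the
   signed literal -1, the argument is converted to unsigned int by the
   standard conversion rules and arrives here as UINT_MAX. */
static unsigned int buffer_length(unsigned int requested){
    return requested;
}
```

This is the "relatively frequent but well-defined" failure mode: the huge value is surprising, but it is the same huge value on every conforming compiler.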
Why Do I Still Think Unsigned Is Better?
The central thought in my argument is this:
Well-defined, standard compliant, wrong code is wrong 24/7 365 days a year and it will always produce exactly the same wrong answer. In this case the programmer is 100% in control of the fact that the produced answer is wrong. With undefined code, you might get the correct answer today, but the wrong answer tomorrow and the programmer has no control over this. Unit testing every possible program state won't find a bug that will be introduced when the next version of your compiler starts implementing undefined behaviour differently. Since unsigned integers are less prone to undefined behaviour (remember, this article only talks about C programming), I claim that they are more desirable than signed integers, specifically because of the undefined behaviour on signed overflow and signed shift operations. Furthermore, many of the attempts at guarding against invoking the undefined behaviour using signed integers are either computationally expensive, compiler specific, or most importantly, involve run-time checks to exclude undefined behaviour, whereas using unsigned integers allows for compile time checks to exclude (more, but not all) undefined behaviour. Satisfying compile time versus run time constraints on your code is a very big difference.
Run-Time Vs. Compile-Time Constraints
Let's look at some specifics related to the difference between guarantees you get with signed versus unsigned. If you write the following function:
int add_signed(int a, int b){
return a + b;
}
can you write a static analyzer that will tell you at compile time if this program is 'correct' for some definition of correct? Absolutely not. Game over. The reason is that the given function can be invoked with values over the entire input space. Specifically, it can be invoked with a = INT_MAX and b = INT_MAX. This invokes undefined behaviour. At 9:51pm on August 15th, 2015 using the GCC compiler on my machine with default flags and -O0 this gives -2. But what will happen at 9:52pm? Or what about on December 25, 2019 using clang's new -O999 option? Since the behaviour is undefined, a possible result is not getting -2, but corrupting some internal program state. Today this code gives the right answer. Will it give the right answer tomorrow? What if you change the compiler flags?
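If you insist on keeping signed arithmetic, the usual remedy is a run-time guard like the sketch below (my own illustration, not code from the article): test the operands before adding so the undefined overflow never happens. Note that this is precisely the run-time check argued against above; with unsigned operands no such check is needed for the addition itself to be well defined.

```c
#include <limits.h>

/* Returns nonzero if a + b would overflow the range of int.  The
   comparisons themselves cannot overflow: INT_MAX - b and INT_MIN - b
   stay in range because of the sign tests on b. */
static int add_would_overflow(int a, int b){
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}

/* Adds only when safe; returns -1 instead of invoking undefined
   behaviour. */
static int checked_add(int a, int b, int *out){
    if(add_would_overflow(a, b)){
        return -1;
    }
    *out = a + b;
    return 0;
}
```

Every call site now pays for a branch that the type system could have made unnecessary.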
It's worth noting that as I was reviewing this article, I tested out the example above using the -ftrapv flag (which catches overflows) to make sure that it actually does trigger undefined behaviour. When I do this in clang, I get:
$ clang -ftrapv main.c
$ ./a.out
Illegal instruction (core dumped)
And when I use GCC I get:
$ gcc -ftrapv main.c
$ ./a.out
-2
Despite the fact that both clang and gcc document -ftrapv, it just silently doesn't work in gcc ((Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4). So much for our safety net!
The bugs that undefined behaviour can introduce are (IMHO) among the most catastrophic types of programming errors, because they can be non-deterministic (across compiler flags and compilers) and very rare. What I find amazing is that many people who disagree with me think that the rarity of such bugs is a good thing! Compiler authors will likely support the existing behaviour as long as possible, but the pressure of needing things to go faster and faster will likely push them to exploit more and more undefined behaviour to their advantage in the future. Their argument will be: 'After all, who has sympathy for those who don't follow the standard?'.
Now ask the same question: Can you write a static analyzer that will tell you at compile time if this program is 'correct' for some definition of correct?
unsigned int add_unsigned(unsigned int a, unsigned int b){
return a + b;
}
The answer is yes. You just check the types and see that they're unsigned int. Undefined behaviour is impossible because the standard guarantees (C99 6.2.5) that addition over unsigned integers is a closed operation: results are reduced modulo one more than the maximum value of the type. Does modular arithmetic confuse you? Cry me a river!
Infinite Loop with Unsigned
Many of the people who are opposed to using unsigned integers cited this classic example below involving an infinite loop resulting from unsigned underflow:
#include <stdio.h>
void fun(int input_s, unsigned int input_u){
unsigned int size_u = 10u;
unsigned int i_u;
int size_s = 10;
int i_s;
/* Loop #1: I'd much prefer the following code */
for(i_u = size_u -1; i_u >=0; i_u--){
printf("unsigned: %u\n", (input_u + i_u) / 2u);
}
/* Loop #2: over this code */
for(i_s = size_s -1; i_s >=0; i_s--){
printf("signed: %i\n", (input_s + i_s) / 2);
}
}
If we heed what the standard says, neither of these loops is going to do what you 'want'. Both have 'errors'. Even though only the second case gives the right output as of today, I still prefer the first one because it is consistently wrong, whereas the second one is almost always right. In the first case, the code is obviously wrong (well, obvious if you see that at size = 0 we get i = UINT_MAX). That is fantastic! For size = 0, this code fails today, tomorrow, with gcc, clang, icc, and Visual Studio at every conforming compiler optimization level, and it will keep failing as long as the standard is relevant and as long as they don't change the definition of already-defined behaviour.
The second loop does something much more horrible: it just pretends that everything is OK. Silently waiting... For years... (I received some emails asking what the problem was with this case: it is signed overflow in input_s + i_s. This will work perfectly fine with most compilers, but it is actually undefined, and will get trapped if you start catching undefined behaviour.)
Additionally, the problem with the unsigned example can be found by enabling warnings, but no warning is issued for the signed example.
$ gcc -Wextra main.c
main.c: In function ‘fun’:
main.c:10:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
for(i_u = size_u -1; i_u >=0; i_u--){
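For completeness, one common idiom (my suggestion, not code from the loops above) fixes the unsigned countdown entirely: decrement inside the loop condition, so the body sees size-1 down to 0 and the loop also handles size == 0 without wrapping into an infinite loop.

```c
/* Counts how many times the loop body runs for a downward unsigned
   loop written with the "i_u-- > 0" idiom.  When size == 0 the first
   test fails immediately (0 > 0 is false), so the body never runs and
   no infinite loop occurs, even though i_u wraps to UINT_MAX after
   the test. */
static unsigned int count_iterations(unsigned int size){
    unsigned int i_u;
    unsigned int n = 0u;
    for(i_u = size; i_u-- > 0u; ){
        n++;    /* body runs with i_u == size-1, ..., 1, 0 */
    }
    return n;
}
```

The wrap of i_u to UINT_MAX after the final test is well defined and harmless, because the loop has already exited.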
Unsigned underflow
Another case that was pointed out was one where you can run into problems, is if you attempt to subtract a large unsigned value from a small one:
#include <stdio.h>
int main(void){
unsigned int a = 7u;
unsigned int b = 8u;
unsigned int c = a - b;
printf("%u\n", c); /* My machine gives the value 4294967295 */
return 0;
}
Warnings won't help you in this case, because nothing out of the ordinary is happening here (I know, I know it's not what you learned in high school math, but it has to run on a finite state machine!). Many of the people who advocate storing strictly positive values in signed integers recommend using assertions to ensure that the values remain positive. I think a similar approach can be applied here:
#include <stdio.h>
#include <assert.h>
int main(void){
unsigned int a = 7u;
unsigned int b = 8u;
assert(b <= a); /* a.out: main.c:7: main: Assertion b <= a failed. */
unsigned int c = a - b;
printf("%u\n", c);
return 0;
}
or even better
#include <stdio.h>
int main(void){
unsigned int a = 7u;
unsigned int b = 8u;
if(b > a){
printf("A catastrophic error occurred.\n");
}else{
unsigned int c = a - b;
printf("%u\n", c);
}
return 0;
}
If you still decide to use signed integers for this situation, you'll encounter the same kind of problem with underflow; it just happens far away from zero, where you don't encounter it as often (and it will be undefined behaviour).
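To spell that out, here is a sketch (my own illustration) of the check that signed subtraction would need: a - b is undefined whenever the true result falls below INT_MIN or above INT_MAX, so the danger zone moves far from zero instead of disappearing.

```c
#include <limits.h>

/* Returns nonzero if a - b would overflow the range of int.  The
   sign tests on b keep INT_MIN + b and INT_MAX + b in range, so the
   guard itself never overflows. */
static int sub_would_overflow(int a, int b){
    return (b > 0 && a < INT_MIN + b) ||
           (b < 0 && a > INT_MAX + b);
}
```

Note that 7 - 8 is perfectly fine with signed ints, which is exactly why the bug hides until some value wanders near INT_MIN.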
What Bjarne Stroustrup Says
In a live Q/A, Bjarne gives his thoughts on signed and unsigned integers:
"Whenever you mix signed and unsigned numbers you get trouble. The rules are just very surprising, and they turn up in code in strange places that correlate very strongly with bugs. Now, when people use unsigned numbers they usually have a reason. And the reason will be something like 'well, it can't be negative' or 'I need an extra bit'. If you need an extra bit, I am very reluctant to believe that you really need it, and I don't think that's a good reason. When you think you can't have negative numbers, you will have somebody who initializes your unsigned with minus two, and think they get minus two, and things like that.
It is just highly error prone.
I think one of the sad thing about the standard library is that the indices are unsigned whereas array indices are signed and you are sort of doomed to have confusion, and problems with that. There are far too many integer types, there are far too lenient rules for mixing them together, and it's a major bug source, which is why I'm saying stay as simple as you can, use [signed integers] till you really really need something else."
- Bjarne Stroustrup, (This talk around 43:00)
Bjarne recommends using signed integers until you have a reason not to, and although I prefer using unsigned integers, I don't completely disagree with him. Starting off with signed will help you out a bit if you don't understand all the type promotion and conversion rules that exist for mixing integer types. In my case, I think my 'reason not to' is my (possibly irrational) desire to preclude undefined behaviour in my programs.
Also, I'll note that Bjarne said that 'array indices are signed', although I wasn't able to find a reference for this. The C++11 standard section 5.2.1 on subscripting says "One of the expressions shall have the type 'pointer to T' and the other shall have unscoped enumeration or integral type." This seems to imply that array indices can be either signed or unsigned. I am reluctant to call what he said an error, though, since he did invent C++ and contributes to the very standard that I referenced.
Mixing Signed and Unsigned
In the quotation above by Bjarne, he mentioned that 'Whenever you mix signed and unsigned numbers you get trouble'. The example below is intended to illustrate that you have to be careful when mixing signed and unsigned types, especially when they are of different sizes. The complexity in this example comes from a mix of two different implicit conversion rules in C: 'Usual Arithmetic Conversions', and 'Integer Promotion'. Note that we're doing what looks like the exact same thing with types that have identical signs (but different sizes), and we get 2 different results:
#include <stdio.h>
int main(void){
signed short signed_short = -1;
unsigned short unsigned_short = 0;
signed int signed_int = -1;
unsigned int unsigned_int = 0;
/* Integral promotion rules promote to types 'int' and 'int'; Usual
arithmetic rules don't need to do anything because types are the same.
*/
if(signed_short < unsigned_short){
printf("This gets printed\n");
}else{
printf("Does not get printed.\n");
}
/* Integral promotion rules don't need to do anything because types
are 'int' and 'unsigned int'. Usual arithmetic conversion rules
require that both types be signed or both be unsigned, and in this case
the signed_int is converted to an unsigned int in a well-defined
way to have the value UINT_MAX, which is greater than 0.
*/
if(signed_int < unsigned_int){
printf("Does not get printed.\n");
}else{
printf("This gets printed\n");
}
return 0;
}
Fortunately, clang's -Wsign-compare will complain about the second case.
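One possible fix for the surprising comparison (my sketch, not the only option) is to handle the negative case explicitly and then cast, so the remaining comparison is between two unsigned values and the conversion is visible in the source:

```c
/* Compares a signed int against an unsigned int without tripping over
   the usual arithmetic conversions. */
static int signed_less_than_unsigned(int s, unsigned int u){
    if(s < 0){
        return 1;               /* every negative int is below every unsigned value */
    }
    return (unsigned int)s < u; /* s >= 0, so the cast preserves the value */
}
```

The explicit branch and cast also silence -Wsign-compare, because the compiler can see that the conversion was intentional.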
Usual Arithmetic Conversions
The usual arithmetic conversion rules are complex, and in general they depend on calculating 'ranks' that are determined by the min and max sizes of the underlying integer types your compiler makes available. The goal of these rules is to attempt to preserve the underlying numerical value as much as possible when converting between types. On many platforms that use 32 bit integers, the rules will simplify to the following form:
Type #1 | Type #2 | Result Type |
---|---|---|
unsigned int | unsigned int | unsigned int |
signed int | unsigned int | unsigned int |
unsigned int | signed int | unsigned int |
signed int | signed int | signed int |
This is just a small taste of the usual arithmetic conversion rules, which have to cover both signed and unsigned types for int, long, long long, long double, float, and double. Smaller types like char and short will get promoted to int or unsigned int before the usual arithmetic conversion rules kick in. The integer promotion rules also appear in other places that don't involve usual arithmetic conversions. An example is when passing a char to a function that takes an int.
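The char-to-int promotion can be seen directly in arithmetic results. In this small demonstration (my own example, assuming as on virtually all platforms that int can represent every unsigned char value), two unsigned char operands are promoted to int before the addition, so the sum is 400 rather than wrapping at 256:

```c
/* Both operands are promoted to int before '+' is applied, so the
   addition is computed in int, not in unsigned char. */
static int add_uchars(unsigned char a, unsigned char b){
    return a + b;
}
```

Only if the result were stored back into an unsigned char would it be reduced modulo 256.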
Conclusion
I should finish by saying that I would probably prefer signed ints if it wasn't for the problem with undefined behaviour on signed overflow and shifting. For me, the fact that undefined behaviour can sneak into my program in a way that I can't statically analyze at compile time is a non-starter. In addition, it would be great if two's complement representation was guaranteed, but unfortunately the standard needs to consider old machines that I'll likely never encounter in my lifetime.
Hopefully, you learned something if you read through both of these articles. I certainly learned a lot reading through everyone's comments, which you can find from part 1, and I strongly encourage you to tear this article apart if you find any mistakes.
If you found this interesting, you might want to also check out my C compiler.