How to Get Fired Using Switch Statements & Statement Expressions
2016-10-27 - By Robert Elder
Updated Oct 27, 2016: Fixed sample code comments in coroutine example as per suggestion, edited text to note D's similar behaviour.
Updated Oct 28, 2016: Added missing colon, corrected Duff's device example.
Introduction
Whether you're trying to achieve job security, impress others by showing them how smart you are, or passive-aggressively assert your dominance over a code base, writing unmaintainable code has a number of practical applications. One extremely unmaintainable and bug-ridden technique for C programming involves using switch statements and statement expressions together.
In this article, we will discuss how you can leverage switch statements and statement expressions to produce C code that is so difficult to understand, you'll need to look at the assembly to figure out what it does. Many of the examples of syntax in this article are not standards compliant or they won't pass even the simplest of static analysis tests. That should be ok though, because writing many of these examples in your company's code base will probably end up getting you fired anyway.
Switch Statements
Let's start by reviewing the humble switch statement that we all know and love:
int i = ...;
switch(i){
    case 0:{
        ...
        break;
    }case 1:{
        ...
        break;
    }case 2:{
        ...
        break;
    }default:{
        ...
    }
}
The above is what most people are used to thinking about when 'switch statements' are mentioned in C. The rough idea is that switch statements are a sort of more appealing alternative to using lots of 'else if' statements when checking for some disjoint property. Some of you may be surprised to learn that the following is also a valid switch statement:
int i = ...;
switch(i){
    i++;
    default:{ }
    i++;
    case 0:{
        case 3: i;
    }
    if(i < 10){
        case 1:{
            break;
        }
        for(i=0; i < 10; i++){
            case 2:;
        }
    }
}
It's worth pointing out that almost no other languages support switch statements the way that they work in C (although the D language is one that does). Most other languages have a switch statement that works like a more appealing alternative to a chain of 'else if' checks.
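To see this behaviour in something you can actually run, here is a minimal sketch. The 'trace' function and its output letters are made up for illustration; it simply records which statements in the switch body execute for a given value of 'i'.

```c
#include <string.h>

/* Hypothetical helper: appends a letter to 'out' for every statement
   in the switch body that actually executes. */
static void trace(char *out, int i){
    out[0] = '\0';
    switch(i){
        strcat(out, "A");       /* Never runs: no label leads here. */
        default:
            strcat(out, "D");
        case 0:
            strcat(out, "0");
            if(i > 100){        /* False for small i, so the body is skipped... */
        case 1:                 /* ...unless i == 1, which jumps straight into it. */
                strcat(out, "1");
            }
            strcat(out, "E");
    }
}
```

Calling 'trace' with 7 falls through from 'default' and records "D0E", calling it with 0 records "0E", and calling it with 1 lands inside the 'if' body without ever testing its condition and records "1E". Expect gcc to warn that the first statement is unreachable, which is the point.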
How Do Switch Statements In C Actually Work?
A switch statement in C would be more appropriately called a 'goto field'. This means that the switch(...) part simply makes a decision about which label to branch to. After branching to that label nothing special happens related to the fact that you're inside a switch statement and the code will just keep executing whatever machine instructions come next. The one exception is, of course, the break statement which will jump to the point after the switch statement body. Here is an equivalent version of the switch statement written above using only ifs and gotos.
int i = ...;
if(i == 0)
    goto label_0;
if(i == 1)
    goto label_1;
if(i == 2)
    goto label_2;
if(i == 3)
    goto label_3;
/* Otherwise, go to default label */
goto label_default;
{
    i++;
    label_default:{ }
    i++;
    label_0:{
        label_3: i;
    }
    if(i < 10){
        label_1:{
            goto break_from_switch;
        }
        for(i=0; i < 10; i++){
            label_2:;
        }
    }
}
break_from_switch:; /* A label must be followed by a statement */
If you're already familiar with the famous Duff's device, the above probably isn't news to you:
/* The switch statement used in Duff's device */
int total_bytes = ...;
int n = (total_bytes + 3) / 4;
switch (total_bytes % 4) {
    case 0: do { *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
}
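The fragment above leaves 'to' and 'from' undeclared and never advances 'to' (the classic device wrote to a single output location). As a hedged, runnable variant, here is a sketch of a memory-to-memory copy built the same way. The name 'duff_copy' is my own, both pointers are incremented, and the count is assumed to be greater than zero.

```c
/* Hypothetical helper: copies 'count' bytes from 'from' to 'to'.
   Assumes count > 0; a count of 0 would wrongly copy 4 bytes. */
static void duff_copy(char *to, const char *from, int count){
    int n = (count + 3) / 4;    /* Number of passes through the loop body */
    switch(count % 4){          /* Jump into the middle of the first pass */
        case 0: do { *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while(--n > 0);
    }
}
```

With a count of 7, the switch jumps to 'case 3' and copies 3 bytes, then the loop runs one more full pass of 4.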
Co-Routines Using Switch Statements
Drawing on Duff's device as inspiration you can use the unique behaviour of switch statements in C to implement coroutines:
#include <stdio.h>
#define coroutine_begin() static int state=0; switch(state) { case 0:
#define coroutine_return(x) { state=__LINE__; return x; case __LINE__:; }
#define coroutine_finish() }
int get_next(void) {
    static int i = 0;
    coroutine_begin();
    while (1){
        coroutine_return(++i);
        coroutine_return(100);
    }
    coroutine_finish();
}
int main(void){
    printf("i is %d\n", get_next()); /* Prints 'i is 1' */
    printf("i is %d\n", get_next()); /* Prints 'i is 100' */
    printf("i is %d\n", get_next()); /* Prints 'i is 2' */
    printf("i is %d\n", get_next()); /* Prints 'i is 100' */
    return 0;
}
The example in this section draws on one found in Coroutines in C by Simon Tatham. The original source describes a few caveats of switch based coroutines that I won't discuss in this article.
If we resolve the macros, and indent the code a bit better, then remove some superfluous brackets and semicolons, the 'get_next' function becomes:
#include <stdio.h>
/* Assume the value of __LINE__ was 1234, and 4567 */
int get_next(void) {
    static int i = 0;
    static int state = 0;
    switch(state) {
        case 0:;
        while(1){
            state = 1234;
            return ++i;
            case 1234:;
            state = 4567;
            return 100;
            case 4567:;
        }
    }
}
The static variables are key here, because they allow us to communicate information between calls to the function 'get_next'. The switch statement effectively just gives us a convenient way to implement the goto statements that would be necessary to jump either to the start of the coroutine (state = 0) or to the point just after the previous return (state = 1234 and state = 4567). If you want to create more return/resume points you can simply add more calls to the 'coroutine_return' macro. You must also make sure that these calls appear between 'coroutine_begin' and 'coroutine_finish', and that each call sits on its own source line, since __LINE__ is what keeps the case labels distinct.
If the syntax of the case statements is throwing you off, just remember that
case 0:;
does the same thing as
case 0: {
}
The first represents a 'statement' with no expression, and the second represents a 'compound statement' with no declarations or expressions. Neither of these forms has a break statement, so execution will always just continue doing whatever is after the case label in the same way that a goto would work.
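As a sketch of what adding more return/resume points looks like, here is a coroutine with three of them built from the same macros. The function 'next_phase' and the values it returns are invented for this example, and the trailing return only exists to keep the compiler quiet.

```c
#define coroutine_begin() static int state=0; switch(state) { case 0:
#define coroutine_return(x) { state=__LINE__; return x; case __LINE__:; }
#define coroutine_finish() }

/* Hypothetical example: cycles through 10, 20, 30 forever. */
int next_phase(void){
    coroutine_begin();
    while(1){
        /* Each call must be on its own line so __LINE__ is unique. */
        coroutine_return(10);
        coroutine_return(20);
        coroutine_return(30);
    }
    coroutine_finish();
    return -1; /* Not reached: the loop above never exits. */
}
```

Successive calls return 10, 20, 30, and then 10 again, because the fourth call resumes just after the third return, reaches the end of the loop body, and starts the next iteration. Since the state lives in a static variable, there is only ever one instance of this coroutine.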
Interesting Valid Uses of Switch Statements
Now that we've seen how switch statements can get pretty weird, let's explore a few interesting examples of valid switch statement syntax that (depending on how much of a 1337 hacker you are) you may have never seen before:
/* It doesn't get any simpler than this */
switch(0);
/* Braces are not necessary because there is only one statement (that never executes). */
switch(0)
    i++;
/* Yup, this compiles. */
switch(0)
    switch(0)
        switch(0)
            switch(0)
                switch(0)
                    switch(0);
/* Same idea. */
switch(0)
    case 0:;
switch(0)
    case 0:
        for(i = 0; i < 10; i++)
            case 1:
                for(j = 0; j < 10; j++)
                    case 2:
                        for(k = 0; k < 10; k++);
/* Same idea as the last example, but more braces. */
switch(0){
    case 0:{
        for(i = 0; i < 10; i++){
            case 1:{
                while(j){
                    case 2:{
                        for(k = 0; k < 10; k++){
                        }
                    }
                    j++;
                }
            }
        }
    }case 3:{
        /*...*/
    }default:{
        /*...*/
    }
}
/* The compiler expects a 'statement' to appear after 'case :', but
   a case statement itself is a 'labeled statement', which is just another
   regular ole statement. */
switch(i)
    default: case 0: case 1: case 2: case 3:;
/* I find this example useful when you have some subset of
   switch cases that require identical behaviour, but the rest have
   their own unique behaviour:
*/
switch(0){
    case UNIQUE_CASE_A:{
        break;
    }case UNIQUE_CASE_B:{
        break;
    }case SIMILAR_CASE1: case SIMILAR_CASE2: case SIMILAR_CASE3:{
        break;
    }
}
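For a concrete version of that last pattern, here is a small sketch where several labels share one body. The 'token' enum and the 'describe' function are hypothetical names chosen only for this illustration.

```c
/* Hypothetical token kinds, for illustration only. */
enum token { TOK_PLUS, TOK_MINUS, TOK_STAR, TOK_SLASH, TOK_LPAREN, TOK_EOF };

static const char *describe(enum token t){
    switch(t){
        case TOK_LPAREN:{
            return "grouping";
        }case TOK_EOF:{
            return "end of input";
        /* Four stacked labels, one shared body. */
        }case TOK_PLUS: case TOK_MINUS: case TOK_STAR: case TOK_SLASH:{
            return "binary operator";
        }
    }
    return "unknown"; /* Only reached for values outside the enum. */
}
```

This works because each 'case X:' is just a label on the statement that follows it, and that statement is allowed to be another labeled statement.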
Statement Expressions
Statement expressions are a GNU extension that is not supported by the C standard, but they are supported by default in gcc and clang. They allow you to embed a compound statement within an expression. The value returned by the last expression is the value returned by the entire statement expression:
/* Regular ole expression */
int i = 0;
/* Fancy new statement expression */
int j = ({int k; k = i + 1; i;});
You might ask, "Why would you ever want to do such a thing?" There are a number of different answers, and many of them are related to convenience. One use case is function-like macros: if a macro argument appears several times in the macro body, any side effects in that argument are evaluated several times. A statement expression lets the macro evaluate each argument exactly once, store the result in a local variable, and reuse that variable.
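Here is a minimal sketch of that macro use case, assuming gcc or clang with GNU extensions enabled. The macro names, the '__typeof__' based locals, and the 'next_value' counter are my own choices for illustration.

```c
/* Naive macro: the larger argument is evaluated twice. */
#define MAX_NAIVE(a, b) ((a) > (b) ? (a) : (b))

/* Statement expression version: each argument is evaluated exactly once
   and stored in a local before the comparison. */
#define MAX_ONCE(a, b) ({ \
    __typeof__(a) a_ = (a); \
    __typeof__(b) b_ = (b); \
    a_ > b_ ? a_ : b_; \
})

/* Hypothetical function with a visible side effect. */
static int calls = 0;
static int next_value(void){ return ++calls; }
```

Starting from 'calls == 0', 'MAX_ONCE(next_value(), 0)' yields 1 and calls the function once. The same expression with 'MAX_NAIVE' calls it twice and yields 2, because the winning argument is evaluated again to produce the result.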
You can stick statement expressions pretty much anywhere you could put a variable:
#include <stdio.h>
int get_zero(void){ return 0; }
int main(void){
    /* Prints 0-9 */
    for(int i = ({get_zero();}); i < 10; i++)
        printf("%d\n", i);
    return 0;
}
You can do pretty much anything inside a statement expression that you could do in a regular compound statement:
#include <stdio.h>
int main(void){
    /* Set i to sum from 0 to 99 */
    int i = ({int j = 0; for(int i = 0; i < 100; i++) j+=i; j;});
    printf("Sum is %d\n", i);
    return 0;
}
Switch Statements & Statement Expressions
Unfortunately, most of these examples will only compile in clang, since gcc disallows branching to a label that is inside a statement expression (which is probably for the best).
Let's try to embed a switch statement in a statement expression that is itself used as the controlling expression of another switch statement:
/* This compiles just fine in gcc and clang. */
switch(({switch(0);3;}));
Yup, this compiles in gcc and clang! This code doesn't do anything interesting, but it illustrates one approach to writing C code that is difficult to read.
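If you want a statement expression in the controlling expression that does some actual work, something like the following sketch should also be accepted by both compilers, since no label lives inside the statement expression. The 'bucket' function is hypothetical; it normalizes a possibly negative remainder before switching on it.

```c
/* Hypothetical helper: classifies x by its non-negative remainder mod 3. */
static const char *bucket(int x){
    switch(({ int r = x % 3; r < 0 ? r + 3 : r; })){
        case 0: return "zero";
        case 1: return "one";
        default: return "two";
    }
}
```

For example, 'bucket(-1)' computes a remainder of -1, adjusts it to 2, and lands on the default case.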
Now let's try building a more useful example that combines switch and statement expressions. Here is an example where you want to conditionally change the bounds of a loop. You could use a variable or a function but you could also use a statement expression with embedded case labels!
#include <stdio.h>
void print_stuff(int type){
    int i = 0;
    int r = 0;
    switch(type){
        for(i = 0; i < ({if(0){ case 1:r+=2; case 0:r+=3;}r;}); i++){
            printf("i is %d\n", i);
        }
    }
}
int main(void){
    printf("First run\n");
    print_stuff(0);
    printf("Second run\n");
    print_stuff(1);
    return 0;
}
The above example gives me the following output:
First run
i is 0
i is 1
i is 2
Second run
i is 0
i is 1
i is 2
i is 3
i is 4
However, if you run this program through valgrind, you'll find that it's doing uninitialized reads:
==16228== Conditional jump or move depends on uninitialised value(s)
==16228== at 0x4005AB: print_stuff (main.c:7)
==16228== by 0x400619: main (main.c:15)
...
==16228== Conditional jump or move depends on uninitialised value(s)
==16228== at 0x4005AB: print_stuff (main.c:7)
==16228== by 0x400637: main (main.c:17)
This is quite impressive because we're able to create uninitialized reads in a program where all variables are initialized, and there is no pointer or array magic! Let's look at a smaller example that has the same problem:
#include <stdio.h>
int main(void){
    int i = 0;
    switch(i){
        for(i = 0; i < ({case 0:; 10;}); i++){
            (void)i;
        }
    }
    return 0;
}
Generating the assembly on my machine gives:
pushq %rbp
.Ltmp0:
.cfi_def_cfa_offset 16
.Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp2:
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
.loc 1 4 13 prologue_end # main.c:4:13
.Ltmp3:
movl $0, -8(%rbp)
.loc 1 5 2 # main.c:5:2
movb $1, %al
testb %al, %al
jne .LBB0_2
jmp .LBB0_6
.LBB0_1: # in Loop: Header=BB0_2 Depth=1
.loc 1 6 14 discriminator 1 # main.c:6:14
.Ltmp4:
movl -8(%rbp), %eax
movl %eax, -16(%rbp) # 4-byte Spill
.LBB0_2: # =>This Inner Loop Header: Depth=1
.loc 1 6 19 is_stmt 0 discriminator 2 # main.c:6:19
movl $10, -12(%rbp)
.loc 1 6 16 discriminator 2 # main.c:6:16
movl -16(%rbp), %eax # 4-byte Reload
cmpl -12(%rbp), %eax
.loc 1 6 3 discriminator 2 # main.c:6:3
jge .LBB0_5
# BB#3: # in Loop: Header=BB0_2 Depth=1
# BB#4: # in Loop: Header=BB0_2 Depth=1
.loc 1 6 37 discriminator 3 # main.c:6:37
movl -8(%rbp), %eax
addl $1, %eax
movl %eax, -8(%rbp)
.loc 1 6 3 discriminator 3 # main.c:6:3
jmp .LBB0_1
.Ltmp5:
.LBB0_5:
.loc 1 9 2 is_stmt 1 # main.c:9:2
jmp .LBB0_6
.Ltmp6:
.LBB0_6:
xorl %eax, %eax
.loc 1 10 9 # main.c:10:9
popq %rbp
retq
.Ltmp7:
.Lfunc_end0:
The uninitialized read occurs because these lines get skipped on the first loop iteration:
.Ltmp4:
movl -8(%rbp), %eax
movl %eax, -16(%rbp) # 4-byte Spill
These instructions make a copy of the 'i' variable and store it at -16(%rbp) for use in the comparison 'i < ({case 0:; 10;})'. On the first iteration, the jump from the switch statement lands past these instructions, so the comparison reloads -16(%rbp) before anything has been stored there; on later iterations the copy is made as expected. It seems pretty reasonable that the compiler would do this; after all, you are telling it to branch into the middle of a for loop comparison.
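You can watch the same "skip the setup" effect in well-defined, standard C that gcc will also compile. In this sketch the case label sits in the loop body rather than in the comparison, so the jump skips the 'i = 0' initializer without touching any compiler temporaries. The function 'iterations_from' is a name I made up for this example.

```c
/* Hypothetical helper: counts how often the loop body runs when the
   switch jumps into the body and skips the 'i = 0' initializer. */
static int iterations_from(int start){
    int i = start;
    int iterations = 0;
    switch(0){
        for(i = 0; i < 5; i++){     /* 'i = 0' is never executed */
            case 0:
            iterations++;
        }
    }
    return iterations;
}
```

Starting at 0 gives the usual 5 iterations, starting at 3 gives 2, and starting at 10 still gives 1, because the body runs once before the condition is ever tested.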
Here's another almost practical example where we can use a case label to jump into the middle of a function parameter evaluation:
#include <stdio.h>
void f(int type){ }
int main(void){
    int i = 0;
    switch(i){
        f(i + ({case 0:; 1;}) + i);
    }
    return 0;
}
If you compile this with clang 3.8.0-2ubuntu4 you get the following:
...
clang: error: unable to execute command: Segmentation fault (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/main-df5301.c
clang: note: diagnostic msg: /tmp/main-df5301.sh
...
Here's another similar case that will send my version of clang into (what appears to be) an infinite loop:
int main(void){
    int i = 0;
    switch(i){
        i = i + ({case 0:; 0;});
    }
    return 0;
}
While you're at it, why not sneak a case label into a bitfield width calculation? This example will compile in clang, but it won't output anything (the case 0 seems to get completely ignored and 0 will be caught by a default case if you add it):
#include <stdio.h>
/* This compiles in clang 3.8.0-2ubuntu4, but it doesn't output anything. */
int main(void){
    int i = 0;
    switch(i){
        int j = sizeof(struct {int i:({case 0:; 1;});});
        printf("Fin %d.\n", j);
    }
    return 0;
}
You get similar behaviour if you put a case label inside another case label:
#include <stdio.h>
int main(void){
    int i = 1;
    switch(i){
        case ({case 1:; 0;}): printf("here\n");
    }
    return 0;
}
Conclusion
As you've seen in this article, you can use switch statements to make a number of valid C programs which are extremely difficult to understand. You can even take this further by embedding case labels inside statement expressions, which produces truly next-level, hard-to-understand code that can cause a variety of subtle problems. As we've seen above, this includes compiler crashes, compiler hangs, and subtly broken executable code. If you commit enough of these to your code base, it is sure to get you fired!