Thursday, February 17, 2011

C Gotchas

Holy crap, two updates in one day? Yeah... I decided that if I find myself sitting at my computer thinking 'I wonder what's on Slashdot' or 'Do I have any more street cred over at Chiphacker?' I should do something potentially useful and update my blog instead. Doesn't have to be long, doesn't have to be good, dosen't ahve to ahve correct spellign - just do it. After all, the first step to making money with a blog is to update it every day. Second step? Have people actually read it. Step three: profit, baby!

But I'm sure you're all here for the real meat of this post and that is going to be the answer to the question 'What stupid thing did Steve do today that cost him hours of time?' It didn't take me hours, but here's a snippet of code that caused me trouble. By the way - you win $10,000 if you spot the bug and submit it before I hit 'Publish' for this post.

#define VALUE 0xF01FUL

short i = 0xF01F;

if( VALUE == i)
{
printf("They are equal\r\n");
}
else
{
printf("They are not equal fool\r\n");
}


Ok hotshot, what prints?

If you said 'They are equal' you are in fact wrong. I hope that feels good. But do you know why you are wrong? Here comes the science.

We have two things being compared here: i is a short int. On most processors/architectures that is a 16-bit signed integer. The #defined value is in hex (obviously) and supposedly the same value as the variable but has a little 'UL' on the end of it. That signifies that it is to be treated as an unsigned long constant. On many 32-bit and embedded targets unsigned long is 32 bits wide, which is what I'll assume here (on most 64-bit Unix-like systems it is 64 bits, and the same bug still bites).

That might already give you the first inkling of why these two aren't equal: one is 32 bit but the other is 16. But you veterans out there (if you consider having taken an introductory C class in college being a veteran) will think 'Ah, but those bottom 16 bits are the same, so it shouldn't matter!'. You would be right, but C doesn't follow your rules. C never compares two integers of different types directly. Before the comparison it converts both operands to a common type (the 'usual arithmetic conversions'), and when one side is an unsigned long the other side gets converted to unsigned long too. Basically, the 16-bit short is expanded to 32 bits before C decides whether the two are the same.

'Aha!' you say with a sly smile, 'I was right then! Even if you expand them both to 32 bits they're padded with 0's in the upper unused portions (obviously!) so they come out to the same thing!'

But once again you're incorrect in your assumptions. Why oh why do you assume that they're padded with 0's? Because the only other option is to pad them with 1's and that would change the value? You obviously forgot how signed data is represented on a computer! In signed integer types in C (two's complement, which is what practically every machine uses) the most significant bit is the sign bit - if it's 1 then it's negative. Seems simple enough. So let's follow your line of thinking and expand our (signed) variable to 32 bits:

16-bit value: 0xF01F
Expanded to 32 bits: 0x0000F01F

Wait a darn tootin' second! This was a negative number (the most significant hex digit was F which means all 1's which means the most significant bit was 1 - which means negative). Now that we've expanded it it's suddenly.... not negative. Well that can't be. It's not the same value then - positive vs. negative. Kind of a big change. So to preserve the value we'd have to expand it and pad with 1's - like this:

16 bit value: 0xF01F
32 bit value: 0xFFFFF01F

Let's check my math with a signed integer calculator that you can find online (via the Google: http://planetcalc.com/747/):

0xF01F: -4065
0xFFFFF01F: -4065
0x0000F01F: 61471

Yep... padding with 0's doesn't produce the same result if the integer is defined as signed. So where you see

if(0xF01F == 0xF01F)


C sees:

if(0x0000F01F == 0xFFFFF01F)


And then it looks at you funny for thinking they're the same.

But I don't look at you funny. I'm only so dismissive and rude because I just made this mistake today and the pain is still fresh. Someday we'll laugh about this.

But for now if you mention it I will end you.
