The Angry EE: March 2010

Tuesday, March 9, 2010

New Unoriginal Idea

I don't think that I learn much at my job, but I was surprised the other day to learn something new about microcontrollers. Well, about coding in general, anyhow. You see, when working with microcontrollers I use a finite state machine to control program execution. No, wait, what does that mean? That's technical gobbledygook. It means that I have a variable and I define a bunch of different values it can take. When my program starts it goes into INITIALIZE mode and it initializes stuff. Then, when it's done, I usually put it into IDLE, where it doesn't do much of anything and usually waits for user input like a button press. Then, depending on what happens, I put it into other modes like DO_STUFF_1 or DO_STUFF_2. Different events can change the mode it's in, and that produces different behaviors. The idea is that there will be a nice, orderly flow from one mode to the next, and that depending on what mode you're in you won't handle certain inputs, or you'll handle them differently. The goal is predictable behavior.
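A minimal sketch of that kind of state machine in C. The mode names come from the paragraph above; the `step` function, the `button_pressed` input and the particular transitions are made up for illustration, not taken from any real firmware.

```c
#include <stdbool.h>

/* The modes named above; the transitions below are hypothetical examples. */
enum mode { INITIALIZE, IDLE, DO_STUFF_1, DO_STUFF_2 };

/* One pass of the main loop: what we do with an input depends on the mode. */
enum mode step(enum mode current, bool button_pressed)
{
    switch (current) {
    case INITIALIZE:
        /* ...set up clocks, ports, peripherals... */
        return IDLE;
    case IDLE:
        /* Only IDLE reacts to the button; the other modes ignore it. */
        return button_pressed ? DO_STUFF_1 : IDLE;
    case DO_STUFF_1:
        /* ...do the first job... */
        return DO_STUFF_2;
    case DO_STUFF_2:
        /* ...do the second job... */
        return IDLE;
    }
    return IDLE; /* not reached with a valid mode */
}
```

Note how a button press in DO_STUFF_1 changes nothing: that mode simply doesn't handle that input.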

But microcontrollers are anything but predictable. You see, there are these awful things called interrupts. They are what they sound like - they interrupt your program and do something else. They usually happen when hardware has done something - say a button has been pressed. When that happens, whatever you were doing stops and this new code executes. That's not usually so bad, but you almost always want to change the mode you're in based on what happens in that interrupt. Okay, so change the mode. Fine, whatever. I don't care.

Except I obviously do. This might happen:

CODE ALERT! CODE ALERT!

do_special_stuff_for_different(modes)
{
    mode1:
        easy_command();
        easy_command();
        ---OOOOPS INTERRUPT HAPPENED HERE!----
        change_mode_to(mode2);
        ---INTERRUPT OVER BYE BYE NOW!--------
        command_that_checks_mode(); //Oops, doesn't run, wrong mode!
        command_that_depends_on_the_above(); //Fails, above didn't run!
        ...
} //Always close your braces!

You see what happens there. Ideally, you want all the commands for a mode to run regardless of whether the mode changes halfway through.

This can be done. Just use a queue data structure to change the mode. A queue is a list of things. I can put things into the back end and take things out of the front. So when I want the mode to change, I can just put the new value for mode into the queue and then handle it in one and only one place in the program:

is(empty(mode_queue))
{
    no:
        mode = new_mode_from_the(mode_queue);
    yes:
        //Do nothing!
}
do_special_stuff_for_different(modes)
{
    mode1:
        easy_command();
        ...
}

So you only change the mode before you handle events for that mode, and never while you're handling them. This eliminates confusion for the poor code.
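Here is one way that could look in real C: a minimal sketch, assuming a small ring buffer where interrupts only ever push and the main loop only ever pops. It also assumes interrupts don't nest (the AVR default), so pushes from different interrupts never overlap, and that 8-bit index reads and writes are atomic, as they are on an 8-bit AVR. `mode_queue_push`, `mode_queue_pop`, `main_loop_step` and the `MODE_*` values are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode values; any small integers would do here. */
enum { MODE_INITIALIZE, MODE_IDLE, MODE_1, MODE_2 };

#define MODE_QUEUE_SIZE 8u /* power of two so the index wrap is a cheap mask */

static volatile uint8_t mode_queue[MODE_QUEUE_SIZE];
static volatile uint8_t queue_head; /* next slot to read (main loop only)  */
static volatile uint8_t queue_tail; /* next slot to write (interrupt only) */

/* Called from an interrupt: append a requested mode at the back.
   Returns false (and drops the request) if the queue is full. */
bool mode_queue_push(uint8_t new_mode)
{
    uint8_t next = (uint8_t)((queue_tail + 1u) & (MODE_QUEUE_SIZE - 1u));
    if (next == queue_head) {
        return false; /* full: one slot is kept empty to tell full from empty */
    }
    mode_queue[queue_tail] = new_mode;
    queue_tail = next;
    return true;
}

/* Called from the main loop only: take the oldest request off the front. */
bool mode_queue_pop(uint8_t *out_mode)
{
    if (queue_head == queue_tail) {
        return false; /* empty */
    }
    *out_mode = mode_queue[queue_head];
    queue_head = (uint8_t)((queue_head + 1u) & (MODE_QUEUE_SIZE - 1u));
    return true;
}

static uint8_t mode = MODE_INITIALIZE;

/* One pass of the main loop: the mode changes here and nowhere else,
   so everything inside the switch sees a single, stable mode. */
void main_loop_step(void)
{
    uint8_t requested;
    if (mode_queue_pop(&requested)) {
        mode = requested;
    }
    switch (mode) {
    case MODE_1:
        /* easy_command(); command_that_checks_mode(); ... */
        break;
    default:
        break;
    }
}
```

The main loop would call `main_loop_step()` forever, and the interrupt handlers would call nothing but `mode_queue_push()`.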

But my unoriginal idea improves on this - I think. Say that you're in MODE1 and an interrupt wants to change the mode to MODE2. OK, it puts the new mode into the queue and we'll handle it when the time comes. But now, before that can be handled, another interrupt tries to change the mode to MODE3. Well, put it in the queue and we'll handle it. When all is said and done, the mode changes from MODE1 to MODE2 to MODE3 right after one another. It seems OK, but wait! When we designed the system, we never intended to move from MODE2 to MODE3, because in MODE2 we set things up to get ready for MODE4, which we intend to follow MODE2 directly. But that doesn't happen. When you're in MODE2 you try to put MODE4 on the queue, but it gets in line behind MODE3. And meanwhile, if MODE3 attempts to put MODE5 next, that fails too, because MODE4 is already waiting ahead of it. So you've got two different sequences trying to happen at the same time.

It might work, but it might not. We intended in our design to move right from MODE2 to MODE4, but we didn't do anything to ensure that it actually would. That's where my unoriginal idea comes in. These modes are stored on a microcontroller. That means they're stored in bytes, and bytes are made of bits - eight of them, to be precise. Eight bits can store 256 different modes if you use plain binary encoding, or only eight if you use something called one-hot encoding. One-hot encoding means that only one bit is ever a '1' at a time - the rest are zeros. Wherever the '1' is determines what the value is.
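To make the encoding concrete, here is a small C sketch. The bit assigned to each mode is an arbitrary, hypothetical choice; the only real rule is that each mode owns exactly one bit. A byte is a valid one-hot mode when it is non-zero and clearing its lowest set bit (`x & (x - 1)`) leaves nothing.

```c
#include <stdbool.h>
#include <stdint.h>

/* One-hot encoding: at most eight modes, exactly one '1' per mode.
   These bit assignments are hypothetical. */
#define MODE1 0x01u /* 0000 0001 */
#define MODE2 0x02u /* 0000 0010 */
#define MODE3 0x04u /* 0000 0100 */
#define MODE4 0x08u /* 0000 1000 */
#define MODE5 0x10u /* 0001 0000 */
#define MODE6 0x20u /* 0010 0000 */
#define MODE7 0x40u /* 0100 0000 */
#define MODE8 0x80u /* 1000 0000 */

/* True if exactly one bit of the byte is set, i.e. it is one of the eight modes. */
bool is_valid_mode(uint8_t value)
{
    return value != 0u && (value & (uint8_t)(value - 1u)) == 0u;
}
```

With binary encoding, by contrast, every byte value is some mode, so there would be no "invalid" pattern left over to detect. That spare room is what the tricks below rely on.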

So, when we want to change modes, we don't put the next mode we want to move to into the queue; we XOR the current mode with the next mode and put that into the queue. The XOR operation works bit by bit: if two bits are the same, the outcome is 0; if they're different, the outcome is 1. Since all of our modes use one-hot encoding, two different modes never have a '1' in the same position, so wherever there was a one in either mode's byte, there will be a one in the resulting XOR'd byte. Using this method we record not only the mode we want to go to, but also the mode the program was in when it asked. It says 'I want to go from MODE2 to MODE4' instead of 'I want to go to MODE4'. Then, when it comes time to look at the queue and change the mode, we can make a better decision: XOR the request with the current mode. If we're still in MODE2, the MODE2 bit cancels out and only MODE4 is left. If the request is 'I want to go from MODE2 to MODE4' but we're in MODE3 right now, the XOR leaves three '1' bits, which isn't a valid mode, so it doesn't happen. It goes on to the next request, which was 'I want to go from MODE3 to MODE5', and that one happens.
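A sketch of the "from/to" request in C, under the same assumption as before that each mode is a single hypothetical bit. `make_request` and `apply_request` are illustrative names; the main loop would call `apply_request` on each byte it pops off the queue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-hot modes, one bit each. */
#define MODE2 0x02u
#define MODE3 0x04u
#define MODE4 0x08u
#define MODE5 0x10u

static bool is_valid_mode(uint8_t value)
{
    return value != 0u && (value & (uint8_t)(value - 1u)) == 0u;
}

/* Encode "I want to go from `from` to `to`" as one byte: both bits set. */
uint8_t make_request(uint8_t from, uint8_t to)
{
    return (uint8_t)(from ^ to);
}

/* Apply a queued request to the current mode. If the program is still in
   the mode the request was made from, XOR strips that bit and leaves the
   target; otherwise three bits survive and the request is dropped.
   Caveat: if we are already in `to`, the XOR hands back `from` instead. */
uint8_t apply_request(uint8_t current, uint8_t request)
{
    uint8_t candidate = (uint8_t)(current ^ request);
    return is_valid_mode(candidate) ? candidate : current;
}
```

Replaying the earlier scenario: both interrupt requests are made while in MODE1, so MODE1 to MODE2 goes through, MODE1 to MODE3 is dropped because we're no longer in MODE1, and MODE2's own MODE2 to MODE4 request then applies as designed.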

Now, there's an obvious flaw in this: the request to go to MODE4 is lost entirely. The upshot is that if I press two buttons at the same time, only one thing happens. For buttons that's good enough, but for other inputs it's not. For some inputs what you really mean to say is 'No matter what mode I'm in now, I want to go to MODE8'. How do you do that? Simple, it's... No, wait, how DO you do that?

Oh, wait, I know how. Assuming you followed all of the above steps, when you XOR the value from the queue with the current mode you either get out a valid new mode, and just switch to that, or you don't. If you don't, you can check whether it was meant as one of these unequivocal mode changes. You just have to figure out what mode you want to change to. So, to handle that: instead of putting the XOR'd mode into the queue, just put the next mode you want to go to. When the program XORs it with the current mode, it WON'T get a valid mode out of it, since the result has either two '1' bits or none. So it XORs that result with the current mode again, which undoes the first XOR and gives back exactly what was queued. If you meant to do one of those unequivocal jumps, THAT is a valid mode. If you didn't mean to do an unequivocal jump, it is still an invalid mode and the request is dropped. Tada!
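Putting both kinds of request together, a minimal C sketch of the resolver might look like this. As before, the mode bits and the `resolve_request` name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-hot modes. */
#define MODE2 0x02u
#define MODE3 0x04u
#define MODE4 0x08u
#define MODE8 0x80u

static bool is_valid_mode(uint8_t value)
{
    return value != 0u && (value & (uint8_t)(value - 1u)) == 0u;
}

/* Resolve one queued byte against the current mode.
   - Conditional request (from ^ to): one XOR with the current mode gives a
     valid mode only if we are still in `from`.
   - Unconditional request (just the target mode): the first XOR gives two
     bits or none (invalid), so XOR with the current mode again, which undoes
     the first XOR and returns the raw request, a valid mode.
   Anything else is dropped and the mode is left alone. */
uint8_t resolve_request(uint8_t current, uint8_t queued)
{
    uint8_t once = (uint8_t)(current ^ queued);
    if (is_valid_mode(once)) {
        return once;
    }
    uint8_t twice = (uint8_t)(once ^ current); /* always equals `queued` */
    if (is_valid_mode(twice)) {
        return twice;
    }
    return current;
}
```

Since the second XOR always gives back the queued byte, an equivalent (and slightly cheaper) design is to test `queued` itself for validity after the first check fails.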

Monday, March 8, 2010

Debugging

This post isn't for me. I never have to debug. I follow a strict development process in which all of the documentation I read is perfectly accurate and describes all edge case behavior. I also follow code writing procedures so well that every exception is handled, every input is sanitized completely (even to the point of predicting accurately what the user meant to enter instead of what the user did) and every return value is checked. All reported errors (which are entirely due to... well, not me anyhow) have expository messages bordering on small novels that not only tell you what caused the error but actually fix it themselves in all known copies of the program, including those in the past. Yeah, that's right - my code is so awesome that it goes back in time and fixes itself before you even see an error message.

Well, I doubt you believe that. I'm not sure what to believe anymore - the full truth of the situation is caught somewhere in an unstable time loop that will someday tear reality apart. So in the meantime, I actually debug things. Yes, things that I created. This is because you can never get away from debugging. Because documentation is Just. Plain. Wrong. Because the user put international Unicode characters that you didn't even know existed into your text box. Because every library you rely on has bugs and undefined behavior. And it's only worse with circuits: the EM spectrum is effectively infinite. Wavelengths you only see in your nightmares can ruin your life in an instant. Computers have been hit by this for decades - freaking cosmic rays can flip a single bit in memory and trash your machine state. Yes, the very fabric of nature hates you and wants to give you blue screens.

The best, the very best that you can hope for is that your code or circuit does what you designed it to do. Put all your effort into meeting the spec first - later, after you're almost done debugging and trying to figure out why the slave device won't respond to a perfectly valid message, well, that's when they'll tell you that the specs they gave you were wrong and you have to wipe out some code and start over.

In any case, when all you have is a spec, all you can do is implement it. Let's imagine a scenario that involves both hardware and software: a small embedded sensor that has to relay data to its gateway. It's probably an AVR-based 8-bit microcontroller running firmware that I've written, communicating with a host system over TTL-level serial (8-N-1, 115200 bps) and collecting 0-5V data with its ADC. It uses a packet-based serial communication scheme (START, MSG ID, LENGTH, DATA, END sort of deal) to transmit data to the host every second.
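For reference, here is a minimal sketch of building one of those packets in C. The START/END byte values and the message ID are hypothetical (the real ones come from whatever protocol document you and the host team agreed on), as is sending the ADC reading as two bytes, high byte first. The millivolt helper uses the AVR datasheet relation ADC = Vin × 1024 / Vref, assuming a 5 V reference and a 10-bit result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing bytes and message ID. */
#define PKT_START       0x02u
#define PKT_END         0x03u
#define MSG_ID_ADC_DATA 0x10u

/* ADC result = Vin * 1024 / Vref, so Vin = result * Vref / 1024.
   Assumes Vref = 5 V and a 10-bit result (0..1023). */
uint16_t adc_to_millivolts(uint16_t raw)
{
    return (uint16_t)(((uint32_t)raw * 5000u) / 1024u);
}

/* Build START, MSG ID, LENGTH, DATA..., END into `out`.
   Returns the number of bytes written, or 0 if `out` is too small. */
size_t build_packet(uint8_t msg_id, const uint8_t *data, uint8_t len,
                    uint8_t *out, size_t out_size)
{
    size_t total = (size_t)len + 4u; /* START + ID + LENGTH + END */
    if (out_size < total) {
        return 0;
    }
    size_t i = 0;
    out[i++] = PKT_START;
    out[i++] = msg_id;
    out[i++] = len;
    for (uint8_t k = 0; k < len; k++) {
        out[i++] = data[k];
    }
    out[i++] = PKT_END;
    return i;
}
```

Note that a data byte can legitimately equal the END value, which is one reason the receiver should trust LENGTH rather than scan for END.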

Ideally, you code everything, solder everything, hook up power and communication and it works. Then you go home to your supermodel wife/husband/plaything and sports car and sip scotch while listening to death metal. But the reality is that 99.9999% of the time it won't work. The questions you have to keep asking yourself:

'What might be wrong if it doesn't work?'
'What's the quickest way to test that?'

Before I begin, let me note that I said the 'quickest' way to test things. When you think you're done and you're testing your code or hardware against the actual system it's meant to work with, chances are your time is limited. You'll probably be close to delivery of the final product, or you'll have another group that needs to use the actual system for its own testing. In either case, quickest is best - you want to front-load as much of this integration phase into design as possible. Don't ever assume it's going to work and then try to bolt debug capability onto your design after you amazingly find that it doesn't 'just work'.

So, look at the host computer and see what it says. No communication with slave?

'What might be wrong if that doesn't work?'

You should know why, specifically, it shows that message - in this case let's say it's because it didn't receive an answer to an initialization message. Maybe the slave just isn't getting the message.

'What is the quickest way to test that?'

I think the easiest way is to put an LED on my slave device and light it when it receives the message. So we add the LED and, lo and behold, it doesn't light.
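One way to wire that LED into the firmware is to light it from the packet parser itself. This is a sketch only: the framing bytes, the init message ID, the length limit and the `debug_led` flag (standing in for a port pin write on the AVR) are all hypothetical. It is also a small state machine, fed one received byte at a time.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKT_START   0x02u /* hypothetical framing bytes */
#define PKT_END     0x03u
#define MSG_ID_INIT 0x01u /* hypothetical ID of the host's init message */
#define MAX_DATA    16u   /* hypothetical sanity limit on LENGTH */

/* Stand-in for the debug LED; on the AVR this would set a port pin. */
static bool debug_led;

enum rx_state { WAIT_START, GET_ID, GET_LEN, GET_DATA, GET_END };

static enum rx_state state = WAIT_START;
static uint8_t msg_id, msg_len, msg_count;

/* Feed every received byte in here (e.g. from the UART RX interrupt). */
void parse_byte(uint8_t b)
{
    switch (state) {
    case WAIT_START:
        if (b == PKT_START) state = GET_ID;
        break;
    case GET_ID:
        msg_id = b;
        state = GET_LEN;
        break;
    case GET_LEN:
        if (b > MAX_DATA) { state = WAIT_START; break; } /* implausible: resync */
        msg_len = b;
        msg_count = 0;
        state = (b == 0) ? GET_END : GET_DATA;
        break;
    case GET_DATA:
        if (++msg_count == msg_len) state = GET_END; /* payload ignored here */
        break;
    case GET_END:
        if (b == PKT_END && msg_id == MSG_ID_INIT) debug_led = true;
        state = WAIT_START;
        break;
    }
}
```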

'What might be wrong if it doesn't light?'

If it doesn't light, then the serial port may not be working correctly.

'What's the quickest way to test that?'

I would create a wrap-around self-test mode. Just loop your slave's TX line back to its RX, send a test message, then light an LED if it receives what it sent out.
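A minimal sketch of that self-test in C. `uart_putc` and `uart_getc` are hypothetical driver calls; here they are simulated with a small buffer standing in for the TX-to-RX jumper so the sketch runs on a PC. The test pattern is an arbitrary choice (0x55 and 0xAA alternate every bit, which makes a nice scope trace), and the poll count is a made-up timeout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated wire from TX back to RX. On the AVR, uart_putc/uart_getc
   would talk to the real UART registers instead. */
static uint8_t wire[32];
static size_t wire_head, wire_tail;
static bool wire_connected = true; /* false = jumper missing or TX dead */

void uart_putc(uint8_t b)
{
    if (wire_connected && wire_tail < sizeof wire) wire[wire_tail++] = b;
}

/* Returns true and stores a byte if one arrived, false if RX is empty. */
bool uart_getc(uint8_t *b)
{
    if (wire_head == wire_tail) return false;
    *b = wire[wire_head++];
    return true;
}

/* Wrap-around self-test: with TX jumpered to RX, everything sent should
   come straight back. Returns true if the pattern round-trips intact;
   the caller would light the self-test LED on true. */
bool loopback_self_test(void)
{
    static const uint8_t pattern[] = { 0x55, 0xAA, 0x00, 0xFF, 'O', 'K' };
    for (size_t i = 0; i < sizeof pattern; i++) {
        uint8_t got = 0;
        bool received = false;
        uart_putc(pattern[i]);
        /* Poll a bounded number of times instead of hanging forever. */
        for (uint32_t tries = 0; tries < 10000u && !received; tries++) {
            received = uart_getc(&got);
        }
        if (!received || got != pattern[i]) return false;
    }
    return true;
}
```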

'What might be wrong if THAT light doesn't light up?'

At this point, if it can't communicate with itself then it's just not sending anything out (for one reason or another) or what it's sending out is gibberish and can't be read back.

'What's the quickest way to test that?'

Monitor the signal to see if it looks like a legitimate serial transmission. To do that, you'll need test points to hook up your logic analyzer or oscilloscope. So you'll either see a valid waveform, an invalid waveform, or no waveform.
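To judge whether what's on the scope is "valid", it helps to know what one 8-N-1 frame should look like on a TTL-level line: idle high, a low start bit, eight data bits least-significant first, then a high stop bit. This small C helper (hypothetical names) lays that out and computes the bit width to look for.

```c
#include <stdint.h>

/* Fill `bits` with the 10 line levels of one 8-N-1 frame, in the order
   they appear on the wire: start bit (0), eight data bits LSB first,
   stop bit (1). The idle line sits at 1 (TTL high) between frames. */
void frame_8n1(uint8_t byte, uint8_t bits[10])
{
    bits[0] = 0; /* start bit */
    for (int i = 0; i < 8; i++) {
        bits[1 + i] = (uint8_t)((byte >> i) & 1u); /* LSB first */
    }
    bits[9] = 1; /* stop bit */
}

/* Width of one bit in nanoseconds: 1e9 / baud, truncated. */
uint32_t bit_time_ns(uint32_t baud)
{
    return 1000000000u / baud;
}
```

At 115200 bps each bit is about 8.68 µs wide and a whole frame is about 86.8 µs, so if the narrowest pulse you measure is far from that, suspect the baud rate setup or the clock before anything else.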

So ask yourself again: 'What might be wrong if I see a valid waveform or no waveform?'

If you see a valid waveform or no waveform at all, then it's probably software: perhaps the receive or transmit buffer isn't being filled.

And then: 'What's the quickest way of testing that?'

What *I* would do is use a secondary serial port as a debug terminal so you can view the contents of the RX and TX serial buffers. Then you can see if they're being filled or not, and go on from there.
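A minimal sketch of such a buffer dump in C. `debug_putc` is a hypothetical routine for the secondary serial port; here it captures text into a string so the sketch runs on a PC. The output format is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug-port output; here it just captures into a buffer. */
static char debug_out[128];
static size_t debug_len;

void debug_putc(char c)
{
    if (debug_len + 1 < sizeof debug_out) {
        debug_out[debug_len++] = c;
        debug_out[debug_len] = '\0';
    }
}

static void debug_puts(const char *s)
{
    while (*s) debug_putc(*s++);
}

/* Print an unsigned count in decimal without pulling in printf. */
static void debug_put_uint(size_t n)
{
    char digits[20];
    int i = 0;
    do {
        digits[i++] = (char)('0' + n % 10u);
        n /= 10u;
    } while (n != 0u);
    while (i > 0) debug_putc(digits[--i]);
}

/* Print "label (n): XX XX ...\r\n" so you can watch the RX/TX
   buffers fill (or not) from a terminal on the debug port. */
void debug_dump(const char *label, const uint8_t *buf, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    debug_puts(label);
    debug_puts(" (");
    debug_put_uint(len);
    debug_puts("):");
    for (size_t i = 0; i < len; i++) {
        debug_putc(' ');
        debug_putc(hex[buf[i] >> 4]);
        debug_putc(hex[buf[i] & 0x0Fu]);
    }
    debug_puts("\r\n");
}
```

Dumping after every received packet (or on a key press from the terminal) shows right away whether bytes are arriving and whether they look like the framing you expect.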

So look back at all that. Just by considering what might not work we added several things to our system: test points, a wrap-around self-test functionality and a debug terminal. If you have all of those things ready and you've tackled some of the other 'Why might this not work and what can I do to quickly test it?' scenarios then you're poised to quickly find the root of the problems you're seeing in minutes and hours instead of days and weeks. Integration (that is, combining what YOU make with what OTHER people make) is the most difficult part of any project. Specs aren't going to help you because no spec is ever complete enough to define a system. Everyone makes assumptions and doesn't tell anyone and that makes testing a pain. So save yourself some pain and plan ahead. You'll thank yourself later.

Tuesday, March 2, 2010

Wherefore Art Thou Sparkfun?

Did you know there's an angyee.blogspot.com? I think it's in Portuguese and run by a female. It's all pink anyhow, but pink can be manly, so I don't know. It might help if I read Portuguese.

I just read an interesting flame war over on Hackaday about whether Sparkfun is overpriced and, well, dumb. Overpriced is for certain, but that's the name of the game. Everyone who sells you something they bought from someone else will be 'overpriced' in that you don't have the money or storage to buy 1000 of anything and wait for others to buy them from you. You're buying the convenience of getting one and only one of something.

In fact, I just bought some ATMega328 DIP chips from Sparkfun. After the flame war I was feeling guilty about it so I checked DigiKey for their price. It's the same. The exact same. The only difference is that Sparkfun offers less of a volume discount. Their stock counter says they have over a thousand of them, so they got the nice $3.04/unit price and are selling them at $4.30 - just like DigiKey. Yes, they're making money, but why not just buy from DigiKey?

That brings me to the second thing that you get from Sparkfun - an explanation. When you search for ATMega328 on DigiKey you get five different chips in varying packages and a comparison table that's about a half mile long. If you don't know that you want the DIP version you're screwed. Lucky me, I do know.

But this totally avoids a more important point: How do you know you want an ATMega328 at all? Sparkfun spells it out for you: it's the Arduino chip but with more program memory. More is better right? Even if you don't know how much program memory you need, why not have more? Plus they tell you that it will work in place of the chip already in your Arduino but wait, don't change that chip yet! You need the bootloader programmed on there for everything to work properly. DigiKey would not tell you that. Heck, DigiKey will barely make it easy for you to give them money.

That's another thing Sparkfun has going for it: their web design is not straight out of 1995. It has pictures and a shopping cart and colors! Oh, and navigation. DigiKey has an awful Flash version of their print catalog. Yes, they thought it would be good to simulate turning pages on the internet. That hasn't been a good idea since Geocities. Of course, DigiKey started with its paper catalog and must really feel attached to it. I can't fault them for sticking with success, but the idea of even using a 1000 page catalog is... so, so unappealing. And a waste of paper.

So the second point is whether Sparkfun is dumb. And by dumb I mean facile. Childish. Dorky but in a bad way. The issue at hand was creating projects like '5-foot Nintendo controllers'. I must admit that so much of hobby electronics seems like a big dick waving contest. Who can create the coolest steampunk doo-dad? I don't care. Or this gem recently featured on Hacknmod: 'Twitter Controlled LED IKEA Table using Bluetooth'

Let me just pause a minute to let that sink in.

...

...

Okay. Bluetooth is neat. My Wiimotes are Bluetooth and I think it's great I can hook them up to my computer and read their data. It makes a lot of sense as a Personal Area Network protocol. I like headsets. So I don't dislike Bluetooth. I also think RGB leds are interesting - with Red, Green and Blue we can make all colors! It'd be great for use as a status indicator - red is bad, green is good, orange is warning, etc. I like... coffee tables? I guess. I don't dislike them anyhow.

But I just picture this geek sitting at this table with his friends, itching to show off his new toy that he made. So in the middle of the conversation he whips out his phone and says 'Watch, I'm going to Twitter to my table and change its color! I made this! Hold on. I have to type out the message... almost there... Oh, I don't have 3G, this might take a bit... OK, sent! Now it should turn green... uh, any second... uh... Maybe I have to reboot the computer... hold on...'

Or you have worse projects like TV-B-Gone. The sole purpose of that invention is to make yourself feel good at others' expense. In a bar? Football game that everyone else is watching too loud? Can't hear you and your friends talk about how counter-culture you are? Whip out your TV-B-Gone and turn off everyone else's entertainment! That TV wasn't doing anything for you!

So a lot of this hobby electronics community consists of everyone trying to outdo this month's steampunk widget or Twitter-using Arduino-controlled status symbol. You know where it goes from here: what started as an honest effort to educate people, have a good time and share things turns into a contest. I saw it happen with LAN parties. Instead of having fun and laughs, people started the biggest hard drive contest, or the most porn contest, or the biggest monitor/best system specs contest. My friends and I held LAN parties in high school. Some we charged for. Despite lugging their systems across town and PAYING to get in, some people just wouldn't play computer games! They'd hang around, look over people's shoulders and slow down people's computers by transferring porn while they were trying to play Unreal Tournament. It just wasn't worth it after a while, and I fear the same thing will happen with this 'community'.

In conclusion: Overpriced - probably for some things, not for others. Do your homework. Facile? Childish? Lacking in true content? A bit. There's still plenty of good stuff over at Sparkfun and their top-selling items are components - not kits. But I worry about everyone else. When blogs just echo each others' postings of Twitter-controlled steampunk widgets instead of producing something... substantive, well, I get concerned.

The Angry EE