Monday, March 8, 2010

Debugging

This post isn't for me. I never have to debug. I follow a strict development process in which all of the documentation I read is perfectly accurate and describes all edge case behavior. I also follow code writing procedures so well that every exception is handled, every input is sanitized completely (even to the point of predicting accurately what the user meant to enter instead of what the user did) and every return value is checked. All reported errors (which are entirely due to... well, not me anyhow) have expository messages bordering on small novels that not only tell you what caused the error but actually fix it themselves in all known copies of the program, including those in the past. Yeah, that's right - my code is so awesome that it goes back in time and fixes itself before you even see an error message.

Well, I doubt you believe that. I'm not sure what to believe anymore - the full truth of the situation is caught somewhere in an unstable time loop that will someday tear reality apart. So in the meantime, I actually debug things. Yes, things that I created. This is because you can never get away from debugging. Because documentation is Just. Plain. Wrong. Because the user put international Unicode characters that you didn't even know existed into your text box. Because every library you rely on has bugs and undefined behavior. And it's only worse with circuits: the EM spectrum is effectively infinite. Wavelengths you only see in your nightmares can ruin your life in an instant. It happened with the earliest digital computers - freaking cosmic rays would cause a single bit in memory to be changed and trash your machine state. Yes, the very fabric of nature hates you and wants to give you blue screens.

The best, the very best that you can hope for is that your code or circuit does what you designed it to do. Put all your effort into meeting the spec first - later, after you're almost done debugging and trying to figure out why the slave device won't respond to a perfectly valid message, well, that's when they'll tell you that the specs they gave you were wrong and you have to wipe out some code and start over.

In any case, when all you have is a spec, all you can do is implement it. Let's imagine a scenario that involves both hardware and software. I'll imagine a small embedded sensor that has to relay data to its gateway - probably an AVR-based 8-bit microcontroller with firmware that I've made, communicating with a host system over TTL-level serial (8-N-1, 115200 bps) and collecting 0-5 V data with its ADC. It uses a packet-based serial communication scheme (START, MSG ID, LENGTH, DATA, END sort of deal) to transmit data every second to the host.
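To make that concrete, here's a rough sketch of what the framing code might look like. The byte values and names (PKT_START, PKT_END, uart_putc, send_packet) are just illustrative, not from any real spec:

#include <stdint.h>

#define PKT_START 0x7E   /* arbitrary framing bytes, for illustration only */
#define PKT_END   0x7F

void uart_putc(uint8_t c);   /* assumed to exist elsewhere: blocking one-byte transmit */

/* Send one START | MSG ID | LENGTH | DATA | END packet. */
void send_packet(uint8_t msg_id, const uint8_t *data, uint8_t len)
{
    uart_putc(PKT_START);
    uart_putc(msg_id);
    uart_putc(len);
    for (uint8_t i = 0; i < len; i++)
        uart_putc(data[i]);
    uart_putc(PKT_END);
}

A real scheme would probably also want a checksum and some way to escape the framing bytes if they can appear in the data, but this is enough for the discussion here.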

Ideally, you code everything, solder everything, hook up power and communication and it works. Then you go home to your supermodel wife/husband/plaything and sports car and sip scotch while listening to death metal. But the reality is that 99.9999% of the time it won't work. The questions you have to keep asking yourself:

'What might be wrong if it doesn't work?'
'What's the quickest way to test that?'


Before I begin, let me note that I said 'quickest' way to test things. When you think you're done and you're testing your code or hardware against the actual system it's meant to interface and work with, then chances are your time is limited. You'll probably be close to delivery of the final product, or you'll have another group that needs to use the actual system to test. In either case, quickest is best - you want to front-load as much of this integration phase into the design phase as possible. Don't ever assume it's going to work and then attempt to throw debug capability into your design after you amazingly find that it doesn't 'just work'.

So, look at the host computer and see what it says. No communication with slave?

'What might be wrong if that doesn't work?'


You should know specifically why it shows that message - in this case let's say it shows it because it didn't receive an answer to an initialization message. Maybe the slave is just not getting the message.

'What is the quickest way to test that?'

I think the easiest way is to put an LED on my slave device and light it if it finds the message. So we'll add the LED and, lo and behold, it doesn't light.
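Something like this, say - assuming an ATmega328-style part with a spare pin (PB0 here) driving the LED; uart_getc and the message IDs are the same made-up names from the sketch above:

#include <avr/io.h>
#include <stdint.h>

#define PKT_START   0x7E   /* same illustrative framing byte as before */
#define INIT_MSG_ID 0x01   /* made-up ID for the host's init message */

uint8_t uart_getc(void);   /* assumed to exist elsewhere: blocking one-byte receive */

int main(void)
{
    DDRB |= (1 << PB0);               /* LED pin as an output */

    for (;;) {
        if (uart_getc() == PKT_START &&
            uart_getc() == INIT_MSG_ID) {
            PORTB |= (1 << PB0);      /* saw the init message: light the LED */
        }
    }
}

Real firmware would parse whole frames instead of just peeking at the first two bytes, but for a quick 'did the message even arrive?' check this is plenty.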

'What might be wrong if it doesn't light?'

If it doesn't light, then the serial port may not be working correctly.

'What's the quickest way to test that?'


I would create a wrap-around self-test mode. Just loop the TX line of your slave back to its RX and send a test message, then light an LED if it receives what it sent out.
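Here's roughly what that looks like at the register level, assuming an ATmega328-style USART (UDR0, UCSR0A and friends) and a 16 MHz clock - check your own part's datasheet baud rate table before trusting the UBRR value:

#include <avr/io.h>
#include <stdint.h>

int main(void)
{
    DDRB   |= (1 << PB0);                      /* LED pin as an output */

    UCSR0A |= (1 << U2X0);                     /* double-speed mode */
    UBRR0   = 16;                              /* ~115200 bps at 16 MHz with U2X0 */
    UCSR0C  = (1 << UCSZ01) | (1 << UCSZ00);   /* 8-N-1 */
    UCSR0B  = (1 << RXEN0) | (1 << TXEN0);

    while (!(UCSR0A & (1 << UDRE0)))
        ;                                      /* wait for an empty transmit buffer */
    UDR0 = 0x55;                               /* send a known test byte */

    while (!(UCSR0A & (1 << RXC0)))
        ;                                      /* wait for it to loop back in */

    if (UDR0 == 0x55)
        PORTB |= (1 << PB0);                   /* self-test passed */

    for (;;)
        ;
}

With TX jumpered straight to RX, that LED tells you the UART itself is configured and running; if it never lights, you're stuck in one of those wait loops and the problem is on your side of the connector.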

'What might be wrong if THAT light doesn't light up?'

At this point, if it can't communicate with itself then it's just not sending anything out (for one reason or another) or what it's sending out is gibberish and can't be read back.

'What's the quickest way to test that?'

Monitor the signal to see if it looks like a legitimate serial transmission. To do that, you'll need test points to hook up your logic analyzer or oscilloscope. So you'll either see a valid waveform, an invalid waveform, or no waveform.
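For reference, a 'valid' 8-N-1 frame at 115200 bps is easy to sanity-check on the scope: the line idles high, the start bit pulls it low for one bit time, then come eight data bits LSB first and a high stop bit. One bit time is 1/115200 ≈ 8.7 µs, so a whole 10-bit frame is roughly 87 µs. If the bit widths you measure are noticeably off from that, suspect the baud rate divisor or the clock source before anything else.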

So ask yourself again: 'What might be wrong if I see a valid waveform or no waveform?'

If you see a valid waveform or no waveform at all, then it's probably software: perhaps the receive or transmit buffer isn't being filled.

And then: 'What's the quickest way of testing that?'

What *I* would do is use a secondary serial port as a debug terminal so you can view the contents of the RX and TX serial buffers. Then you can see if they're being filled or not, and go on from there.
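A sketch of that, assuming a part with two USARTs (ATmega1284P-style UDR1/UCSR1A registers, with USART1 already initialized much like USART0 above) - single-USART parts can do the same thing with a software UART on a spare pin. The buffer names here are stand-ins for whatever your firmware actually calls them:

#include <avr/io.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the firmware's real receive-buffer bookkeeping. */
static volatile uint8_t rx_head, rx_tail;

static void debug_putc(char c)
{
    while (!(UCSR1A & (1 << UDRE1)))
        ;                              /* wait for the debug USART's TX buffer */
    UDR1 = c;
}

static void debug_puts(const char *s)
{
    while (*s)
        debug_putc(*s++);
}

/* Call this periodically to see whether the RX buffer is actually filling. */
void debug_dump_rx(void)
{
    char line[32];
    snprintf(line, sizeof line, "rx head=%u tail=%u\r\n",
             (unsigned)rx_head, (unsigned)rx_tail);
    debug_puts(line);
}

Even a crude dump like that answers the 'is anything arriving at all?' question in seconds from a terminal program on your PC.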

So look back at all that. Just by considering what might not work, we added several things to our system: test points, a wrap-around self-test and a debug terminal. If you have all of those things ready and you've tackled some of the other 'Why might this not work and what can I do to quickly test it?' scenarios, then you're poised to find the root of the problems you're seeing in minutes and hours instead of days and weeks. Integration (that is, combining what YOU make with what OTHER people make) is the most difficult part of any project. Specs aren't going to help you because no spec is ever complete enough to define a system. Everyone makes assumptions and doesn't tell anyone, and that makes testing a pain. So save yourself some pain and plan ahead. You'll thank yourself later.
