The Angry EE: September 2012

I try to write tight, efficient code. This is very important when you're working on a small microcontroller with limitations on speed and memory. However, I also try to write code that is well-organized and safe (ie, preventing buffer overflows, recognizing and handling error conditions, etc.). Let me tell you that these two goals are often at odds: code suddenly becomes much less efficient when you have to constantly check for errors ("Is this passed pointer NULL? Am I outside the bounds of my array? Did I leave the oven on?"). Trying to write well-organized code just makes it worse - to achieve this I usually separate all functionality out into independent functions. Pretty soon you have a class with 18 functions and the call stack for any given publicly-visible function might be five or six calls deep. Say hello to stack overflows!

It's an eternal struggle with no good solution. And since I don't have a solution, I'll just shut up and go to bed. Good night!

Just kidding - you can't get me to be quiet so easily. While there is no universal answer to the problems posed above there are application-specific solutions that a good engineer can utilize to improve efficiency, readability and robustness. Choosing the correct data structure for your task gives you improvements on all fronts with no downsides. Let's discuss choosing between a circular buffer and a linked-list based on the application.

(As always you can look on Wikipedia to find a good basic description of Linked Lists and Circular Buffers to supplement my descriptions).

A circular buffer is a way to turn a linear array into a circular one. Say you have an array of bytes and it's 32 bytes long. These arrays are linear in that people think about them having a beginning and an end: the beginning is always 0 and the end is always size-1. Typically programmers will use a for loop to iterate over the whole range of the array from the first element to the last or perhaps use an enumeration to access specific elements in the array. Circular buffers are conceptually different in that they have no 'beginning' or 'end' but instead a 'head' and 'tail'. The head and tail can be any position within the array. When the head or tail reaches the last element they wrap around back to the beginning. Elements can be added to the bufer and removed during the course of program execution and the head and tail values change to match. In this way the buffer keeps track of which parts of the array are unread and which can be overwritten. Here's an example:

I instantiate a circular buffer with room for 32 elements. At the beginning the buffer is empty, so the head and tail pointers both set to element zero. At some point I want to add data to the buffer. Data is always written to the tail and read from the head, so when I add data to the buffer it's written to the current position of the tail (element zero) and then the tail is incremented to now point to element one. I can keep adding data to the buffer and the tail will continue to increment until it's equal to 31. At this point I've added 31 elements to the buffer and read none, so the head pointer is still set to zero but the tail pointer is set to 31. If I incremented the tail pointer again it would be 32 and my array doesn't have 32 elements and that would be a buffer overflow. So instead of just incrementing the tail pointer I have to increment it then wrap it back around to point to the beginning of the array. So (31+1)mod 32 = 0. Doing that would cause the head and the tail to point to the same location. By convention I've decided that if the head pointer and the tail pointer are equal that means the buffer is empty, not full. So I've added 31 elements and I can add no more without disturbing my convention. In order to add more I have to remove some elements. As I said before elements are always read from the head pointer. So in order to remove an element I read the element pointed to by the head pointer and then increment the head pointer. Now the head points to position 1 and I can add another element at which point the tail will be equal to 0. As long as you maintain the state of the head and tail pointers you can reuse the same memory over and over again and hopefully not get too confused.

Circular buffers are very useful for serial ports. Imagine for example you have a microcontroller and you need to monitor a serial port. You have an RX interrupt that tells you when a byte comes in. You have two options: you can handle the byte in the interrupt (which is not always a good idea depending on the amount of 'handling' you have to do) or you can just add that byte to a circular buffer and return. Then in your main loop you just check to see if there's any data in the circular buffer and handle it there outside of the interrupt context.

I coded a nice circular buffer library with a focus on robustness and code clarity over efficiency. This meant that I had a function to add a byte, remove a byte, increment pointers, copy multiple bytes, etc. It ended up being a lot of functions and a lot of overhead: every time you wanted to add a byte it would check to see if the buffer was empty (one function call) then add the byte and call another function to increment pointers (two function calls, AH AH AH!). This was probably more function calls than I should have tolerated but the performance was acceptable for its intended application of handling serial traffic. It was so useful that when I needed a packet-based serial scheme I coded it utilizing my circular buffer as a basis: serial data would be pushed into the buffer and the packet communication functions would read the buffer looking for packets. Stacking these two libraries on top of each other caused more function calls and more overhead but it still wasn't unbearable because this arrangement was a pretty good fit for this application.

But one fateful day I needed to handle TCP and UDP traffic instead of serial traffic. TCP and UDP communication doesn't piddle around with single bytes but instead can transfer hundreds of bytes at a time inside a packet. At the time I foolishly thought that code reuse was a higher priority than efficiency so I implemented the same scheme for TCP and UDP data as I did for serial. That meant that for every packet that came in called a function to copy one byte at a time into a circular buffer. And of course, that function called other functions which called other functions and suddenly my overhead just for copying data to the buffer was rather large. But wait, it gets worse. Even more foolish than copying bytes singly is copying them at all. You see, the TCP/IP stack didn't intend for me to copy that data to another buffer - it passed a pointer to the data and the length of the data and assumed I would work with it in-place without copying. But my circular buffer didn't work that way so instead I had to instantiate a huge circular buffer capable of holding multiple packets (we're talking a thousand bytes or so) and copy data into it one byte at a time at the cost of multiple function calls per byte when it was entirely unnecessary to do any of that!

As you can see I was a fool. This system was not efficient in any way. It doubled the amount of memory I needed and made many useless function calls. There's no way this system could handle large amounts of data and sure enough my first attempt to handle TCP traffic produced an astounding data rate of 8kbps at the application layer. This was over Ethernet by the way.

I failed to produce efficient code because the data structure I chose was totally unsuited for the task. When I used my circular buffer for the serial port I gained the advantages of robust and safe code with a minimum loss in efficiency. When I tried to apply the same code to a different system I lost a significant amount of efficiency with no increase in robustness or safety. Different situations call for different solutions. What should I have implemented to better handle TCP/IP traffic?

In this situation a linked list would have been preferable. Linked lists can perform many of the same functions as circular buffers but they arrange the data differently. Imagine a linked list as a bunch of Easter eggs connected by pieces of string. Each Easter egg has yummy candy inside of it. There's a string on one end of the egg that's secured so you can't remove it. On the other end of the egg is a place to tie another piece of string. In this example you're blind so you're doing everything by touch. You need to keep track of all of your eggs and not lose any. The first thing you do is secure one egg to the table and call it your 'head' egg. Unless you misplace the table it's unlikely you'll misplace the head egg. So someone (we'll call him Mr. TCP/IP Stack) hands you another egg and in order not to lose it you tie it to the ball that's secured to the table. When he hands you more, you just follow the string of eggs until you find the last one and tie the new egg there. You can keep adding to the string of eggs this way and be sure you'll never lose any because they're all secured to a single immovable point. At some point another person (Mr. Packet Parser) will ask for the first egg (called egg #1 hereafter) Mr. Stack gave you. Here's how you do it without losing all of your egg:

Find the head egg
Move down the string of eggs until you find the egg after egg #1(call this egg #2)
Untie egg #2's string from egg #1 and keep ahold of it
Untie egg #1's string from the head egg
Tie egg #2's string to the head egg
Hand off egg #1 to Mr. Parser

As you can see this approach allows you to keep an unlimited number of eggs and give any particular egg away without being worried that you'll lose the others. This is the idea behind a linked list except that the strings holding the eggs together are pointers and the goodies inside the eggs are the payload. The TCP/IP stack would hand me the payload it received, but instead of putting the goodies into an egg I picked them apart one by one and put them in one of those pill boxes where there's a compartment for every day of the month. Thus, Mr. Parser had to open each compartment individually one eat one piece of candy at a time rather than opening the egg and dumping it all into his mouth.

The difference between the two approaches is striking. With the linked list approach all I would have had to do when I received new data from the TCP/IP stack is to add the payload to the linked list. The performance of the circular buffer approach was dependent on the amount of data received where a linked list doesn't have that restriction. The linked list approach also reduces memory usage: there's no reason to copy all of the data into one large buffer to search through it. With TCP and UDP unless something goes horribly horribly wrong you won't receive half of your data in one packet and half in another - it's all or nothing. This means that you don't have to maintain two large buffers - just one. In addition with the linked list there's no overhead in maintaining the state of the circular buffer. This means fewer function calls compared to the circular buffer. For this application, using a linked list is superior to the circular buffer because it reduces memory usage, scales better and has lower overhead.

But wait! I don't want your takeaway from this post to be that the linked list is superior to the circular buffer in an absolute sense. That's not true at all - the takeaway from this post should be that you have to choose your data structure to suit your application. Let's take a backwards look at this and see what would happen if we used the linked list in place of the circular buffer for the application of receiving serial data.

You see, the circular buffer was fine for the serial application because it was designed to handle data that came in singly. All it had to do to add data was stick the new byte into the array at the location of the tail pointer and then increment that pointer. Incrementing the pointer is the hardest part and even that was fairly simple. The linked list is different. To add something to the end of the linked list first you have to find the end. This means starting at the head and working your way down until you find an egg that doesn't have another one tied to it. And you have to do this every time you want to find the end. In this serial application data comes in one byte at a time. That means each egg only contains one jelly bean. The efficiency of the linked list was due to the fact that you could put a lot of jelly beans in each egg. But if you're artificially limited as to the number of jelly beans you can put in each egg suddenly a linked list isn't very efficient. As the list gets larger you have to search longer and longer to find the end of the chain. And that list gets larger and larger with every byte that comes in. Not every ten bytes or every 100 bytes - every single byte. That gets very inefficient very quick.

You can see that simply by choosing the correct data structure for your application you can improve performance without sacrificing robustness, stability or readability. By choosing your data structures wisely you can have your cake and eat it too!

The Angry EE

Saturday, September 15, 2012

Choosing data structures

Blog Archive

Followers