Thursday, March 8, 2012

An In-Depth Look

Someone asked me to share the loops I was talking about in the earlier post so they could analyze them.  Ask and ye shall receive!  Let's start with the loop that performed awfully.

Stare into the belly of the beast...
Here's a quick walkthrough (roughly from left to right):
  1. Load a text file as a spreadsheet with a break on all newline characters - this gives me all of the lines in the file as an array of strings (there's probably an easier way, but this was most direct at the time).
  2. The array of strings is fed into the first FOR loop.  The loop is auto-indexed on the array of lines.
  3. Within the FOR loop there's a case statement that handles different lines in the text file differently.  There are header rows and such at the beginning, so the case statement basically lets the loop ignore those and start with the data rows.
  4. Inside the case statement there's a second FOR loop - this one loops over each field in the line.  It auto-indexes on a constant 2D integer array; each row of that constant contains the offset and length for a String Subset call. (More on this in a second.)
  5. Inside the second FOR loop there's a case statement that has a case for each field.  Every field except the last is handled the same.
  6. Inside the case statement there's a String Subset call fed with values from the constant integer array.  These values provide the offset and length for every call except the last.  The last is the special case mentioned above - no length is wired to the final String Subset call (leaving the length unwired makes it read to the end of the line).
  7. As each field is picked off, its text replaces one of the elements in the constant string array being fed into the loops near the bottom.  The string array has an element for each field in each line, and the integer constant also holds the index within that string array to replace with the current field.
  8. When a full line has been read and turned into an array of strings, that array of strings is appended to the list of lines already read.  
  9. The loops then continue with the next line.
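
For those who'd rather read code than block diagrams, here's a loose C analogue of that structure.  The offsets, lengths, field count and header test are all hypothetical stand-ins for the constants in the real VI - the shape (loop, case, loop, case, append) is the point:

    /* Loose C analogue of the slow diagram. All specifics are made up;
       error checks omitted for brevity. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NUM_FIELDS 7
    #define MAX_FIELD  64

    /* {offset, length} per field; -1 means "read to the end of the line" */
    static const int spec[NUM_FIELDS][2] = {
        {0,10}, {11,8}, {20,4}, {25,6}, {32,6}, {39,8}, {48,-1}
    };

    typedef char Row[NUM_FIELDS][MAX_FIELD];

    Row *parse(char **lines, int nlines, int *nrows)
    {
        Row *out = NULL;
        *nrows = 0;
        for (int i = 0; i < nlines; i++) {            /* outer FOR loop       */
            if (lines[i][0] == '#')                   /* the "ignore headers" */
                continue;                             /* case of the case box */
            Row row;
            for (int f = 0; f < NUM_FIELDS; f++) {    /* inner FOR loop       */
                int off = spec[f][0], len = spec[f][1];
                if (len < 0)                          /* last-field case      */
                    len = (int)strlen(lines[i]) - off;
                snprintf(row[f], MAX_FIELD, "%.*s", len, lines[i] + off);
            }
            /* the Build Array step: grow the output by one row and copy */
            out = realloc(out, (size_t)(*nrows + 1) * sizeof(Row));
            memcpy(out[*nrows], row, sizeof(Row));
            (*nrows)++;
        }
        return out;
    }
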
I hope most of you can already guess what the main problem is so far.  It's the 'this loop inside this loop inside this case inside this loop inside...' yada yada yada.  That's bad - for speed anyhow.  At first I tend to try to make my code... shall we say, robust?  Able to handle even impossible situations?  I also try to make it clear, and clear means verbose - or at least doing much more than you need to in order to promote uniformity and centralization of data.  This often leads to unreasonably poor performance through over-engineering.

Let me give you an example -  I'll bet something like this has happened to you.  Let's say you've got a program that communicates with another system.  For what you're programming, you have to change your behavior depending on what commands the other system sends you. So as a hacker, you start hacking.  You read the first command you need to implement:
"Set mode to passive - in this mode your system will only respond to messages sent to it by the external system."  
Ah, easy!  I'll just make a boolean that tells me whether I've received that message!  Then I can read that boolean and determine whether or not to send messages automatically.  Great.  Second message:
"Set mode to active - in this mode your system will automatically send the following messages at these rates:..." 
Perfect - when you receive this just toggle the same boolean from before to set yourself back to automatic mode.  Third message:
"Set response delay to minimum - when you receive this message delay all automatically-sent messages by one frame."  
Okay... so I'll make a boolean for this one too?
"Set response delay to maximum: When you receive this message delay all automatic messages by three frames." 
Okay, okay, no, wait, I should make that last one an integer or something and not a boolean, I'll just go back and change that...

And right about here my bat-sense starts tingling.  "You're going to be doing this forever!  Use a state machine!  Otherwise you'll have a million variables and you won't be able to keep track of what combination of booleans and integers defines what state and what behavior!  STATE MACHINE!"  And a state machine sounds like a good idea - it allows you to be more methodical, more design-oriented, more organized and more certain that your system works.  Implemented properly, a state machine makes the behavior of a complex system more deterministic.  So I add a state machine and start defining all the states I'll need: automatic, passive, umm, delay?  No, so far it's just two states, but I'm sure there will be more.  Let's read the next requirement:

"....."

Ah, so we're done, are we?  So I rigged up a whole state machine architecture when all I really needed was one boolean and one integer.  Sure, a state machine will WORK for this, but it's certainly not efficient.  It needs memory, time, special coding practices, etc.  A boolean will do the same job much more efficiently.
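
To put that in code terms - a minimal sketch, with made-up names - here's everything those four requirements actually asked for, next to the machinery I started building:

    #include <stdbool.h>

    /* Everything the four requirements actually needed: */
    typedef struct {
        bool passive;       /* passive: respond only when spoken to */
        int  delay_frames;  /* 1 = minimum delay, 3 = maximum delay */
    } CommState;

    /* ...versus the state machine I started defining for it: */
    typedef enum {
        MODE_ACTIVE,
        MODE_PASSIVE
        /* ...all the other states I was sure were coming */
    } Mode;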

And that's my problem - that's why I build loops inside of cases inside of loops (yo dawg, I herd u liek loops....). I think I'm doing something to make my programs deterministic and robust.  I'm actually making my programs robust, deterministic snails.  Like a giant snail-tank that crushes anything in its path and is always on time.  I don't think anyone wants that.  Unless that's what you ride to work every... month?

After much profiling of various methods of implementing this functionality, I settled on the below implementation:
So fresh and so clean!

Yeah - look at that.  ONE LOOP!  Seven calls to string subset (running in parallel) and no Build Array blocks!  Loading a 48000 line file used to take an hour.  Now it takes two seconds.  So how did I manage to rid myself of all of that cruft?

  1. First, I stripped off the header rows from the array containing each line.  That removed the need for a case statement inside of the first FOR loop (the one that ignored the header rows).  
  2. Then I removed the inner loop that looped over each field and just put all of the string subset calls in the same place.  This allows them to run in parallel instead of serially (if Labview is as smart at figuring out how to parallelize things as is claimed).
  3. Since I'm not ignoring any rows any more, I can replace the Build Array block with a simple auto-indexer on the output of the FOR loop.  This significantly sped things up.
  4. And finally, I configured the loop for parallel operation.  To do this, right-click the FOR loop and choose 'Configure loop parallelism' or something like that.  This lets different iterations of the loop run at the same time.  If you think about it, it makes sense: as long as the next loop iteration doesn't depend on the last one, you can just run them all at the same time.  If I had 48,000 processors then I could have each of them process one line of text and have this job done in microseconds.  When you use it, though, pick a small single-digit number for the number of parallel loops.  There is overhead associated with managing multiple threads, so increasing the number of parallel threads starts to backfire after a while.  (A C sketch of the reworked loop follows below.)
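
Here's that reworked structure in the same hypothetical C terms as before - headers stripped up front, the field extractions unrolled so they're independent, output written by index instead of appended, and the iterations parallelized.  OpenMP isn't what Labview uses under the hood, but it's roughly the same idea as loop parallelism:

    /* C sketch of the fast version; offsets are the same made-up ones.
       Compile with -fopenmp for the parallel-for to take effect. */
    #include <stdio.h>
    #include <string.h>

    #define NUM_FIELDS 7
    #define MAX_FIELD  64
    typedef char Row[NUM_FIELDS][MAX_FIELD];

    /* 'lines' has already had its header rows stripped; 'out' is
       preallocated to nlines rows, so no append is ever needed. */
    void parse_all(char **lines, int nlines, Row *out)
    {
        #pragma omp parallel for num_threads(4)  /* small single-digit count */
        for (int i = 0; i < nlines; i++) {
            const char *ln = lines[i];
            /* no inner loop - every subset call stands on its own */
            snprintf(out[i][0], MAX_FIELD, "%.10s", ln + 0);
            snprintf(out[i][1], MAX_FIELD, "%.8s",  ln + 11);
            snprintf(out[i][2], MAX_FIELD, "%.4s",  ln + 20);
            snprintf(out[i][3], MAX_FIELD, "%.6s",  ln + 25);
            snprintf(out[i][4], MAX_FIELD, "%.6s",  ln + 32);
            snprintf(out[i][5], MAX_FIELD, "%.8s",  ln + 39);
            snprintf(out[i][6], MAX_FIELD, "%s",    ln + 48); /* to end of line */
        }
    }
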
The second block diagram is much better programming than the first simply because it's quicker.  It's like I always say: any automated process is better than any manual one because it's consistent AND because once it works you can speed it up later.  There's no way this program would have been useful to me if it took an hour to load one log file.  So don't overcomplicate things!

Friday, March 2, 2012

Boo Labview!

So I'm not posting about the microcontroller stuff I swore I'd start last time. Well, this time I swear there's half a post sitting on Blogger ready to be finished someday. That one IS about microcontrollers. This one's about Labview - mostly because I had a fun little time with it today.

We have a customer who has designed some rather interesting test equipment utilizing Labview Real-Time. This means that for months I've been working with Labview for all sorts of fun things. Since their test equipment is all National Instruments-based, we've decided to double down and use TestStand to automate the testing that utilizes their equipment. One of the fun things I invented to help our workflow along was a piece of Labview that reads in a CSV file containing descriptions of what inputs to apply to various VIs and what outputs to expect from them, and turns it into a TestStand-compatible sequence. This has saved us more time than I can imagine - mostly because we didn't even try to write any of the sequences by hand. I was working with someone from our customer's team, and he actually attempted to write a 500-step sequence by hand. I don't think he got even halfway through. Keep in mind this is a rather tedious set of steps. Picture something along the lines of "Press '1' on the computer keypad. Verify that '1' appears on the computer display. Record the result. Now press '2' on the computer keypad. Verify purple monkey dishwasher. Hit self in head with hammer. Melons."

You see? I couldn't even get through two steps without losing my mind and desiring to physically harm myself. When you use a CSV file you can search and replace. And copy and paste. Copy step 1, paste to step 2, search and replace 1 with 2. Presto! Extra nerd cred if you use regular expressions! Don't hire a person to do a computer's job. Use the right tools!

In the spirit of using the right tools I wrote a lot of Labview today. You see, our customer gave their test equipment an option to log a LOT of CAN bus traffic. The only problem is that due to the (fast) way they write the log you can't be certain that everything is in order time-wise. I don't blame them - it's a real-time system. You do what's fast and leave the post-processing for a bigger computer (though do note an irony - I'm fairly certain that the real-time PC is more powerful than the host PC it's 'slaved' to). They suggested a rather cumbersome process to import the data into Excel and sort it but I say nay sir, nay! I hate things with more than one step. That's why I have an iPad - only one button. There's no question about what to do. The iPad is off? Step 1: push the button. Success! You'd be surprised how often it works.

I wasn't happy about eight steps, and I knew that in the future we would want to use this log to verify the behavior of the system under test, so it made sense to write a Labview program to parse the log, organize it, search it for certain messages, and collect statistics on skew, rate, etc. Now keep in mind that this log is distinctly not... how shall I say? It is distinctly not brief. The log for a 10 minute test is 20 megs. And it's all text. No pictures of cats whatsoever (not that they wouldn't improve things). Admit it - you're a little surprised that there can be that much text and that I can be interested in so much of it. But hey, we're not barbarians. We can handle this much text. It's no problem for a computer - especially the amazing computers we have today! Computers are information processing MACHINES. And a machine means I don't have to do the work for myself. That means less thinking. That are good.

And all it takes is....

Um...

Theoretically it will take around six hours.

Yeah, yeah, I know - that sucks. That's terrible. I should feel ashamed. *I* wrote something that takes six hours to process 20 megs of TEXT? 20 megs of video is, what, five minutes maybe? Naturally this depends on quality, but let's face it - you'll blow through 20 megs of high-definition video in much less than an HOUR, let alone SIX. And I wrote a program that couldn't handle 20 megs of text in less than 75% of my work day.

Hey, I like getting paid as much as the next guy, but even I get bored. Plus, I want to get this log analyzing done quickly so I can do more of it and be DONE with everything quicker. I want to respond to our customers' questions in minutes when they expect the answer tomorrow. I like to impress, I like to excel, and what I've done here is neither.

So I start profiling. I load up a 5 meg text file (one of four in a typical log set). How long does that take? Milliseconds. Single digits. Not worth mentioning. That is not the problem. I sort the data by timestamp - sounds intensive, right? I have no idea what Labview's built-in sorting functions are like. Quicksort? Bubble sort? No idea. Could be a lot of room for improvement there. But... nope. Very brief - a small fraction of the total time. Rather impressive, actually. So I break it down; I limit the number of lines so I only have a thousand. Now the whole process takes a second and a half. I know it's not loading the file, and I know it's not sorting the file, so it must be loading all of the data line-by-line into a 2D array: one big ugly FOR loop in the middle of my VI that is slowing everything down. And no matter how much I profile, no matter what I change, the loop always seems to take about half a millisecond per line. My first attempt used one String Subset block, but it was inside a second FOR loop that ran once for each field in the line. Surely a FOR loop inside of another FOR loop significantly slows things down? After all, they keep telling me Labview is inherently parallel. Multi-processor, hyper-threaded, blah blah blah. Putting that FOR loop inside the other one forces a serial process. If I used eight String Subset blocks at once then it could thread them any way it pleased. That should help, shouldn't it? No dice. No change. Ah, look! I'm indexing an array - if I can find a way to remove that from inside the loop... no help. What if instead of String Subset I used one Scan Into String block? That might... provide no benefit at all.

I suppose if nothing else I could remove this Build Array block and see if I can't just let the FOR loop auto-index the output but I doubt that would....

....

....

Each iteration of the loop now takes 6 microseconds. Down from 500.

This is why you can't trust Labview. This is why Labview Real-Time is an oxymoron. It neither encourages optimization nor makes it easy. Now, I will be the first to admit that I created - from scratch, in one day - a program to process and analyze tens of megs worth of text data and display it in a user-friendly way. This program will make me more efficient and productive, and I have Labview to thank for that. And I do.

But Labview just feels like a toy to me now. When you have ten lines of text data, who cares about optimization? When you have a hundred, it's a question of how much you care about thirty seconds of watching a progress bar. If it's a thousand, you go and get a coffee and when you come back it's done.

But if it's half a million? You can't just say 'Well, some things take a day to run.' That's silly. That's wasteful. Worse - it's unprofessional. Wall Street trading firms have gone so far as to implement their risk analysis algorithms on FPGAs! It used to take them the better part of a day to process their data. Now? Something like fifteen minutes. And guess what? They win. Their competition still takes hours. Now they can act quicker or make better algorithms to work smarter. Better efficiency gives them more options. That's what a professional does. A professional does not shrug his/her shoulders and say 'I dunno. Let's just buy a faster computer.'

So I can't trust Labview. I moved all sorts of blocks around trying to figure out which one was the culprit. They all looked the same. I had no idea what was under the hood. Was it linear complexity or exponential? National Instruments won't let me look at the code. As far as I know they haven't given me the tools to thoroughly analyze their entire architecture. Where's my gprof report? And was their Build Array block seriously doing something as absurd as copying a ten thousand element array each time it wanted to append one more line of text?

If I had done it in C I could have utilized pointers and custom data structures to make a faster program. If I had done it in C I doubt it would have taken more than ten seconds to load the file to BEGIN with. Only in Labview are you encouraged to just plop down blocks without any regard for the cost in time or resources. Even if you cared, there's no way to know - other than through tedious experience - what the culprit in your bad algorithm is this time. Does that addition block break down into one assembly instruction or 100? You will never, ever know with Labview. That makes it a toy. That's why you can't trust it.

Edit: I realize that some people (among the many I'm sure who read my blog) will wish to argue with me and say that the problem I've outlined above with respect to Labview is a problem with EVERY computer language. Seasoned programmers know that you can't just call any function willy-nilly and assume it will be fast. I agree with this. It can happen in C - should I call printf in each iteration of a loop, or build a buffer and print it all at once afterwards? Building the buffer avoids an expensive I/O call on every iteration, which significantly speeds up the loop. And this is, of course, not only something you need to keep track of in Labview but in C or Python or any other computer programming language I can think of.
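
In C, that example looks something like this (a sketch; the buffer size is arbitrary, and a real version would grow the buffer as needed):

    #include <stdio.h>

    /* one I/O call per iteration - slow */
    void print_each(int n)
    {
        for (int i = 0; i < n; i++)
            printf("%d\n", i);
    }

    /* build the output in memory, then one I/O call total - fast */
    void print_buffered(int n)
    {
        static char buf[1 << 20];   /* arbitrary size; assumes output fits */
        size_t used = 0;
        for (int i = 0; i < n && used < sizeof buf - 32; i++)
            used += (size_t)snprintf(buf + used, sizeof buf - used, "%d\n", i);
        fputs(buf, stdout);
    }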

A seasoned programmer also knows that you adapt your approach to each unique problem you encounter. C provides a nice framework for solving your own problems: pointers, function pointers, structs, a myriad of different loops, bit shifting - and if none of these are sufficient, you can use assembly. Nothing other than rolling your own co-processor in VHDL is faster than assembly. But with Labview you don't have these options. Pointers? Don't exist. Game over. That's your number one tool for optimizing code. For example, the block I had a problem with was Build Array operating on a 2D array. Essentially, I have an array in which each element represents a line of text with several fields. Each line is stored as an array of strings (which, if you want to get into things, are ultimately null-terminated arrays of characters). That's a lot of arrays, but the easiest way to imagine it is that I have an array where each element stores several fields of information. What I did in the innermost loop was parse each line of text to read the fields, shove them into a data structure, and then append that to the array of all the data I had already collected.

Well, it seems the Build Array block was to blame. The only way I can explain the terrible slowdown is that somehow Labview stores its arrays like strings - there's one pointer to mark the start of the array, and where the array stops there's a special terminator entry. Just like a null-terminated string. So when I wanted to append the latest data to the array, I think it iterated through the whole array until it found the terminator entry and then appended the data there. For small arrays this isn't too bad, but I was getting to 48000 lines per text file (with four text files' worth of data). Iterating through that much data every time you want to add another line gets expensive - appending n lines one at a time this way costs on the order of n²/2 element visits, which for 48,000 lines is over a billion. And it was probably re-allocating memory for the array each time, too! In C I would at least have done the following:

  1. Allocate enough memory to hold all of the file data before I start the loop. I could count all of the lines in each file, but that might take too long. It'd be better to just grab the length of each file and estimate what I need.
  2. Maintain a pointer to the last element of the array rather than iterate through the whole array each time I want to add something. It doesn't use much memory and it makes appends significantly faster.
  3. Don't use another loop to process each element in each line - unnecessary loops introduce unnecessary overhead. (A sketch of all three ideas follows this list.)
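
Here's what those ideas might look like in C - a sketch with hypothetical types and a made-up bytes-per-line estimate:

    #include <stdlib.h>
    #include <string.h>

    #define NUM_FIELDS 7
    #define MAX_FIELD  64
    typedef char Row[NUM_FIELDS][MAX_FIELD];

    typedef struct {
        Row   *rows;
        size_t count;     /* idea #2: track the end of the array...        */
        size_t capacity;  /* ...instead of walking it to find a terminator */
    } RowList;

    RowList rowlist_new(size_t file_bytes)
    {
        RowList rl;
        /* idea #1: estimate the row count from the file size up front
           (assuming ~100 bytes per line - a made-up figure) */
        rl.capacity = file_bytes / 100 + 1;
        rl.count = 0;
        rl.rows = malloc(rl.capacity * sizeof(Row));
        return rl;
    }

    void rowlist_append(RowList *rl, const Row *r)
    {
        if (rl->count == rl->capacity) {  /* estimate ran out: double,    */
            rl->capacity *= 2;            /* don't grow one row at a time */
            rl->rows = realloc(rl->rows, rl->capacity * sizeof(Row));
        }
        memcpy(rl->rows[rl->count++], *r, sizeof(Row));
    }
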
Those are the only ones I can think of right now (and #3 is kind of a stretch) but I already know that with those ideas I could make a much more efficient loop than Labview. If I had all that control then I could write something really fast. With Labview I simply don't. I have no idea how any of their blocks work or even how the auto-indexing on a FOR loop works. All I have to work with are the tools they give me and I have no indication how any of them are different.

Many people will say that it's good I'm thinking about Labview in this way - no programming language is perfect, so learn its quirks and work within them. The only problem is that's not how National Instruments wants us to think of Labview. We can't judge it like we judge C because it's not meant to be C. It's supposed to be a magical language that frees us from worrying about code and cycles and memory and lets us focus on our ideas, our algorithms. Labview, they assure us, will free us from having to think about how to efficiently implement our algorithms. They say anyone can be productive with Labview. I will give them this: almost anyone can use Labview, and that is impressive. I've seen Labview created by computer scientists, physicists, electrical engineers, technicians, chemists - you name it. And most of it works just fine for what they want it to do. But it's generally badly written and inefficient. In some ways Labview makes it much easier to be badly written and inefficient - specifically because it works so hard to figure out what you mean instead of what you said. From the viewpoint of someone who's picky about code, that's despicable. From the viewpoint of someone who wants computers to make people's lives better, that's laudable.