Wednesday, May 25, 2011

Documentation

People say they hate writing documentation, but what they really hate is Word. And even Word would be okay if no one cared about formatting. Once you have to conform to corporate styles, things get so awkward - oh, you used 11 point font instead of 10, your margins are .05" off, you can't use a table here because it doesn't justify correctly. I've been in peer reviews where the only comments people have are about formatting (and spelling errors). It's such an anti-pattern.

Wouldn't it be much better if documentation were like wikis? Where anyone can find the document they want to edit? Where all you need is a web browser to edit it and it's just text? Sure, you might have to use *'s or -'s instead of bullets, but have you looked lately at how many different options you have for bullets in Word? It's insane. I'd rather have one ugly bullet.

So sure wikis are simplistic, but they're straightforward and you get to focus on writing instead of margins. But they don't work for real engineering, right? Real engineering documents are version-controlled, have complicated title pages, fancy diagrams and backgrounds that say things like 'UNCONTROLLED'. Wikis couldn't ever.... or could they?

Step one: version control. Github now has wikis. But Github doesn't just do wikis - anyone can do wikis. Github does version-controlled wikis. Wikis are written in text-based markup: typically Markdown, MediaWiki, etc. But they're all text - just text. Github saves each page you create in the wiki as a text file in a repository separate from the project you're working on. The only non-ideal thing about this whole setup is that all of the wiki pages are stored in one directory - no structure at all. So if you want to create a block diagram for a sub-assembly in a sub-directory you'll have to figure out how to store that information somewhere. I'm considering storing the directory information in the name somehow, but this may be a bit unwieldy.
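Here is one way the "directory in the name" idea could look. This is only a sketch: the `--` separator and the function names `page_name_for` and `path_for` are made up for illustration, and nothing about this convention comes from Github itself.

```python
# Hypothetical convention: join directory parts with "--" to get a flat page name.
SEPARATOR = "--"


def page_name_for(path):
    """Flatten a path like 'power/supply/block-diagram' into one wiki page name."""
    parts = [part.strip() for part in path.split("/") if part.strip()]
    for part in parts:
        if SEPARATOR in part:
            # A part containing the separator could not be split back unambiguously.
            raise ValueError("path part contains the separator: " + part)
    return SEPARATOR.join(parts)


def path_for(page_name):
    """Recover the directory-style path from a flattened wiki page name."""
    return "/".join(page_name.split(SEPARATOR))
```

A page for `power/supply/block-diagram` would be stored as `power--supply--block-diagram`, and a build script can turn the name back into a path when it copies output into the project tree. It gets unwieldy for deep trees, which is exactly the worry.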

So you'll end up with a bunch of text files with odd markup stored in a repository separate from your project. Surely there must be a way to take these text files, written with special markup, and turn them into something (dare I say) pretty? Well of course there is - Github takes the text files and creates web pages, doesn't it? So yes, it can be done and it will be done. There's an open source program called Pandoc that describes itself as a Swiss army knife for transforming markup formats. If you look at its list of supported formats, it can convert between a lot of them. Very neat, very useful. Now instead of text files you can get PDFs or... DocBook.
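A rough sketch of driving Pandoc from a script, assuming `pandoc` is installed and on your PATH and that the wiki pages are Markdown. The function names and the file names in the usage note are placeholders; the `--from`, `--to`, `--standalone` and `--output` options are standard Pandoc options.

```python
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(source: Path, target: Path, to_format: str = "docbook") -> List[str]:
    """Build the pandoc command line for converting one wiki page."""
    return [
        "pandoc",
        "--from", "markdown",      # assumption: the wiki page is written in Markdown
        "--to", to_format,         # e.g. "docbook" or "html"
        "--standalone",            # emit a complete document, not a fragment
        "--output", str(target),
        str(source),
    ]


def convert(source: Path, target: Path, to_format: str = "docbook") -> None:
    """Run pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target, to_format), check=True)
```

Calling `convert(Path("Home.md"), Path("Home.xml"))` would write a DocBook version of a page named `Home.md`. Keeping the command builder separate from the call makes it easy to print the command and paste it into a makefile rule instead.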

The PDFs you get this way look like nice, printable versions of web pages. Basic but serviceable. But engineering documents from real engineering companies don't just look serviceable - they look complicated. They're full of revision history blocks, referenced documents, government standards and the aforementioned 'UNCONTROLLED' backdrops. You can still do this with this approach, but it takes a lot more finesse. Enter DocBook. DocBook is used to create... books. You know all of those programming books with different animals on the front? If my history is correct, many of them are written in DocBook, and in fact O'Reilly helped create DocBook (together with HaL Computer Systems) so they could write their books more easily. That's why they all pretty much look the same. That and I guess those folks are boring.

The great thing about DocBook is that it's customizable. The input files are just XML, but the output is usually PDF - just print it off, bind it, draw a fish on the front and you've got a book. Or, if you want an engineering document, you describe some table layouts for revision history, title page, etc., fill out that information in your XML file, transform it, and then you've got an engineering document. True, that will be a LOT of work, but so is trying to use Word to do the same thing. Best of all, Git is version control, so your revision history is built in: you can parse Git commit logs to fill out the revision history section. If your referenced documents are in version control (which would be a good idea) then you can link right to them. And DocBook has all sorts of other neat features built in: automatic table of contents creation, automatic figure referencing with hotlinks, you name it. It's worth looking into.
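To make the Git-log idea concrete, here is a minimal sketch that turns `git log` output into a DocBook `<revhistory>` block. The function names are invented for this example, and putting the full author name into `<authorinitials>` and the short commit hash into `<revnumber>` is a simplification I'm assuming is acceptable for your stylesheets.

```python
import subprocess
import xml.etree.ElementTree as ET

FIELD_SEP = "\x1f"  # ASCII unit separator, unlikely to appear in a commit subject
# Short hash, date, author name, subject; git expands %x1f to the separator byte.
LOG_FORMAT = "%h%x1f%ad%x1f%an%x1f%s"


def read_git_log(repo_dir="."):
    """Return one line per commit from the repository at repo_dir."""
    result = subprocess.run(
        ["git", "log", "--date=short", "--pretty=format:" + LOG_FORMAT],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout


def revhistory_from_log(log_text):
    """Build a DocBook <revhistory> element (as a string) from the log text."""
    root = ET.Element("revhistory")
    for line in log_text.split("\n"):
        if not line.strip():
            continue
        commit, date, author, subject = line.split(FIELD_SEP, 3)
        revision = ET.SubElement(root, "revision")
        ET.SubElement(revision, "revnumber").text = commit
        ET.SubElement(revision, "date").text = date
        ET.SubElement(revision, "authorinitials").text = author  # simplification
        ET.SubElement(revision, "revremark").text = subject
    return ET.tostring(root, encoding="unicode")
```

Something like `revhistory_from_log(read_git_log("path/to/wiki"))` gives you XML you can paste or include into the document's info section. A real document would probably want only tagged releases rather than every commit, which is a filtering step left out here.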

Text is great, yes, but we all scream for graphics. Github wiki pages can reference files from your project repository, so including graphics in the online wiki is not really a problem, but what about in locally-produced PDFs? This might get hairy. Pandoc has a different format for specifying image links than the Github wiki has. Luckily Pandoc is an open-source project, so you can modify it to your heart's content if you like. I might just figure out something else. So the workflow looks like this:
  1. Draw your tables, graphics, etc in whatever program you use locally.
  2. Use command-line tools (as part of a makefile) to export the local graphics to a GIF or JPG format so they can be included in your documentation.
  3. Save the newly-exported graphics in a common area of your project repository.
  4. Commit your changes to Github.
  5. Write your documentation in a Github wiki and reference the graphics you just committed. This will produce easily-accessible online documentation.
  6. Retrieve the wiki changes from Github to your local wiki repository.
  7. Modify the local copies of the wikis to allow Pandoc to run on them seamlessly.
  8. Run Pandoc on the wiki text files to create either PDF output or DocBook output and copy it to the correct place in your project repository directory structure.
  9. If you just want PDFs, you're done. If you created DocBook output then there will be another step to distill the DocBook to PDF after running it through all of your custom stylesheets.
  10. Commit your changes and you're done.
Tada! You have professional-looking PDF documentation derived from a wiki and various graphics. And what's great is that most of these steps are automated once you set up the makefiles. The only non-automated steps are actually writing the documentation, making the graphics and creating the stylesheets. Aren't you happy?
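Step 7 is the fuzziest one, so here is a sketch of what "modify the local copies" might mean for images, without touching Pandoc itself. It assumes the wiki pages use the `[[path/to/image.png]]` and `[[path/to/image.png|alt=Some text]]` forms for images and rewrites them to Markdown's `![alt](path)` form, which Pandoc understands. The function name is made up, and any link with other options is deliberately left alone.

```python
import re

# Matches [[file.png]] or [[file.png|alt=Text]] for a few common image types.
WIKI_IMAGE = re.compile(
    r"\[\[(?P<path>[^\]|]+\.(?:png|gif|jpe?g))(?:\|alt=(?P<alt>[^\]|]*))?\]\]",
    re.IGNORECASE,
)


def to_pandoc_images(text):
    """Rewrite wiki-style image tags into Markdown image links."""
    def replace(match):
        alt = (match.group("alt") or "").strip()
        path = match.group("path").strip()
        return "![{}]({})".format(alt, path)

    return WIKI_IMAGE.sub(replace, text)
```

Run this over each page after pulling the wiki repository and before handing the file to Pandoc. Ordinary page links such as `[[Other Page]]` don't end in an image extension, so they pass through unchanged.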
