Tuesday, December 2, 2008

I can haz irony?

Well I started a blog. I always hated blogs but I suppose I can't withstand the Web 2.0 any longer. Also I hear you can make money with these, somehow. I seek to monetize my anger, thus, I have a blog. I think that's the definition of the internet.

One of my goals in this blog (other than time-wasting) is to write about things that interest me. I'm mainly interested in technical thingies and my preferred topic of the day is DocBook.

In theory DocBook is a good idea. The idea is to take all of the content you put in your document and enclose it in XML tags which determine layout and formatting for you. Take Word for example. When you write something in Word the content and the medium are directly intertwined: you can't (as far as I know) get to a mode in Word where you don't have to worry about page breaks and tab spacing. Everything that you see in the Word window will be there on the printed page. If you've been playing the document-creation game for some time then you know that Word is a WYSIWYG (What You See Is What You Get) editor. A WYSIWYG editor is a very good thing. I may not be an old-timer but I've worked with computers long enough to remember when there were WYSINAAWYG editors (What You See Is Not At All What You Get). Trust me, they were a pain, with so many surprises every time you tried to print something. Without WYSIWYG editors desktop publishing would not exist.

DocBook takes a different approach. Rather than start off with the end product (i.e., how does the page look? What are the margins? What point font do I use? etc.) it chooses to focus on the content and meta-information first and formatting second. If you want a book you start off with a tag and then some tags and all the text goes in a tag and so on. There are special tags for cross-references, figures, acronyms, emphasis, etc. Trust me, if you've seen it done in a book somewhere there's probably a tag for it in the DocBook standard. After you've created all of the content you verify it against the standard, then run it through a converter and get some kind of print output - maybe a PDF, HTML, Help file, etc. Since it's XML you could also theoretically create a transformation to OpenOffice Document or Word XML Document (I've seen some rudimentary Word XML transformations but nothing great). THAT file is your WYSIWYG copy, tada! You're done!
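To make the "content first" idea a bit more concrete, here is a minimal Python sketch that builds a tiny book. The element names book, chapter, title and para are real DocBook elements, but the helper function, the sample text and the overall shape are my own illustration, and the output has no DOCTYPE or namespace declaration. The standard library can only tell you the result is well-formed XML; verifying it against the DocBook standard itself needs a separate validating tool.

```python
import xml.etree.ElementTree as ET


def build_minimal_book(title, chapters):
    """Return a DocBook-style <book> element (hypothetical helper).

    chapters is a list of (chapter_title, [paragraph_text, ...]) pairs.
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for chapter_title, paragraphs in chapters:
        chapter = ET.SubElement(book, "chapter")
        ET.SubElement(chapter, "title").text = chapter_title
        for text in paragraphs:
            # Body text lives in <para>; note there is no formatting here.
            ET.SubElement(chapter, "para").text = text
    return book


book = build_minimal_book(
    "My First Book",
    [("The Phantom Menace", ["All of the body text goes in paragraphs."])],
)
xml_text = ET.tostring(book, encoding="unicode")

# Parsing the string back only proves well-formedness, not DocBook validity.
ET.fromstring(xml_text)
print(xml_text)
```

Notice that nothing in there says what font the chapter title uses. That decision is deferred to the conversion step.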

Well, you're really not. The chapter titles look like 'Chapter 1 - The Phantom Menace' where you wanted them to look like '1- The Phantom Menace', and you wanted a bigger font for the body text, and when you want to emphasize text you'd like bold AND italics.... In short it doesn't look the way you want it to. Now you've met the meat of DocBook - customization. By default DocBook outputs don't look so... custom. They're bland and boring, and I'm sure the spacing is off by a bit. The good news is that there are options for some of the changes you want like numbering the chapters. They're easy to set - just put the options you want in an XSL file, include it in your conversion, and it'll work! The bad news is that for a lot of the more minor issues you get to write all-new XSL code and figure out just what the property or attribute you're looking to change is called, then maybe find it in the DocBook XSL files, copy and paste to a custom XSL file, change what you want, try again, change, try, change... It is not easy. It is not straightforward. Get your Google and your patience ready if you want to significantly change DocBook's default output.
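The "easy" half of that story, putting options in an XSL file, can be sketched like this. The Python below just writes out a small customization stylesheet that imports the stock DocBook XSL and overrides a couple of parameters. As far as I know chapter.autolabel and body.font.master are parameters in the DocBook XSL stylesheets (the latter in the print/FO set), but the import path is a placeholder you'd have to point at your own install, and the helper function is hypothetical. It doesn't touch the hard half (rewriting templates) at all.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)


def build_customization_layer(base_stylesheet, params):
    """Return XSL text that imports base_stylesheet and sets params.

    Hypothetical helper: params maps parameter names to values.
    """
    stylesheet = ET.Element(f"{{{XSL_NS}}}stylesheet", version="1.0")
    # xsl:import has to come before any other top-level element.
    ET.SubElement(stylesheet, f"{{{XSL_NS}}}import", href=base_stylesheet)
    for name, value in params.items():
        param = ET.SubElement(stylesheet, f"{{{XSL_NS}}}param", name=name)
        param.text = str(value)
    return ET.tostring(stylesheet, encoding="unicode")


layer = build_customization_layer(
    "path/to/docbook-xsl/fo/docbook.xsl",  # placeholder, not a real path
    {"chapter.autolabel": 1, "body.font.master": 12},
)
print(layer)
```

You would save that output as your custom XSL file and hand it to your XSLT processor in place of the stock stylesheet.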

So why use DocBook? It seems to be a step backwards towards YHNIWYWG (You Have No Idea What You Will Get) and YHNCOWYWG (You Have No Control Over What You Will Get, also known as the Outer Limits editor). In some respects it is undesirable. However, consider that once you have the formatting correct, it will ALWAYS be correct. One knowledgeable person can create the formatting rules and then any number of people can use them flawlessly thereafter. In a large organization with a need for a standard formatting this can be a real timesaver.

DocBook files are text. Plain ol' text. They're also XML. Use your favorite text editor or programming GUI. I like Notepad++ - it will do the tabbing and tag completion for me! If I need, I can create a python script to read the DocBook XML files and parse them for information, or edit them, or do a simple search and replace. I can create a python script to retrieve information from a database and format it into DocBook XML. Text is also version control friendly.
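As a rough idea of what such a python script might look like, here is a sketch that pulls the chapter titles out of a document and does a simple search and replace in the paragraphs. The sample document and function names are made up for illustration, and it assumes un-namespaced DocBook (a namespaced document would need the namespace in the tag names).

```python
import xml.etree.ElementTree as ET

# Made-up sample document; a real script would use ET.parse() on a file.
SAMPLE = """<book>
  <title>Sample</title>
  <chapter><title>Getting Started</title><para>Install the tool.</para></chapter>
  <chapter><title>Customizing</title><para>Edit the tool settings.</para></chapter>
</book>"""


def chapter_titles(root):
    """Return the title text of every <chapter>, in document order."""
    return [ch.findtext("title", default="") for ch in root.iter("chapter")]


def replace_in_paras(root, old, new):
    """Replace old with new in the leading text of each <para>.

    Text that follows an inline child element (its .tail) is not touched,
    so this is only a starting point. Returns the number of paras changed.
    """
    changed = 0
    for para in root.iter("para"):
        if para.text and old in para.text:
            para.text = para.text.replace(old, new)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
titles = chapter_titles(root)
changed = replace_in_paras(root, "tool", "widget")
print(titles, changed)
```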

Consider also that creating DocBook is free. Free as in beer, maybe free as in freedom, I haven't done my homework. With a budget of $0 I can create a PDF from raw text in under an hour. The tools I prefer are free; there are books you can buy, but there are also many free online resources. Word, in contrast, is not free. OpenOffice is, but am I the only one who hates its interface? I don't like using it honestly.

DocBook is write once, publish many: once you've written a DocBook XML file you can convert it to many different formats without any additional work.

DocBook encourages reuse and portability: you can include other DocBook files to reuse common sections such as titlepages. You can create an XSL transform to transform any XML data into a DocBook format. Got an HTML table you want in DocBook format? You can make a transform to do it in a jiffy. Got a CSV file? Write a python script to generate a table from it, somewhat jiffily. Got data in a database? Python again FTW. How much data do you automatically want to include in your DocBook XML file? Create a batch file or script to retrieve the most recent data from published sources, transform it into DocBook XML, make sure it is included in your document, then transform your DocBook into your final published format.
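The CSV case might look something like the sketch below. It treats the first CSV row as the header and emits a DocBook informaltable (the untitled table element) with the usual tgroup, thead, tbody, row and entry structure. The function names and sample data are mine, and it doesn't handle ragged rows or a header-only file (DocBook wants at least one row in the body).

```python
import csv
import io
import xml.etree.ElementTree as ET


def _add_row(parent, cells):
    """Append a <row> holding one <entry> per cell."""
    row = ET.SubElement(parent, "row")
    for cell in cells:
        ET.SubElement(row, "entry").text = cell


def csv_to_docbook_table(csv_text):
    """Convert CSV text into an <informaltable> element.

    Hypothetical helper: the first CSV row becomes the table header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV input is empty")
    header, body = rows[0], rows[1:]
    table = ET.Element("informaltable")
    tgroup = ET.SubElement(table, "tgroup", cols=str(len(header)))
    _add_row(ET.SubElement(tgroup, "thead"), header)
    tbody = ET.SubElement(tgroup, "tbody")
    for record in body:
        _add_row(tbody, record)
    return table


table = csv_to_docbook_table("Part,Quantity\nBolt,40\nWasher,80\n")
print(ET.tostring(table, encoding="unicode"))
```

The same function would work for rows pulled from a database, since all it really needs is a header and a list of rows.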

Ok, so most of that is a pipe dream that will take LOTS of effort to set up, but remember, once you have a system in place and documented, that system stays there. Put the knowledge in the world, put the process in the world and it won't get lost, and you won't have to remember it yourself.

For more information on how you (and by you I mean a Windows-using engineer) can get started with DocBook visit my DocBook wiki at Steve's DocBook Reference