I was working on an eBook conversion workflow for a small publishing house last week, as a favour to the owners. I thought I could get away with a couple of hours of work: maybe write a few scripts, chain a couple of existing libraries together, and then email them my code. I was dead wrong. I gave up after two days of work.
Words can’t begin to describe my hatred of Word right now. doc is bad; docx is worst. We need to fix this
— Eli James (@ejames_c) March 24, 2012
The problem was with Word. Word’s doc and docx formats are proprietary, clunky to work with, and incredibly hard to convert to ePub and mobi without weird artifacts and edge cases. It doesn’t help that the standard publishing workflow is in Word — many writers, editors, and publishers use Word source files in their daily lives.
The challenges of working with Word are not new. Smashwords’ Meatgrinder engine requires authors to tediously format their doc files; other guides warn authors against using Word for ebook conversions. The Outsell-Gilbane report on Publishing Transformation advises publishers to switch to XML-first workflows ‘as soon as possible.’
There are two likely solutions for this:
1) Write a perfect converter from Word to X, where X is any other text-based markup format. This is a technological problem, and is incredibly hard.
2) Get writers to write in non-Word formats. This is a social problem, and is incredibly hard.
The comparison between the two solutions above is, of course, a little unfair. The truth is that the second problem is easier than the first … but only in the sense that nobody has really taken a crack at it. There have been many attempts at writing a good Word conversion library, but all of them have failed on various edge cases. There have not been strong attempts at creating a beautiful writer-focused tool, save perhaps Scrivener. But Scrivener isn’t popular the way Word is – ideally, you’d want something so pervasive writers would be crazy not to use it.
(I could, by the way, be wrong on the first issue – if you know of a good library to use, please hit me up in the comments.)
I’m very tempted to take a stab at both problems over the summer. No promises, but these are huge problems I wish someone would solve. The alternative to a Word-first workflow is a greatly simplified publishing process, one that is accessible to writers and publishers alike.
Here’s a taste of that alternative world: Matt Neuburg wrote an essay on his book publishing process for O’Reilly Books. It is, admittedly, very technical, and it demands some programming knowledge. But his process is this: he writes chapters in a text-based format; generates HTML for quick previewing (ebook formats are HTML-based, after all) and then, when he’s ready, types a single command to send his source files directly to the O’Reilly server.
Here’s the really cool bit: because he writes all his chapters in a conversion-friendly format, O’Reilly is able to instantly generate a PDF – properly typeset, with fonts and layout as in an actual O’Reilly book. Neuburg then gets a copy of this PDF to preview, walking around his house with the book loaded up on his iPad. If he so wishes, Neuburg may run another one-line command, and all the readers who have subscribed to O’Reilly’s Early Release program for his book get a copy of the updated book – in PDF, EPUB, or web form (at Safari Books Online). Naturally, his editor is able to plug into this process from the O’Reilly side of things, and every change is backed up in a Subversion repository.
In Neuburg’s own words:
- I’m working in plain text, lightly formatted; so my writing and editing and revising are easy and nimble.
- I’m using TextMate, a text editor that makes my use of lightly formatted text easy.
- I can preview my work as HTML, which makes me a better proofreader.
- I can “chunk” my book into nice-looking HTML chapter files for public consumption, so the rest of the world can watch me work.
- Thanks to the O’Reilly commit hook, I automatically get a PDF version of my work. This is fun and encouraging as the book grows, and makes me an even better proofreader.
- We’re using Subversion, so my editor and I have an easy time communicating changes back and forth to each other.
- Without any trees being killed, readers can purchase an electronic Early Release edition of my book, and they are kept up-to-date as I continue to write and revise.
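To make the "lightly formatted plain text, previewed as HTML" idea a little more concrete, here is a rough sketch in Python. It is not Neuburg's or O'Reilly's actual toolchain: the markup rules (a leading `= ` for a chapter heading, `*word*` for emphasis, blank lines between paragraphs) and the function names `text_to_html` and `preview_page` are invented purely for illustration. The point is only that a text-based source can be turned into previewable HTML with a few lines of standard-library code.

```python
import html
import re


def text_to_html(source: str) -> str:
    """Convert a tiny, made-up plain-text markup into an HTML fragment.

    Hypothetical rules (not any real format's specification):
      - blocks are separated by one or more blank lines
      - a block starting with "= " is a chapter heading
      - *word* marks emphasis
    """
    parts = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join hard-wrapped lines, then escape so stray < or & stay literal.
        text = html.escape(" ".join(line.strip() for line in block.splitlines()))
        if not text:
            continue  # skip empty blocks (e.g. an empty source file)
        # Turn *emphasis* into <em>emphasis</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("= "):
            parts.append(f"<h1>{text[2:]}</h1>")
        else:
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


def preview_page(title: str, source: str) -> str:
    """Wrap the converted fragment in a minimal standalone HTML page."""
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{text_to_html(source)}\n</body></html>\n"
    )


if __name__ == "__main__":
    chapter = "= Chapter One\n\nPlain text is *easy* to edit\nand easy to convert.\n"
    print(preview_page("Preview", chapter))
```

Saving the output of `preview_page` to a file and opening it in a browser gives the kind of quick preview described above. Nothing comparable is this short when the source is a doc or docx file, which is the whole complaint.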
My point: moving away from Word gives writers and publishers saner publishing workflows. It doesn’t make sense for the writing and editing process to be done in a format separate from the ones used in the publishing process.
Word is a curse on digital publishing workflows. The sooner we move away from it, the better.