Monthly Archives: June 2009

The Novelr Guide To eBook Formats

Say you’ve finished a major arc of your online novel. You want to turn that arc into a download, and perhaps make it available for purchase from the store section of your site. At this point, however, you run into two problems: 1) you’ll have to convert your text to an appropriate ebook format; and 2) which format?

The ebook format fiasco is sometimes called ‘the tower of eBabel’, and for good reason: there are too many formats. But because we deal in digital fiction, and because ebooks are fast becoming a viable model of distribution, we need to consider the sticky question of which ebook format to use, and why. This post attempts to answer that question. (Note that it’s quite difficult to answer without looking into the future, simply because it is unclear whether there will ever be a victor in the ebook format wars. But I’ll get back to that in a bit.)


E-book formats are no longer created from scratch. In most cases, the ebook maker, whether it’s a vendor or an open-source project, will adapt an existing format, or build on some underlying language that makes defining the format easier. Today, that language is often XML, or eXtensible Markup Language. Before we talk about the ebook formats proper, it’s worth talking a little about XML, and why it’s so popular as a foundation.

The answer to that lies in XML’s name. ‘Markup’ and ‘Language’ are pretty self-explanatory: XML is a language made up primarily of markup tags, much like HTML.[1] In fact, an XML document looks a lot like an HTML page. The difference is that XML is extensible, which makes it powerful enough to define and shape other languages.[2] Being extensible means that XML allows you to define and create your own tags. For example, if I were an e-book-format creator, I could easily create <title> as a tag describing the title of an e-book. XML itself has no <title> tag; in fact, it has no predefined tags at all. But because XML is extensible, I can create what is effectively a whole new platform for my e-book format, containing <title> and whatever other tags I see fit to use. All I have to do is define them, so that my ebook reader understands which bits are which, and treats those sections accordingly.
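To make the <title> example concrete, here’s a minimal sketch in Python, using the standard library’s xml.etree.ElementTree module. The <ebook>, <title>, <author> and <chapter> tags are hypothetical, invented for this illustration rather than taken from any real ebook format. The point is that XML happily accepts them, and it’s up to the reading software to decide what they mean.

```python
import xml.etree.ElementTree as ET

# A hypothetical ebook document. None of these tags are predefined by XML;
# they exist only because we chose to use them (illustrative names only).
SOURCE = """<ebook>
  <title>The Long Arc</title>
  <author>A. Writer</author>
  <chapter number="1">It began, as these things do, with a letter.</chapter>
  <chapter number="2">The letter was not for her.</chapter>
</ebook>"""


def read_ebook(xml_text):
    """Return (title, author, chapters) from our made-up <ebook> format."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    author = root.findtext("author")
    # A <chapter> counts as a chapter only because *our* reader says so.
    chapters = [(int(ch.get("number")), ch.text) for ch in root.findall("chapter")]
    return title, author, chapters


if __name__ == "__main__":
    title, author, chapters = read_ebook(SOURCE)
    print(f"{title} by {author}")
    for number, text in chapters:
        print(f"  Chapter {number}: {text}")
```

Swap in different tag names and the parser still works; only the reader’s interpretation of them changes.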

XML is useful precisely because of this flexibility of form and function. The language is now used for many, many things, sometimes even as the foundation for web services that send requests and responses behind the scenes, server to server. And if you look at even the simplest of RSS feeds, you’ll find a language that is defined, and made possible, through XML.

Most of the major ebook formats today are built on some foundation of XML. The ePub format, widely tipped to become the next big format, is built on a strong XML base. The Amazon Kindle format is built on a modified version of the Mobipocket ebook platform, which is in turn built on XHTML (with a dash of JavaScript/frame support). The same goes for the format used by the new Sony Reader, known as the Sony BBeB. The conclusion to take away from this is that sooner or later, XML will become a major part of your workflow, regardless of which ebook format ends up winning eBabel. There’s no running away from it. The good news is that XML is remarkably easy to convert, and it’s going to get easier to work with as major software vendors make the jump to XML-based files. Case in point: Microsoft Word’s new docx format is built on XML, and it’s not very hard to convert XML to other formats: say, PDF, HTML, or an XML-based ebook format of your choice.
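As a rough illustration of that convertibility, here’s a sketch (again using Python’s standard library, and again a made-up <ebook> format with hypothetical tag names) that maps custom tags onto plain HTML. It isn’t how Word, Adobe or any ebook vendor actually performs conversions; it only shows that once your text lives in well-formed XML, turning it into another format is mostly a matter of mapping one set of tags onto another.

```python
import xml.etree.ElementTree as ET

# The same hypothetical <ebook> markup; the tag names are made up.
SOURCE = """<ebook>
  <title>The Long Arc</title>
  <chapter number="1">It began with a letter.</chapter>
  <chapter number="2">The letter was not for her.</chapter>
</ebook>"""


def ebook_to_html(xml_text):
    """Map each custom tag onto an ordinary HTML element."""
    src = ET.fromstring(xml_text)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = src.findtext("title")
    body = ET.SubElement(html, "body")
    # <title> becomes the page heading; each <chapter> becomes <h2> + <p>.
    ET.SubElement(body, "h1").text = src.findtext("title")
    for ch in src.findall("chapter"):
        ET.SubElement(body, "h2").text = f"Chapter {ch.get('number')}"
        ET.SubElement(body, "p").text = ch.text
    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    print(ebook_to_html(SOURCE))
```

Targeting a different format would mean changing only the mapping inside the loop, not the source text.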

The E-book Formats

So let’s get started. The following are the e-book formats in use today that I believe still have a fighting chance of becoming the format of the known universe.

1. Amazon Kindle’s AZW. The Kindle uses Amazon’s proprietary AZW format, but can also read unprotected Mobipocket e-books, HTML, Word documents and plain text (.txt) files. You convert to AZW using Amazon’s online Digital Text Platform, and you format your e-book using rudimentary HTML. AZW supports DRM (unfortunately) and is built around the Mobipocket format, though, confusingly, DRM-protected Mobipocket files cannot be read on the Kindle, because the two aren’t exactly one and the same. Is it worth it? Publishing your work in the AZW format grants you immediate access to the Amazon online store, where a number of online writers have been making a decent sum selling their work … some of whom have been regularly hitting the top 10 bestseller lists for Kindle e-books. So … yes, it’s worth it.

2. Sony Reader’s BBeB, which stands for BroadBand eBook, is perplexing: Sony does not offer any tools to convert to the format, making the Sony Reader a closed medium to all but the biggest of publishers. In fact, the only way for everyone else to publish for the Reader is via RTF or PDF … but XML-to-PDF conversions aren’t solid, not at the moment, and RTF limits your formatting options (it’s hardly better than a .txt file, to be honest). There is at least one unofficial converter to BBeB, but Sony’s lack of support for writers releasing their own work is discouraging at best. Is it worth it? No.

3. Mobipocket (also known as mobi). The Mobipocket format was created in 2000 by Mobipocket SA, a French company that Amazon bought in 2005. It’s been around for quite a while, and it’s probably the only ebook-ish format at the moment that can claim full multi-platform compatibility. It runs on just about everything: the Kindle, Palm OS, Symbian, Windows, Mac, and the iPhone (the Stanza reader lets you read Mobi books, though Stanza’s maker was recently bought by Amazon and the app is now in a vague sort of flux). It is, however, not very popular, and there doesn’t seem to be a captive audience or a community built around the format; a quick snoop around the official Mobipocket site confirms this. Why? I’m not sure, not at the moment (and I’m still looking for proper mobi-related numbers), especially since a surprising number of traditional publishers offer their ebooks in mobi format. Is it worth it? This is hard to say. On one hand, the Mobipocket software suite is completely free, and it’s mature enough to make conversion and formatting very easy on the writer. But the truth is that it’s not an exciting format to talk about, and this lack of excitement can probably be attributed to a lack of Mobipocket users … even with free software for just about every platform. And if you’re not likely to find serious ebook readers on Mobipocket (and you can’t sell mobi ebooks on Amazon for the Kindle, anyway), then I guess it’s not worth spending so much time and energy on a format few people use in the first place.

4. ePub originally started off as the OEB (Open eBook) initiative. ePub is currently tipped to be the next big ebook format, if only because it’s backed by a loose consortium of publishers, writers, and programmers, tied together in the IDPF, which describes itself as a ‘standards and trade organization for the digital publishing industry’. As mentioned earlier in this article, ePub is built on XML, and the IDPF’s leaders are currently pushing it as a distribution standard for e-books. This means a couple of very interesting things. If the ePub people have their way, publishers will no longer have to produce e-books in different formats for different e-book vendors; they’ll publish in just ePub, and expect everyone else (say, Amazon) to convert ePub to their own proprietary formats. And that conversion is really simple, primarily because ePub is built on a nearly 100% XML base, itself a highly convertible format. Is it worth it? In late 2008 Sony announced that its Reader would support the ePub format, and publishers (or at least the ones with a vested interest in a digital book future) have been relatively supportive of ePub over other formats. If the IDPF people get their way and ePub becomes the industry standard (or even just a distribution standard), ePub would be well worth it. I’m fairly optimistic that ePub will win (at the very least, I want it to win), but the road to that future is far from clear-cut: Amazon has yet to announce any plans for ePub compatibility. They’re the one major player who’s yet to come around to ePub, and for what it’s worth, I think it’s going to take a bit of time, some elbow grease, and a lot of arm wrestling to get them to see things from the publishers’ point of view. But give it time. It should happen … eventually.
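To give a sense of what ‘built on XML’ means in practice, here’s a minimal Python sketch of an ePub’s skeleton: a ZIP archive whose first entry is an uncompressed file named mimetype containing application/epub+zip, a META-INF/container.xml pointing to the package (.opf) file, and the XHTML content itself. The file names under OEBPS/, the book title, the placeholder identifier and the write_epub_skeleton function name are my own assumptions. The sketch also leaves out the table of contents (toc.ncx) that ePub 2 requires, so treat it as an outline of the structure rather than a finished ebook, and run anything real through a validator such as epubcheck.

```python
import zipfile

# Points the reading system at the package file (path is an assumption).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Package file: metadata, a manifest of files, and the reading order (spine).
# The identifier below is a placeholder; NCX table of contents omitted.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>The Long Arc</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

# The actual text: ordinary XHTML.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>It began with a letter.</p></body>
</html>"""


def write_epub_skeleton(path):
    """Write a bare-bones (not validated) ePub-style archive to `path`."""
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed normally.
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML, compress_type=zipfile.ZIP_DEFLATED)
```

Calling write_epub_skeleton("the-long-arc.epub") produces the archive. Because every piece inside is plain XML or XHTML, a vendor could in principle convert it to their own format without ever touching the writer’s original source.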

5. Adobe’s PDF format is probably the best known of the e-book formats I’ve discussed so far.[3] There’s not much to talk about: PDFs are simple, familiar, and easy to use regardless of medium, and they’ve been around long enough for everyone to know, more or less, what a PDF file looks like. And because the format is so old, you’re unlikely to ever meet anyone with a computer that can’t read a PDF. Is it worth it? Hell, yes.

The Format That Wins

I want to make the case here that the primary ebook format we’ll end up working with is probably going to be whichever ebook format wins on the iPhone. Apple’s developer conference, WWDC, happened not very long ago, and several very interesting things became clear during it: most of it worrying news for the rest of the mobile phone industry, but good news for the rest of us. Here’s what Daring Fireball’s John Gruber has to say:

On the whole, there was a palpable sense that the iPhone is a peer to the Mac in Apple’s eyes. This isn’t about counting how many sessions were devoted to each. Nor is it an indication that the Mac as a platform is slowing. Quite the opposite in fact — Apple is selling more Macs than ever, and, knock on wood, there’s a strong consensus amongst developers that Snow Leopard is going to be the best release of Mac OS X yet. It’s simply that for however fast the Mac is growing, the iPhone is growing far faster.

But the two platforms are symbiotically intertwined. The Monday schedule at WWDC is static. In the morning comes the keynote, which the press attends and where all public announcements are made. After lunch, though, there comes what is effectively a second keynote, this time with material aimed squarely at developers. A technical keynote, as compared to the morning’s marketing keynote, if you will. This technical keynote has for as long as I can remember been titled “Mac OS X State of the Union”. This year the title changed to “Core OS State of the Union”.

Hence the symbiosis: Apple now has two full-fledged developer platforms, Mac OS X and iPhone OS, derived from one core system. Neither felt more important than the other this year at WWDC, which is remarkable considering that one of them hadn’t even shipped two years ago.

But look at their vectors — their relative rates of growth — and ponder how much longer until WWDC begins to feel like an iPhone developer conference with a Mac developer track. My answer: next year. In other words, I think it will have taken just three years for the iPhone to supplant the Mac as Apple’s primary platform. By 2011 it will be obvious.

It’s simply a matter of users. During Phil Schiller’s keynote, he showed a graph of the “OS X” user base over time, with steady growth over the first part of this decade followed by a sharp jump from 25 to 75 million over the past two years. This figure was widely mis-cited, however, as showing growth in “Mac OS X” users. It did not. The graph said “OS X”, not “Mac OS X”, and what Apple meant to show were the combined number of users of Mac OS X and iPhone OS. It was a very misleading and poorly-designed chart.

This doesn’t prove anything on its own, but stick with me for a bit. I’ve been seeing several articles arguing that AT&T isn’t providing immediate MMS and tethering support because it fears its network would crash the very instant a million or so iPhone users decide to connect their devices. I’ve also noticed that the iPhone is itself a remarkably tactile platform, one perfect for reading books, and we’ve already seen a number of apps showing us just that: reading on your iPhone is one hell of a revelatory experience. We’ve also been hearing rumours of an Apple tablet, with all the touchy goodness of Apple’s current multi-touch technology; its release in the not-too-distant future would bring that tactile interface to a fully-fledged operating system. And lastly, all those people connecting to an online network on such a small device make up a captive, fanatical community of users, limited by the processing power of their phones but not by their phones’ features … which makes the iPhone at once better than any ebook reader out there (cough the Kindle cough) and perfect for reading text on the go.

But all of the above are small, fragmented pieces of information, hardly worth talking about individually. It’s when you look at them from a broader perspective that things get a lot more exciting, particularly from a digital-fiction point of view. Allow me to pull it all together for you: Apple sees the iPhone as a peer to its traditional Mac platform; the iPhone is a superior tactile device, perfect for on-screen reading; the iPhone has a fanatical, Internet-connected userbase that downloads and consumes content through the iPhone itself; and Apple is a master at enabling third-party (software) innovation. Put two and two together and you’ll realize that this platform is ready for just the right ebook app[4] to come along. Whichever app that turns out to be, whether Amazon’s Kindle app, a Eucalyptus-type reader, or one we’ve never heard of, it will be the turning point that defines our industry. Want to know which format you should end up supporting? Watch the iPhone, and watch it closely.

1. HTML isn’t really a programming language, but XML resembles it in the sense that both use very simple opening and closing tags as a foundation, like, say: <head></head> or <blockquote></blockquote>

2. Don’t worry too much about how XML works with other languages; that bit’s not relevant to this article.

3. Though I must note here that the PDF is really more of a document format than an ebook one.

4. This depends on one more factor: the app must integrate seamlessly with an online store, which in turn must be stocked with a good collection of ebook titles. In this respect, at least, Amazon seems to have a clear lead, though that lead may not hold if Apple decides to enter the ebook market itself. If it does, or if some publishers decide to take things into their own hands and cobble together an online store/app combination, then I’m willing to bet that things will get very interesting, very fast.

  •    The Monomyth is a story formula that is – apparently – found in too many narratives from around the world. The Wikipedia page on the monomyth is a good laugh:
    In a monomyth, the hero begins in the ordinary world, and receives a call to enter an unknown world of strange powers and events. The hero who accepts the call to enter this strange world must face tasks and trials, either alone or with assistance. In the most intense versions of the narrative, the hero must survive a severe challenge, often with help. If the hero survives, the hero may achieve a great gift or “boon.” The hero must then decide whether to return to the ordinary world with this boon. If the hero does decide to return, he or she often faces challenges on the return journey. If the hero returns successfully, the boon or gift may be used to improve the world.
    It’s also why Harry Potter, Star Wars, The Matrix and Star Trek are so similar. And don’t get me started on the Inheritance trilogy, whose 15-year-old author ripped off just about every famous monomyth there ever was. (via)
  •    I’d apparently forgotten about this: Incarnations of Burned Children is a David Foster Wallace short story that Esquire magazine published not too long ago (it also appears in his collection Oblivion). The story itself is one paragraph long (!!), but it’s horrifying: I still have an image from the end burned into my mind’s eye. Treat this one with caution.
  •    Stay Ahead of the Shift: What Publishers Can Do to Flourish in a Community-Centric Web World is Mike Shatzkin’s talk to a group of publishers at BookExpo America on the future of the book business.
    We are all in the content business, and we are going to have to move into the context business. The ownership in the future of eyeballs will be more important than the ownership of IP, because value moves to scarcity. This is immutable, you cannot change this. Content creation and distribution are no longer scarce. Anybody can do them. Distribution is not an issue. I can type something on my computer today, I can flip it to my website, it is distributed. Any body in the world, on the web, can get it. The problem is, will they know about it? That’s the problem. Marketing is the problem. Distribution is no longer the problem. And you’re going to do your marketing niche by niche, and nugget by nugget, and it does require scale. If you don’t have enough content, or clout in a community, you won’t be heard. If you don’t pay enough attention or put enough labor into a community, you won’t be able to command the attention of that community.
    I don’t agree with everything he has to say. In particular, I think his view that the cloud is the future is overstated (he argues that we’re no longer going to download, or own, products; we’re going to buy access to them, with the products stored online) … but he makes rather good points about what publishers need to do to thrive in a completely different environment. The underlying shtick of his talk is that publishers will first have to build audiences, and then sell them products, and keep doing the former until the latter makes financial sense.
  •    Mr. Penumbra’s Twenty-Four-Hour Book Store is a story by Robin Sloan about a man working in a strange library. The story has a few things to say about how computers and books coexist (and it dares to imagine one such future), even if the ending’s a little strange. I can’t quote much here without giving the story away, but my favourite paragraph, at least, should be safe:
    THAT NIGHT, AT THE BOOK STORE, I started working on the new visualization, thinking I could impress Kat with a prototype. I am really into the kind of girl you can impress with a prototype.
    The Twenty-Four-Hour Book Store is also available in Kindle and PRC versions.
  •    There are, apparently, seven kinds of bookstore customer. I’m a Browser:
    Not to be confused with Grazers, who are content to look at anything so long as it’s a vaguely booklike object set up on the most prominent, most convenient displays, the Browser has an interest, or a Goal: They want a gardening book, but don’t need that specific gardening book. They’re mystery fans, but are happy with any decent read, they don’t need the latest book from the one author who writes the cat mysteries, “oh, you know the books I’m talking about, they’re so good and I can’t wait for the new one, surely you know the author I’m talking about” (sadly, ‘cat mysteries’ isn’t specific enough — and I hate you.)
    There are also subtypes, like the ‘Saw it in the New York Times’ subclass of Idiot.