Digital File Formats

Monday, December 01, 2008
"Alright, so what's the deal with these different file formats for digital comics? I just want to read comics -- I don't really care about what file type it is!"

Fortunately for you, I happen to be a bit of a computer geek who also deals with varying art file formats in my day job, so I'm here to explain things for everyone!

First, let's look at the two basic models for content delivery: downloadable versus reading it online exclusively. The two approaches come mainly from differing business models. With a downloadable version of the file, you're able to hold on to the file and keep it on any drive you might choose, but reading it online means that you have to look at the file via a live internet connection. The original concern for users here was that, in having to read it online, they had to essentially download a new copy every time they wanted to read it, and that could be time-consuming on slower internet connections. And you weren't exactly guaranteed to be able to get to the internet in the first place, depending on where you were. But download speeds have improved markedly over the past few years, and connectivity is almost ubiquitous. The two issues that remain outstanding in favor of downloads (from a user perspective) are 1) that users are not dependent on the whims of the content publisher to continue sharing the files, and 2) that they are able to centralize/aggregate the issues they want in a manner that makes sense to them.

For example, on my hard drive, I have a directory called "Comics." Within that are a series of folders labeled by publisher. Each publisher folder has a number of folders within it, each labeled with a comic's title. And each title folder has the issue files within that, titled with simply a three-digit issue number (such as 001.cbr or 023.pdf). That makes sense for me. I can quickly and easily find any issue in my digital collection. But that might not make sense to somebody else who, for example, only collects books from one publisher. Or is trying to organize their books chronologically. If the user is limited to how the content publisher organizes their selections, they're forced to follow that format -- which might be different from publisher to publisher.
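If you want to script that kind of layout, here is a minimal sketch in Python of how the path for one issue could be assembled. The function name `issue_path` and the publisher and title used in the example are made up for illustration; only the folder scheme (publisher, then title, then a three-digit issue number) comes from the description above.

```python
from pathlib import PurePosixPath


def issue_path(root, publisher, title, issue, ext):
    """Build <root>/<publisher>/<title>/<NNN>.<ext> with a zero-padded issue number."""
    return PurePosixPath(root) / publisher / title / f"{issue:03d}.{ext}"


# Hypothetical publisher and title, purely as an example.
example = issue_path("Comics", "Some Publisher", "Some Title", 23, "pdf")
print(example)
```

Somebody who prefers a chronological arrangement could swap the publisher and title folders for a year folder without changing anything else.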

Online read-only comics are typically delivered using Flash, an interactive OS-independent development system currently owned by Adobe. Flash is designed to create open-ended files, so navigation must be built into the final file. (That's why you see so many different treatments in how these types of comics are displayed. Each creator has their own idea on what works best.) These files COULD be developed as completely stand-alone, downloadable files but it would be a bit cumbersome since the navigation would need to be built in each comic file AND because Flash is not very good at optimizing rasterized graphics that are used in comic pages.

"Wait -- what does rasterized mean?"

Digital graphics can be saved in one of two ways: vector or raster. A vector image is one that is saved mathematically. A saved file will say something to the effect of, "Place one circle in the middle of the page, with a radius of 3 inches. Color the interior with 100% cyan, and give the edge a 5 point stroke, colored with 100% black." A raster image is one in which each pixel is defined independently. "The pixel in location 1, 200 is black; the pixel in location 1, 201 is black; the pixel in location 1, 202 is black; the pixel in location 2, 199 is black; the pixel in location 2, 200 is black..." It shouldn't take much to realize that file sizes can vary quite a bit between raster and vector images!
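To put rough numbers on that, here is a small Python sketch. It assumes an SVG-style one-line description for the vector circle and an uncompressed raster at 3 bytes per pixel on a 600 x 600 pixel canvas; those figures are my own assumptions for illustration, and real raster formats such as JPG compress well below the uncompressed size.

```python
# Vector: the whole picture is one short instruction (an SVG-style string,
# used here only as an illustration of "saved mathematically").
vector_description = (
    '<circle cx="300" cy="300" r="216" fill="cyan" '
    'stroke="black" stroke-width="5"/>'
)


def raster_size_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed raster size: every pixel is stored independently."""
    return width_px * height_px * bytes_per_pixel


vector_bytes = len(vector_description.encode("utf-8"))
raster_bytes = raster_size_bytes(600, 600)

print("vector description:", vector_bytes, "bytes")
print("uncompressed raster:", raster_bytes, "bytes")
```

The raster figure grows with the canvas size, while the vector description stays the same length no matter how large the circle is drawn.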

Another benefit of vector images is that, since they're mathematically based, there's no file resolution to worry about. You can infinitely scale the artwork up or down with no image degradation. But when you scale up a raster image, the individual pixels become larger and more noticeable. This is the source of pixelization that you see when you try to zoom in too much on a raster image...
(Enlargement from Sunday's Sinfest.)
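The same effect can be shown with a toy example. This Python sketch enlarges a tiny two-by-two "image" by simply repeating each pixel, which is the simplest way to scale a raster (often called nearest-neighbor scaling). The function name `upscale_nearest` is my own, and real image software usually applies smoothing on top of this.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a raster by repeating each pixel `factor` times in both directions."""
    enlarged = []
    for row in pixels:
        # Repeat every pixel horizontally...
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


tiny_image = ["#.", ".#"]  # '#' is a black pixel, '.' is a white one
for row in upscale_nearest(tiny_image, 3):
    print("".join(row))
```

Each original pixel turns into a visible three-by-three block. No new detail appears, which is exactly the blockiness you see in the enlargement.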

Flash does a good job of handling vector images. It's a mathematically oriented program. Web browsers and most paint programs are better suited to reading raster images -- there's no real math to figure out. But where Flash runs into problems is that when you tell it to render a raster image, it has to, in effect, translate the raster image into a mathematical formula before displaying it. That takes more time and resources, and the program simply isn't designed to render that type of image AND compress it to a reasonable file size.

I should point out, too, that vector and raster are NOT file formats in and of themselves. They're both broad categorizations of file formats. If you think of individual file formats (HTM, DOC, BMP, etc.) as languages (English, Spanish, Japanese, etc.) then think of raster and vector files as "Romance and Germanic languages." Broad categories that encompass multiple languages.

Now, looking at downloadable comics, there are a few prevalent forms out there and, interestingly, the technology end of the argument is pretty similar.

The main two forms are PostScript and Joint Photographic Experts Group, more commonly known by their abbreviations: PS and JPG. The JPG format you're likely familiar with, as it's often used on the web. It's a raster-style file format that's particularly well suited to compressing images. PostScript is the format that your PC uses when it talks to a printer. Like Flash, it converts your file (whatever its native format) to a specific mathematical language. Also like Flash, it's not particularly conducive to compressing images -- that's why it takes so long to send a photo to your printer, but pages of text run so quickly.

"But, I've never seen downloadable comics in either PS or JPG forms!"

Ah, but you HAVE seen PDFs and CBRs, no doubt!

PDF stands for "Portable Document Format" and was created by Adobe. For all practical purposes, it's the same as a PS file. It's kind of like the difference between English spoken by East and West Coast Americans -- it's the same, except for a handful of words and a slight change of inflection. PDFs work in much the same ways as PS files do. And, more significantly, they have the same limitations. They work really well with vector graphics that are mathematically based, but start to run into issues with raster images. Given that most comics are still drawn on paper with pencil and ink, and are then scanned into a computer, that means PDF is not an ideal format for most comics.

CBR (and close cousin CBZ) is actually something of a non-format. A CBR is nothing more than a RAR file and a CBZ is just a ZIP. Exactly the same.

"Well, I've heard of ZIP, but isn't that just a compression format?"

Yes, both RAR and ZIP are file compression formats. RAR does a slightly better job at the actual compression but, for practical purposes, it's the same as a ZIP. (Indeed, many compression/decompression software utilities handle both formats.) CBR and CBZ files are simply renamed RAR and ZIP files that contain a series of JPG scans of comic book pages. Each page is scanned individually, saved as a JPG, and numbered in sequential order. Programs that read CBR and CBZ files are really nothing more than image display programs that are designed to decompress RAR and ZIP files on the fly, one or two pages at a time. (You can see in the screen shot that, despite having opened "Pirates_Comics_001.cbz", my computer is displaying the image called "PiratesComics01 04.jpg".)
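For the curious, here is a rough Python sketch of what the core of a CBZ reader might look like, using only the standard `zipfile` module. The function names and the placeholder page files are my own inventions for illustration, and the archive is built in memory so the example runs without a real comic on hand. Python's standard library has no RAR support, so a CBR would need a third-party tool instead.

```python
import io
import zipfile


def list_pages(cbz_file):
    """Return the JPG names inside a CBZ (really a ZIP), sorted into reading order."""
    with zipfile.ZipFile(cbz_file) as archive:
        names = [
            name for name in archive.namelist()
            if name.lower().endswith((".jpg", ".jpeg"))
        ]
    return sorted(names)


def read_page(cbz_file, page_name):
    """Decompress a single page on demand, without unpacking the whole archive."""
    with zipfile.ZipFile(cbz_file) as archive:
        return archive.read(page_name)


# Build a tiny stand-in CBZ in memory; the "pages" are placeholder bytes,
# not real JPG data, and are deliberately added out of order.
fake_cbz = io.BytesIO()
with zipfile.ZipFile(fake_cbz, "w") as archive:
    archive.writestr("page_002.jpg", b"placeholder bytes for page 2")
    archive.writestr("page_001.jpg", b"placeholder bytes for page 1")

pages = list_pages(fake_cbz)
print(pages)
print(len(read_page(fake_cbz, pages[0])), "bytes in the first page")
```

A real reader would hand those bytes to an image decoder and draw them on screen. Sorting by file name is why the pages need to be numbered sequentially.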

Quick experiment: try changing one of your CBR files to a RAR by altering the three-letter extension. Now open that file in whatever you use to decompress RAR and ZIP files. You'll see each of the pages as separate JPGs that you can copy wherever you like and edit in your favorite paint program.
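You can also confirm this without renaming anything, because the file extension is only a label. The first few bytes of the file identify the real format. The Python sketch below checks those leading bytes against the well-known ZIP and RAR signatures. The function name `sniff_archive_type` is hypothetical, and the sample archive is built in memory rather than read from a real comic.

```python
import io
import zipfile


def sniff_archive_type(first_bytes):
    """Guess the archive format from its leading signature bytes."""
    if first_bytes.startswith(b"Rar!\x1a\x07"):
        return "rar"
    # A ZIP normally starts with a local file header; an empty one
    # starts directly with the end-of-central-directory record.
    if first_bytes.startswith((b"PK\x03\x04", b"PK\x05\x06")):
        return "zip"
    return "unknown"


# Stand-in for a .cbz file: an ordinary ZIP holding one placeholder page.
fake_cbz = io.BytesIO()
with zipfile.ZipFile(fake_cbz, "w") as archive:
    archive.writestr("001.jpg", b"placeholder page data")

print(sniff_archive_type(fake_cbz.getvalue()[:8]))
```

For a file on disk, you would read the first eight bytes with `open(path, "rb").read(8)` and pass them to the same function.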

Logically, this all means that CBR makes the most sense as a file format for serving most comics. It's got the best compression (meaning the smallest file sizes), is the most versatile, and is very easy to create. It does require a specialized reader, but so do PDFs and Flash-based comics, and the only disadvantage in that respect is that most computers come with Adobe readers pre-installed on them these days.

However, it's rare that logic actually dictates winning technology wars, and Adobe is the 800-pound gorilla in this debate. And you know how much comics love their primates!

1 comment:

BlakeyRat said...

I don't know what kind of technical work you do, but this quote:

"Flash does a good job of handling vector images. It's a mathematically oriented program. Web browsers and most paint programs are better suited to reading raster images -- there's no real math to figure out. But where Flash runs into problems is that when you tell it to render a raster image, it has to, in effect, translate the raster image into a mathematical formula before displaying it."

Almost complete nonsense. To display a bitmap (or "raster image" if you prefer), Flash does the same thing virtually every other application does: it gives the data to the OS, and tells the OS where to draw it on the screen. There's no difference between Flash drawing a PNG or IE drawing a PNG, or Paint drawing a PNG for that matter.

You are right that there's no "math" involved in bitmap art, compared to vector art. With some exceptions (like translucent images), and ignoring the fact that everything the computer ever does ever consists of math.

Yet, it's faster for most image drawing libraries to draw the bitmap version of an image rather than the vector version. Why? Well, vector art is virtually guaranteed to have curved lines (which need anti-aliasing, which is math), virtually all have layering to determine which vector objects overlay other objects (math), and how the foreground objects should clip the background objects (math, or repeated bitmap drawing), and of course the aforementioned translucency is more prevalent in vector art.

In short, vector art virtually always takes more time to render in CPU-time than bitmap art does. Vector formats have the advantages you mention, though: the small filesize, and ability to look good no matter how they are resized.

I think what you might actually be referring to is Flash's annoying habit of over-compressing imported bitmaps. This isn't any kind of technical limitation, it's just a "helpful" feature to lower the download size of your finished SWF file. And you can turn it off when importing, of course.

Also, the general convention is that bitmap-oriented programs are called "Paint" programs (MacPaint, Painter, Paint.NET, etc), and vector-oriented programs are called "Draw" programs (Corel Draw, the "Drawing Toolbar" in Office, etc.)