The Web Is Imperfect

Yes, obviously. But I’m not (just) talking about the reliability of the content on the World Wide Web, or bloggers’ lack of accountability, or anything like that. I’m talking about the actual languages used to render web pages.

In a perfect world, how would you deliver content to people? (And don’t tell me neural implants; let’s stick with existing computer hardware.) You would divide your article/opinion/rant/manifesto into a layer for data and a layer for presentation. The data would consist of the actual words, images and multimedia that you’re trying to communicate. The presentation would consist of specific instructions for how to display that data in each medium.
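To make the two layers concrete, here is a minimal sketch in Python. The field names ("title", "paragraphs") and both rendering functions are invented for illustration; the point is only that one untouched chunk of data can feed any number of presentations.

```python
import textwrap
from html import escape

# Data layer: only the content, no display instructions.
# The field names here are made up for this sketch.
article = {
    "title": "The Web Is Imperfect",
    "paragraphs": [
        "Yes, obviously.",
        "HTML mixes data & presentation.",
    ],
}


def render_html(doc):
    """Presentation layer for a desktop web browser."""
    parts = ["<h1>" + escape(doc["title"]) + "</h1>"]
    parts += ["<p>" + escape(p) + "</p>" for p in doc["paragraphs"]]
    return "\n".join(parts)


def render_plain_text(doc, width=40):
    """Presentation layer for a small text-only screen."""
    lines = [doc["title"].upper(), ""]
    for p in doc["paragraphs"]:
        lines.append(textwrap.fill(p, width=width))
    return "\n".join(lines)


print(render_html(article))
print(render_plain_text(article))
```

When the movie theaters come calling, you write one more render function and leave the article alone.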

Why separate data from presentation? Because we’re constantly thinking of new ways to repurpose existing content, and it’s impossible to foresee every way that someone might want to use your content. People are reading content on cell phones, syndicating content on other people’s websites, loading content onto their iPods and running it through screen readers. And that’s just today. Who’s to say that in five years, movie theaters won’t be broadcasting web content alongside the previews? You don’t want to spend three months updating all of your old content for a new medium every four years.

But even setting aside these more esoteric methods of content delivery, you can’t count on a standardized web browsing experience on a PC. People are browsing with Windows, OS X, Linux, Unix, PalmOS and BeOS. They’re using Internet Explorer, Mozilla Firefox, Opera, Konqueror, Netscape and Safari. They’re using large resolutions and small resolutions, TrueType fonts and PostScript fonts and OpenType fonts of all sizes and shapes, 16-million-color monitors and monochrome BlackBerrys. They’re using services like Babel Fish to translate your website into French, Spanish, Croatian and Farsi.

You want your words to be immortal. If your great-grandkids aren’t reading your words a hundred years from now, it shouldn’t be because their robot butlers can’t figure out how to read your primitive web pages.

Let’s also not forget that the most important “readers” of your content aren’t people at all. They’re Google and Technorati and del.icio.us and other websites and technologies. And while it might be easy for a human to look at web page code and be able to tell what the meat of the page is, it’s not quite so easy for a machine. In an ideal world, you want Google to be able to open your file right up and instantly “know” where to find the meat of the page without having to guess whether your website’s copyright notice is pertinent data or not.
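Here is a rough sketch of what “knowing where the meat is” could look like from the machine’s side. The element names (page, body, footer) are a made-up format for this example, not a real standard; the parsing uses Python’s standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# A made-up document format: the element names <page>, <body> and
# <footer> are assumptions for this sketch, not a real standard.
PAGE = """\
<page>
  <title>The Web Is Imperfect</title>
  <body>
    <p>You want your words to be immortal.</p>
    <p>So you need to separate data from presentation.</p>
  </body>
  <footer>Copyright notice goes here.</footer>
</page>
"""


def extract_main_text(xml_source):
    """Return the paragraphs inside <body>, ignoring everything else."""
    root = ET.fromstring(xml_source)
    return [p.text.strip() for p in root.findall("./body/p") if p.text]


for paragraph in extract_main_text(PAGE):
    print(paragraph)
```

Because the structure says which part is the body, the program never has to guess whether the copyright notice counts as content.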

So you need to separate data from presentation. The problem is, HTML doesn’t do that.

Most Net-savvy folks already know that HTML is something of a kludge. Tim Berners-Lee cribbed most of it from the existing SGML. It jumbles data and presentation tags all together in one big random heap. Real programmers look at HTML and shudder. The rules are dreadfully inconsistent, and no two web browsers out there interpret them the same way. Add to this the fact that Microsoft and Netscape started adding their own proprietary tags to the mix during the browser wars of the ’90s, and you have a real headache. (Don’t even get me started on JavaScript and ActiveX.)

Cascading Style Sheets (CSS) promised to relieve some of the HTML headache. CSS allows you to assign style rules to almost any element in your HTML (turn all my hyperlinks purple and give them a dotted blue border on three sides), thus taking a good bit of the presentation out of the HTML.
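As a small illustration, here is that purple-links rule written out as CSS and kept apart from the markup it styles. I’m holding both in Python strings so the whole thing runs as one script; build_page is a hypothetical helper, and the rule uses a dotted border because each side of a CSS border takes a single style.

```python
# The markup says only what things are.
BODY = '<p>Read <a href="/about">about me</a>.</p>'

# Purple links with a dotted blue border on three sides
# (the left side is switched off).
LOUD_LINKS = """\
a {
  color: purple;
  border: 2px dotted blue;
  border-left-style: none;
}
"""

# A tamer alternative for the same markup.
PLAIN_LINKS = "a { color: blue; text-decoration: underline; }\n"


def build_page(body, stylesheet):
    """Wrap unchanged markup with a stylesheet (hypothetical helper)."""
    return (
        "<html><head><style>\n" + stylesheet + "</style></head>\n"
        "<body>" + body + "</body></html>"
    )


print(build_page(BODY, LOUD_LINKS))
print(build_page(BODY, PLAIN_LINKS))
```

Swapping one stylesheet for the other changes how the links look without touching a character of the markup, which is the whole appeal.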

CSS2 went one step further by letting you position elements on a page with your style sheets. This was supposed to eliminate the standard way of positioning things on web pages, which basically involves creating lots and lots of nested tables full of transparent GIFs to nudge things into the right place.

The problem? CSS is kind of a mess too. Certain things that were very difficult with the table model are easy in CSS, but the reverse is also true. Run a Google search on “CSS vertical-align” or “CSS footer”, for instance, and you’ll find hundreds of articles from diligent programmers trying to figure out how to do something as simple as centering a block vertically or keeping a footer at the bottom of the page. I now program my websites exclusively using the CSS model, but I still need to use clunky workarounds. In order to put a simple footer on every page of your website, for instance, you need to use an updated version of the transparent GIF trick to “prop” the rest of the page up. CSS has no real way to do it natively. Really.

So we’re getting there, albeit very slowly. Now instead of HTML, web programmers are using what’s called XHTML, which is essentially HTML made nice and tidy under XML’s stricter rules, HTML as it should have been. We’re using XHTML to hold our data and CSS to deliver our presentation.

One can only hope that someday we’ll reach the holy grail. Pure XML files with nothing but data. Pure style sheets in CSS or XSLT or whatever else comes along. Portability. Compartmentalization. Sense.

I predict we’ll get there about ten minutes before the neural implants hit the street.