June 27th, 2008
Whose idea was it to remove file extensions from URLs?
I’ve been thinking about this question for some time now, and yesterday at work it came up in a conversation between a friend and me. The web has been inundated with websites that have meaningless URLs. Once upon a time a URL meant something. It had value and told the visitor what they were getting: sometimes a Word document (.doc), sometimes an animated GIF (.gif), but most often an HTML file (.html).
WTF happened?
Somewhere between 1990 and 2008 we lost the meaningful file extension. By that I mean sites whose URLs end in .php, .aspx, .pl and a slew of other cryptic extensions that say nothing about the content being served. When viewing the source of a file that ends in .php, you should expect to see PHP code. The same goes for when you save it to your computer or when your computer caches a copy. And no, the MIME type is not a suitable replacement. The tools used to generate an HTML document become irrelevant once the file ends up in someone’s browser. Once we agree on that, we can move on to deciding what the right extension is.
Who decides the right extension?
Common sense, that’s who. While this applies to all file types, the most common one is HTML. Images, Flash and PDF files, for now, still show their type visibly through their extension; HTML files fell victim a while back. It’s understandable how it happened, but we’re in a position to fix it. There are a handful of tricks that make it easy to expose HTML content with a .html extension. You could use mod_rewrite, other Apache directives such as Files, or just force Apache to send the request through your favorite parsing engine. Not sure about those of you using IIS… contact your nearest MicroCenter or something.
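As a rough illustration of the tricks above, here is a minimal Python sketch (standard-library WSGI, no Apache involved) of a server whose public URLs end in .html no matter what actually produces the page. The route table, the render_about function and the /about.html path are hypothetical stand-ins; in an Apache setup, mod_rewrite or a handler directive would play the role of the lookup.

```python
# A minimal WSGI sketch: public URLs end in .html regardless of which
# tool generates the page. Route names and render functions are hypothetical.

def render_about():
    # Stand-in for whatever actually builds the page (PHP, Perl, a template engine...).
    return "<html><body><h1>About</h1></body></html>"

# Public .html URL -> function that produces the HTML for it.
ROUTES = {
    "/about.html": render_about,
}

def app(environ, start_response):
    handler = ROUTES.get(environ.get("PATH_INFO", ""))
    if handler is None:
        # Anything else, including /about.php, is simply not a public URL.
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not Found"]
    body = handler().encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000. The visitor only ever sees /about.html; the generating tool stays an implementation detail.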
Why’s this so important?
You may be wondering what the big deal is. It’s semantic, and so is the web. It’s also all about the content. The web has become a wealth of information, and the least we can do is not bastardize one of its fundamental building blocks. Imagine what a mess it would be if PDF files were linked to with a .cgi extension. How would anyone know not to click on them? You can begin to see what’s at stake here.
Doing my part
I’ll be the first to admit that I’m no different from anyone else. Until a few hours ago I did not use a single file extension for HTML files on this site. However, I figured I had better fix that before blogging about it. I’m still converting the blog pages to use a file extension and setting up proper redirects to stay on Google’s good side. If anyone knows of a WordPress plugin that does this, let me know. For those of you not seeing the .html file extensions yet: I have aggressive cache headers, so please refrain from informing me via comments.
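For the redirects mentioned above, a permanent (301) redirect is what tells search engines a page has moved. Here is a minimal Python sketch written as WSGI middleware. It assumes, hypothetically, that old permalinks look like /2008/06/27/some-post/ and that the new URL is the same path with .html appended; it is not how any particular WordPress plugin works.

```python
# Sketch of 301 redirects from extension-less permalinks to .html URLs.
# Assumes (hypothetically) the new URL is the old path plus ".html".

def html_location(path):
    """Return the .html URL for an extension-less path, or None if no redirect applies."""
    stripped = path.rstrip("/")
    if not stripped or stripped.endswith(".html"):
        return None  # site root, or already an .html URL
    last_segment = stripped.rsplit("/", 1)[-1]
    if "." in last_segment:
        return None  # leave images, PDFs and other files with extensions alone
    return stripped + ".html"

def redirect_to_html(inner_app):
    """Wrap a WSGI app so old permalinks get a permanent redirect."""
    def middleware(environ, start_response):
        target = html_location(environ.get("PATH_INFO", ""))
        if target is None:
            return inner_app(environ, start_response)
        query = environ.get("QUERY_STRING", "")
        if query:
            target += "?" + query  # keep any query string on the new URL
        # 301 tells search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    return middleware
```

Requests that already end in .html pass straight through to the wrapped application, so only the old spellings are redirected.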
What do you think? Am I crazy or do we need to do this? Let’s add some HTML back into the web.
User contributed links.

June 27th, 2008 at 1:49 am
A URL does not always represent a specific file type, but rather a resource which can be represented in multiple types:
“A resource is a conceptual entity identified by a URI (RFC 2396). An HTTP server like Apache provides access to representations of the resource(s) within its namespace, with each representation in the form of a sequence of bytes with a defined media type, character set, encoding, etc. Each resource may be associated with zero, one, or more than one representation at any given time. If multiple representations are available, the resource is referred to as negotiable and each of its representations is termed a variant. The ways in which the variants for a negotiable resource vary are called the dimensions of negotiation.” - http://httpd.apache.org/docs/1.3/content-negotiation.html
If we later changed the default representation, or added a different representation to match the user’s Accept headers, then you would be serving a file whose ‘extension’ differs from the URL being requested. You say it’s all about the content, but what you’re suggesting is that we tie the content and the representation of that content together.
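To make the negotiation idea concrete, here is a simplified Python sketch of picking a variant from the Accept header. It honors q-values and lets the most specific matching media range decide, but it ignores media-type parameters other than q and is not a complete implementation of the HTTP rules; the function names and variant lists are illustrative assumptions.

```python
# Simplified server-driven content negotiation: choose which representation
# of one resource to send, based on the client's Accept header.

def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges

def specificity(media_range):
    """*/* is least specific, type/* is next, a full type/subtype is most specific."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2

def matches(media_range, media_type):
    range_type, _, range_sub = media_range.partition("/")
    media_main, _, media_sub = media_type.partition("/")
    if range_type == "*":
        return True
    return range_type == media_main and range_sub in ("*", media_sub)

def choose_variant(accept_header, variants):
    """Return the best variant (variants listed in server preference order), or None."""
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for variant in variants:
        found = [(specificity(r), q) for r, q in ranges if matches(r, variant)]
        if not found:
            continue
        _, q = max(found)  # the most specific matching range decides
        if q > best_q:     # strict ">" keeps the server's order on ties
            best, best_q = variant, q
    return best
```

The same URL can come back as text/html for one client and application/json for another, which is why an extension baked into that URL could end up describing only one of the variants.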
June 27th, 2008 at 7:42 am
@Mark
Good point, but I don’t see the harm in assuming the most relevant representation. The default representation can change at any time, which may or may not leave the file extension an accurate description of the content. Are you suggesting we completely sever the two?
I don’t think we would ever reach 100% compliance with this, but it could benefit the majority of cases. There will always be edge cases, but I don’t think that should necessarily keep us from making more sense of the web.
June 27th, 2008 at 9:22 am
You’re certainly correct in saying extensions have been abused. But I don’t understand your proposal.
There’s a difference between URIs and URLs. URIs are more flexible, and preferable. URLs are tied to a specific representation of the data at a given URI.
Why aren’t mime types a suitable alternative?
June 27th, 2008 at 9:55 am
While we are on the topic of URLs, here is one of my pet complaints about them: Case Sensitivity in URLs
June 27th, 2008 at 10:44 am
@Benedict,
The proposal is simply to raise awareness of how, when and why more descriptive file extensions provide an overall benefit. The reason I said MIME types are not suitable is that, for example, the web browser does not honor them when you save a web page. It does not (and probably should not) try to work out the proper file extension.
Basically, it’s confusing to users when they save a web page to their computer and later try to open it, only to be asked what program to use.
June 27th, 2008 at 10:53 am
@Binny,
I concur. I’m a fan of case sensitivity, but it rarely (if ever) benefits the end user.
Case sensitivity can be a tricky problem. As you stated, it is a byproduct of the underlying OS. However, it’s also a byproduct of having a hard mapping between URL and file system. This isn’t bad in itself, but it has become so easy to work around what I consider a limitation.
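One way around the hard URL-to-file-system mapping, sketched here in Python as an assumption rather than how any particular server does it, is to build a lowercase index of the real file names and look requests up case-insensitively. The walk_paths, build_index and resolve names are hypothetical; names that differ only by case are treated as ambiguous and left unresolved.

```python
import os

def walk_paths(root):
    """Yield every file under root as a URL-style path such as '/blog/Post.html'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield "/" + rel.replace(os.sep, "/")

def build_index(real_paths):
    """Map lowercased URL paths to their real, case-sensitive spelling.

    Paths that differ only by case are ambiguous, so they are dropped
    rather than guessing which file the visitor meant.
    """
    index, ambiguous = {}, set()
    for real in real_paths:
        key = real.lower()
        if key in index and index[key] != real:
            ambiguous.add(key)
        index[key] = real
    for key in ambiguous:
        del index[key]
    return index

def resolve(index, url_path):
    """Return the real path for url_path, ignoring case, or None if unknown."""
    return index.get(url_path.lower())
```

A server could then answer /about.HTML with a redirect to the real spelling, e.g. /About.html, so only one canonical URL gets indexed.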
June 27th, 2008 at 1:31 pm
You’re correct about .php, .aspx, etc. being poor file extensions in URLs, but having no file extension at all is just fine as long as the MIME type is set properly. Why should the URL convey metadata about the file type when there’s an HTTP header for that?
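For comparison, here is what honoring the header when saving might look like: a hypothetical Python helper that takes the file name from the URL but corrects its extension from Content-Type. The EXTENSIONS table is a small illustrative mapping and the example.com URLs are placeholders; Python’s own mimetypes module has a larger table, but its answers can depend on the system’s mime.types files, so a fixed table keeps the sketch predictable.

```python
import posixpath
from urllib.parse import urlsplit

# Small illustrative table; a real browser or Python's mimetypes module knows many more.
EXTENSIONS = {
    "text/html": ".html",
    "application/pdf": ".pdf",
    "image/gif": ".gif",
    "application/msword": ".doc",
}

def save_name(url, content_type):
    """Choose a local file name: name from the URL, extension from Content-Type."""
    path = urlsplit(url).path
    base = posixpath.basename(path.rstrip("/")) or "index"
    # Drop parameters such as "; charset=UTF-8" to get the bare media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    ext = EXTENSIONS.get(media_type)
    if ext is None:
        return base  # unknown type: keep whatever name the URL gave us
    stem, current = posixpath.splitext(base)
    if current.lower() == ext:
        return base
    return stem + ext  # replaces .php/.aspx/etc., or adds one if missing
```

With this approach http://example.com/blog/post.php served as text/html would be saved as post.html, which is the behavior the original post wishes browsers had.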