Future-Proofing Your URIs

Technology Agnosticism

A future-proof URI is a URI likely to be usable for a very long time, even if you decide to completely reorganize your Web site or change the technology used to produce static and dynamic pages.

A general strategy to make URIs future-proof may be summarized as "don't show the technology." If the past 15 years of the Web teach us anything, it is that new Web technologies come out almost every day, and you cannot assume that a Web site written in Python today will still be written in Python in 2 years, let alone 10 or 20 years. In addition, the Web will continue for a long time to be a network of different servers running different operating systems, platforms, and languages, and there is no benefit for the purpose of interoperability in making those aspects visible in URIs.

By choosing a technology-agnostic URI, you ensure that you don't have to change the URI as the technology behind it evolves or as other Web developers take over the development of a site. In addition, there is benefit to not telling the world exactly what technology you are using for security reasons (although Web servers often send information about server-side modules installed right within HTTP responses).

Along these lines, here are a few things you should do to build future-proof URIs:

  • Don't include anything in the URI that reveals the programming language or Web platform used to produce your HTTP resource. For example, ban .pl, .php, .jsp, .asp, .aspx, cgi-bin, servlet, .do, and so on.
  • Don't include file extensions that reveal the media type of the requested document, such as .html, unless you also provide a media-type agnostic URI. This recommendation may sound strange, since right from the beginning Web servers have been serving HTML pages with .htm or .html extensions, but thanks to content negotiation, you really don't need to do this, and a single URI enables access to an HTML version of the resource but also to future formats you may want for that resource. Think about the growing use of XHTML: a single URI can serve HTML to browsers that do not support XHTML at all, and well-formed XHTML to those that do support it. It makes sense not to duplicate all of your URIs just because you want to serve these two formats. In the future, new formats will appear and you may want to serve them from the same URI. For example, you may want to serve XHTML 2.0 instead of XHTML 1.1 to browsers that implement it.
  • Don't give a hint as to whether your page is a static resource on your file system or generated dynamically: the way a resource is served may change over time.
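
To illustrate the content negotiation mentioned above, here is a minimal sketch in Python of how a server might pick a representation for a single extension-free URI based on the Accept request header. The names AVAILABLE, parse_accept, quality, and choose_media_type are hypothetical, and the matching is deliberately simplified (it ignores media-type parameters other than q), so treat it as an illustration rather than a complete implementation.

```python
# Hypothetical list of representations the server can produce for one URI,
# in the server's own order of preference.
AVAILABLE = ["text/html", "application/xhtml+xml"]


def parse_accept(header):
    """Return a list of (media_type, q) pairs from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    return entries


def quality(candidate, accepted):
    """Return the client's q for a candidate type, most specific match first."""
    lookup = dict(accepted)
    main_type = candidate.split("/")[0]
    for pattern in (candidate, main_type + "/*", "*/*"):
        if pattern in lookup:
            return lookup[pattern]
    return 0.0


def choose_media_type(accept_header, available=AVAILABLE):
    """Pick the acceptable type with the highest q; ties go to the server's order."""
    accepted = parse_accept(accept_header)
    best, best_q = None, 0.0
    for candidate in available:
        q = quality(candidate, accepted)
        if q > best_q:
            best, best_q = candidate, q
    return best


# A browser that prefers XHTML gets XHTML; one that only lists HTML gets HTML.
print(choose_media_type("application/xhtml+xml,text/html;q=0.9"))
print(choose_media_type("text/html"))
```

Because the decision depends only on the request header, the URI itself never has to mention .html or any other format.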

Being technology-agnostic often requires a little more work upfront, as some technologies actually encourage bad practices (for example, ASP and JSP encourage visible .asp and .jsp extensions, a situation probably at least partially driven by marketing purposes), but the benefits are likely to be long lasting.

Hierarchies and Collections

As discussed in Chapter 7, "HTTP and URIs," in the book "Professional Web 2.0 Programming" (Wrox, 2006, ISBN: 0-470-08788-9), HTTP URIs can contain hierarchical path information. When defining a URI space, you have the option of leveraging that hierarchy or not. As an example of hierarchical URIs, consider implementing permalinks.

Suppose the main URI for your personal blog is http://example.org/blog/. Although that URI can be permanent, its content is by design meant to change and keep updating with your latest blog postings. In this context, the term permalink is used to denote a permanent link to an individual blog post.

The WordPress blogging software documentation describes several types of permalinks, from ugly to pretty. Perhaps it's in the eye of the beholder to determine which is more aesthetically pleasing:

http://example.org/blog/index.php?year=2006&month=8&day=7&post=123

or:

http://example.org/blog/archives/2006/08/07/Web20-thebook

Both of these URIs make sense to represent access to a specific blog entry published on August 7, 2006. Which is the best one depends on the exact use case. For example, the hierarchical solution becomes impractical if you have dozens of query parameters that identify the resource and if those parameters are not by nature hierarchical. In addition, URIs that are internal and not likely to be ever seen by humans can use query parameters with few drawbacks: nice-looking URIs matter mostly to humans.

In the example above, a publication date is naturally hierarchical, that is, organized in collections (a month always belongs to a year, and a day always belongs to a month), and in the case of publications such as articles or blog entries, it is a natural primary way of accessing resources. It also has the benefit that a static version of the site, backed by actual directories, could be built, while query parameters would make this harder to accomplish without using URI rewriting.

It is important to realize that while inspired by the organization of file systems, a URI hierarchy does not have to be backed by a concrete file system with files and directories. The hierarchy can be purely virtual; for example, it can be backed by a database.
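
As a rough sketch of such a virtual hierarchy, the following Python code maps the hierarchical permalink shown earlier onto a lookup in a data store instead of a directory tree. The POSTS dictionary stands in for a database, and the names PERMALINK, lookup, and permalink are assumptions made for this example.

```python
import re

# Hypothetical in-memory stand-in for a database table of blog posts,
# keyed by (year, month, day, slug). No directories are involved.
POSTS = {
    (2006, 8, 7, "Web20-thebook"): "Post body goes here",
}

# Pattern for the hierarchical permalink form used in this article.
PERMALINK = re.compile(r"^/blog/archives/(\d{4})/(\d{2})/(\d{2})/([^/]+)$")


def lookup(path):
    """Return the post stored for a permalink path, or None if there is none."""
    match = PERMALINK.match(path)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    return POSTS.get((int(year), int(month), int(day), slug))


def permalink(year, month, day, slug):
    """Build the hierarchical path for a post from its date and slug."""
    return "/blog/archives/%04d/%02d/%02d/%s" % (year, month, day, slug)


print(lookup("/blog/archives/2006/08/07/Web20-thebook"))
print(permalink(2006, 8, 7, "Web20-thebook"))
```

The path segments are treated purely as keys, so the storage behind them can change without affecting the URIs.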

In general, URIs, like file system path names, go from general to specific, and from containing to contained. From this perspective, you may want to choose a sufficiently general root path element in your URI structure. For example, most Flickr URIs start with the path /photos, which leaves the hierarchy open for paths starting with /videos in the future. On the other hand, a site like del.icio.us leaves less room for expansion, as its structure uses the username as the root path element, followed by tag names, for example http://del.icio.us/ebruchez/Web2.0.

In addition to the hierarchy, the pretty URI above uses the notion of a slug, that is, a short name given to the blog entry or article. Using a slug has the potential benefit of giving hints to a search engine, as well as being friendly to human readers; that is, by looking at a list of URIs, for example in your browser's URL completion bar, you can rapidly identify a particular post. On the other hand, slugs tend to make URIs longer. You can of course implement access to a resource using both a slug and a short identifier and use redirection between the two (redirection is discussed in depth in Chapter 16, "Implementing and Maintaining Your URI Space," in the book Professional Web 2.0 Programming).
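
The following Python sketch shows one possible way to derive a slug from a title and to redirect a short identifier URI to the slug-based URI. The /blog/p/ prefix, the SLUG_URIS_BY_ID table, and the function names are hypothetical, and lowercasing the slug is a choice made for this example (the slug shown earlier keeps its capitals).

```python
import re

# Hypothetical table mapping short post identifiers to slug-based URIs.
SLUG_URIS_BY_ID = {
    123: "/blog/archives/2006/08/07/Web20-thebook",
}


def slugify(title, max_length=40):
    """Lowercase the title and replace runs of other characters with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def redirect_for_short_uri(path):
    """Return (301, slug URI) for a known short URI such as /blog/p/123, else None."""
    match = re.match(r"^/blog/p/(\d+)$", path)
    if match is None:
        return None
    target = SLUG_URIS_BY_ID.get(int(match.group(1)))
    if target is None:
        return None
    return (301, target)


print(slugify("Future-Proofing Your URIs"))
print(redirect_for_short_uri("/blog/p/123"))
```

With this arrangement the short form stays convenient for sharing, while the slug-based URI remains the one that readers and search engines end up seeing.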

Trailing Slashes and Location Independence

An issue that has generated some debate is the handling of the trailing slash: should you allow for /blog/archives/2006/08/07/ or /blog/archives/2006/08/07, or both? Consider the following rules of thumb:

  • If the last part of your path can itself be a container or a collection (that is, contain other sub-resources), then terminate the resource with a /. For example, /blog/archives/2006/ could be the URI that displays a summary for all the months of year 2006, but the path can also be followed by a specific month number, so use a trailing slash.
  • If the last part of your path is a leaf resource, like an individual article or post, which cannot contain sub-resources, then omit the trailing slash. For example: /blog/archives/2006/08/07/Web20-thebook.
  • Avoid using a path with a trailing slash and one without to point to two different resources.

If you opt for this strategy, you can be even more user-friendly by redirecting URIs with missing trailing slashes to URIs with slashes. For example, redirect /blog/archives/2006/08/07 to /blog/archives/2006/08/07/ with a permanent redirect.
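
A minimal sketch of this redirect rule in Python might look as follows. The COLLECTIONS and LEAVES sets are hypothetical stand-ins for whatever your application uses to know which URIs exist, and the function returns a (status, location) pair rather than talking to a real Web server.

```python
# Hypothetical URI space: collections end with a slash, leaf resources do not.
COLLECTIONS = {
    "/blog/",
    "/blog/archives/",
    "/blog/archives/2006/",
    "/blog/archives/2006/08/",
    "/blog/archives/2006/08/07/",
}
LEAVES = {
    "/blog/archives/2006/08/07/Web20-thebook",
}


def resolve(path):
    """Return (status, location) for a request path.

    200 means serve the resource at that path, 301 means redirect
    permanently to the given location, and 404 means not found.
    """
    if path in COLLECTIONS or path in LEAVES:
        return (200, path)
    # A collection requested without its trailing slash: redirect to the
    # canonical form instead of serving a second copy at a second URI.
    if not path.endswith("/") and path + "/" in COLLECTIONS:
        return (301, path + "/")
    return (404, None)


print(resolve("/blog/archives/2006/08/07"))
print(resolve("/blog/archives/2006/08/07/"))
```

Here a leaf requested with an extra trailing slash simply yields 404; you could equally choose to redirect it to the slash-less form.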

The use of a trailing slash to signify the root of a particular collection enables you to get rid of URIs that end with index.html, default.html, and the like: just end your URI with a / instead. There is no need to externally expose the name of index or default pages, as their names too can change.

You must choose whether you use absolute URIs, absolute paths, or relative paths when using URI references (such as hyperlinks, references to images, and so on) in the documents you serve. Consider the XHTML page served by http://example.org/author/clarke. You want to display an image of the author within the page. You can refer to it with an absolute path from XHTML:

<img src="/author/clarke/portrait.jpg"/>

Using such an absolute path has the drawback that the resource cannot be moved around on a server without also changing all of the paths it uses. For this reason, many HTML authors use relative paths as much as possible, especially when resources can be grouped (as in this case of an information page in XHTML with an accompanying image). But what relative URI do you use? Keep in mind that relative paths are usually resolved by the client, using the URI from which the page was requested as the base URI.

If the location of the image is http://example.org/author/clarke/portrait.jpg, the shortest relative path is:

<img src="clarke/portrait.jpg"/>

This has the drawback that the XHTML page must know at least part of its own location, clarke.

Now if the original page is served by http://example.org/author/clarke/ (note the trailing slash) the shortest relative path becomes:

<img src="portrait.jpg"/>

This is optimal from the perspective of shortness of URI and location independence. On the other hand, it requires that the resource be loaded from a URI with a trailing slash.
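
The effect of the trailing slash on relative references can be checked with urljoin from Python's standard urllib.parse module, which applies the usual resolution rules for relative references. The variable names below are chosen for this example only.

```python
from urllib.parse import urljoin

no_slash = "http://example.org/author/clarke"     # page served without a trailing slash
with_slash = "http://example.org/author/clarke/"  # page served with a trailing slash

# Without the trailing slash, the last segment ("clarke") is dropped from the
# base before resolving, so the reference has to repeat it.
print(urljoin(no_slash, "clarke/portrait.jpg"))
# http://example.org/author/clarke/portrait.jpg

# With the trailing slash, the bare file name is enough.
print(urljoin(with_slash, "portrait.jpg"))
# http://example.org/author/clarke/portrait.jpg

# The same bare file name against the slash-less base points one level too high.
print(urljoin(no_slash, "portrait.jpg"))
# http://example.org/author/portrait.jpg

# An absolute path ignores the path of the base URI entirely.
print(urljoin(no_slash, "/author/clarke/portrait.jpg"))
# http://example.org/author/clarke/portrait.jpg
```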

However, if you opt for serving image resources from a separate hierarchy, for example, with /images/clarke.jpg, using absolute paths becomes a good solution again. The bottom line is that there is no single perfect solution and that the final choice is yours!

This article is adapted from Professional Web 2.0 Programming by Eric van der Vlist, Danny Ayers, Erik Bruchez, Joe Fawcett, Alessandro Vernet (Wrox, 2006, ISBN: 0-470-08788-9), from Chapter 16, "Implementing and Maintaining Your URI Space."

Copyright 2007 by WROX. All rights reserved. Reproduced here by permission of the publisher.


