URL Dogma

Monday 12 January, 2004, 02:16 PM

Many years ago I read a couple of articles about URI design, Tim Berners-Lee’s “Cool URIs don’t change” (http://www.w3.org/Provider/Style/URI.html) and Jakob Nielsen’s “URL as UI” (http://www.useit.com/alertbox/990321.html), that struck me as making perfect sense. So when I was writing the software for this web site, I had some specific goals for my URIs.

<rant mode='bizarre'>What is it with this popular faux-Latin tendency to pretend that the plural of ‘URI’ is ‘URI’? If the singular were ‘URUS’ then you could make a case (albeit a weak one) for ‘URI’ as the plural. But it’s not. More importantly, if ‘URIs’ is good enough for RFC 1630 (ftp://ftp.isi.edu/in-notes/rfc1630.txt) and for Sir Tim Berners-Lee, it’s good enough for me!</rant>

I’m sorry, I’ll calm down now.

Specifically, I wanted two things:

1. No extensions (e.g. .aspx) or other implementation artifacts. The fact that I’m using .aspx pages internally is of no real relevance to anything, so it has no business appearing in a web browser’s address bar. I want the name of the page to be the name of the page, no more, no less. I also want the name to be durable, so that when I eventually switch from .aspx pages to whatever technology comes next, I don’t break any links.
2. ‘Hackability,’ as Jakob Nielsen calls it. In other words, the user should be able to munge the URL by hand in the address bar and have it behave the way it looks like it should.

The permalink URLs for my blog aim to meet these requirements. For example, consider http://www.interact-sw.co.uk/iangblog/2004/01/09/subaquaticlifestyle. There are no clues in there about the implementation technology. This is good, because if I do an about-face and go back to using Java for some reason, it won’t matter. (I’ve known sites that map “.asp” to a JSP handler in order to deal with this kind of thing. But if I wanted to commit techno-aesthetic crimes I’d use Perl.) Moreover, everything in that URL reflects something about the item: the iangblog bit indicates that it’s in my weblog. The next three parts are fairly obvious: the year, month, and day on which the article was written. And the final part accommodates my verbosity, since I’d hate to be limited to one article per day. I chose to make this final identifier a brief descriptive string, so that the URL gives at least some clue as to what it points to. (Having seen a few tiny URLs, I realised that I really don’t like completely opaque URLs. I’m much more likely to click on a link if I think I have some vague idea about what I might find there, so I feel that this readability is worth the extra verbosity.)

These URLs are also somewhat hackable – if you chop off the subject name, you’ll get everything I wrote on the date specified in the URL. If you leave out the day, you’ll get everything I wrote that month. Leave out the month, you’ll get everything I wrote that year. Leave out the year, and you’re left with the URL that is the homepage for my blog, which happens to show recent entries.
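To make the scheme concrete, here is a minimal C# sketch of the kind of mapping a module could perform, turning each level of the hackable URL into parameters for a single internal page. It is an illustration, not this site’s actual code: the /iangblog prefix and /blogtemplate.aspx come from this post, but the year, month, day and entry parameter names are hypothetical, and a real implementation would also want to check that the dates are valid.

```csharp
using System;

// Hypothetical sketch: maps a clean, hackable blog URL onto an internal
// template URL. "/iangblog" and "/blogtemplate.aspx" come from the post;
// the query parameter names (year, month, day, entry) are invented here.
public static class BlogUrlMapper
{
    const string Prefix = "/iangblog";
    const string Template = "/blogtemplate.aspx";

    // Returns the internal URL to rewrite to, or null if the path is not
    // one of the blog's clean URLs (so the request can be left alone).
    public static string MapToTemplate(string path)
    {
        if (!path.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return null;

        // Everything after "/iangblog". An empty remainder or a lone "/"
        // yields no segments, which means the blog home page.
        string rest = path.Substring(Prefix.Length);
        if (rest.Length > 0 && rest[0] != '/')
            return null; // e.g. "/iangblogfoo" is not one of ours

        string[] parts = rest.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 4)
            return null;

        // Year, month and day must be numeric; the fourth part is the entry name.
        string[] names = { "year", "month", "day", "entry" };
        string query = "";
        for (int i = 0; i < parts.Length; i++)
        {
            if (i < 3 && !IsDigits(parts[i]))
                return null;
            query += (query.Length == 0 ? "?" : "&") + names[i] + "=" + Uri.EscapeDataString(parts[i]);
        }
        return Template + query;
    }

    static bool IsDigits(string s)
    {
        if (s.Length == 0) return false;
        foreach (char c in s)
            if (c < '0' || c > '9') return false;
        return true;
    }
}
```

With this sketch, MapToTemplate("/iangblog/2004/01/09/subaquaticlifestyle") returns "/blogtemplate.aspx?year=2004&month=01&day=09&entry=subaquaticlifestyle", chopping the URL back to "/iangblog/2004/01" yields only the year and month parameters, and "/iangblog" maps to the bare template. Each truncation of the public URL simply drops parameters, which is what makes the hackability cheap to support.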

I’m sure it’s not perfect, and doubtless it violates several of the principles in the documents mentioned above that I happen not to find interesting, but I’m happy with it.

It was bizarrely difficult to achieve though. At times, it felt like IIS and ASP.NET were specifically designed to prevent this kind of thing!

The first problem is that, by default, ASP.NET doesn’t even get to see an incoming request unless the URL contains one of the ASP.NET extensions. Since my goal was to avoid extensions entirely, IIS wasn’t passing requests for my URLs through to ASP.NET at all! I had to add a wildcard mapping to the IIS metabase that passes everything through to the ASP.NET runtime. (The alternative would be to write an ISAPI filter to mangle the URLs, but there are three reasons not to do that. First, it means writing unmanaged code, something I prefer to avoid whenever possible. Second, ASP.NET itself would then have a false idea of what the original URL was, because the URL would have been rewritten before ASP.NET ever saw it. And third, I didn’t have the option of installing an ISAPI filter, because I’m using a shared web hosting service.) Adding this wildcard mapping had the unfortunate side effect of breaking the statistics service on my shared web hosting. This issue is still unresolved, so I’m currently running without stats…

With the wildcard mapping in place, requests now make it into ASP.NET. But of course ASP.NET is also designed to expect everything to be done with extensions. One way of dealing with this would be to write a single IHttpHandler or IHttpHandlerFactory implementation, and put an entry in the web.config file that maps every URL onto it. This handler would then see every incoming request and could decide what to do with it. I chose not to take this approach, because I thought it might cause problems when I wanted to pass the request on to a real handler. For the majority of my URLs, I end up passing control to one of the built-in handlers, such as the .aspx handler or the static content handler. But I thought that with a wildcard handler mapping in place, any attempt to say “Please handle this as you would have done if the URL had been /blogtemplate.aspx” wouldn’t work, because it would (I presume) just call my handler again! I don’t know if that’s what would really happen, because I’ve not tried it, but the whole approach didn’t smell right.

Instead I decided to write a module. ASP.NET modules get to see every single request that goes through the ASP.NET runtime. My module rewrites incoming URLs by calling the HttpContext.RewritePath API. Note that you have to make sure that when you rewrite the URL, you don’t lose the query string. I use this:

// ...targetUrl is whatever we've decided to rewrite the URL as 
if (context.Request.QueryString.Count != 0)
{
    targetUrl += "?" + context.Request.QueryString.ToString();
}
context.RewritePath(targetUrl);
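The snippet above assumes targetUrl has no query string of its own. If the internal URL already carries parameters, say /blogtemplate.aspx?year=2004, appending “?” a second time produces a malformed URL. A small helper that picks the right separator avoids that. This is a standalone sketch and the helper name is hypothetical; it assumes the incoming query string is passed without a leading “?”, which is the form Request.QueryString.ToString() returns in ASP.NET.

```csharp
using System;

// Hypothetical helper: combines a rewrite target with the request's
// original query string without losing either set of parameters.
public static class QueryStringHelper
{
    // incomingQuery is assumed to be already URL-encoded and to have
    // no leading '?', e.g. "page=2&sort=asc".
    public static string AppendQuery(string targetUrl, string incomingQuery)
    {
        // Nothing to preserve: leave the target untouched.
        if (string.IsNullOrEmpty(incomingQuery))
            return targetUrl;

        // Use '&' if the target already has a query string, '?' otherwise.
        char separator = targetUrl.IndexOf('?') >= 0 ? '&' : '?';
        return targetUrl + separator + incomingQuery;
    }
}
```

In a module, the call would look something like context.RewritePath(QueryStringHelper.AppendQuery(targetUrl, context.Request.QueryString.ToString())).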

Originally I put this code in the BeginRequest event handler, thinking that it would be best to do the rewrite as early as possible. This turned out to be a mistake, because it breaks ASP.NET’s authentication and authorization (or at least confuses them). If you want to be able to apply URL authorization to the URL scheme the user is using, it’s vitally important not to rewrite the URL until after authorization has been done! So I moved the handling to the AuthorizeRequest event. ASP.NET’s own URL authorization also runs during this event, but because modules configured in web.config are registered after the built-in ones, my handler runs after authorization has been checked, and just before ASP.NET selects the handler that will handle the request.

But we’re not done yet. The next problem I hit was that ASP.NET’s built-in Forms authentication was using the rewritten URL as the return URL when it redirected to the login page. This is a perfectly reasonable thing for it to do, because by the time it does the redirect, it thinks that the URL is whatever the HttpContext says it is. So it turns out that what you really need to do is rewrite the URL temporarily, so that ASP.NET chooses the handler you want, and then put it back how it was before the handler actually runs. So I also have a PreRequestHandlerExecute event handler:

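private void OnPreRequestHandlerExecute(object sender, EventArgs e)
{
    HttpContext context = HttpContext.Current;
    // The rewrite step stored the user's original URL here before calling RewritePath.
    string originalUrl = (string) context.Items["OriginalPath"];
    // Requests that were never rewritten have nothing stored, so leave them alone.
    if (originalUrl != null)
    {
        // Put the original URL back, so the handler and the rest of the pipeline see it.
        context.RewritePath(originalUrl);
    }
}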

My URL rewrite code stores the original path in the context Items before rewriting it. This code puts it back.
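The save-and-restore dance can be modeled outside ASP.NET to make the timing explicit. In this sketch, FakeContext is a hypothetical stand-in for HttpContext (only Items and RewritePath are mimicked, and only the path is tracked), the two static methods correspond to the AuthorizeRequest and PreRequestHandlerExecute handlers described above, and the “OriginalPath” key is the one used in the post. It is a model of the pattern, not real module code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two HttpContext members the pattern relies on.
public class FakeContext
{
    public string Path { get; private set; }
    public Dictionary<string, object> Items { get; } = new Dictionary<string, object>();

    public FakeContext(string path) { Path = path; }

    public void RewritePath(string path) { Path = path; }
}

public static class RewriteSteps
{
    const string OriginalPathKey = "OriginalPath"; // key name from the post

    // AuthorizeRequest stage: remember the user's URL, then point the
    // context at the internal template so the right handler is chosen.
    public static void RewriteForHandlerSelection(FakeContext context, string targetUrl)
    {
        context.Items[OriginalPathKey] = context.Path;
        context.RewritePath(targetUrl);
    }

    // PreRequestHandlerExecute stage: put the user's URL back.
    // Requests that were never rewritten have nothing stored, so skip them.
    public static void RestoreBeforeHandlerRuns(FakeContext context)
    {
        if (context.Items.TryGetValue(OriginalPathKey, out object original) && original is string path)
            context.RewritePath(path);
    }
}
```

Between the two calls the context reports the internal URL (which is what handler selection sees); after the second call it reports the user’s URL again, which is what authorization, the handler and the login redirect should see.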

This all feels kind of backwards. It would feel much cleaner if there were some way of saying “UseHandlerForThisUrl” without having to rewrite the URL. But at least we’re now more or less where we want to be. I have complete control over the incoming URL scheme. I can map this onto whatever internal URL template I like, but I make sure the rewritten URL is only in place at the point in the ASP.NET pipeline at which ASP.NET chooses a handler. For the rest of the handling of the request, the URL visible through the HttpContext is the one that the user passed in.

So we’re done. Except we’re not.

There’s still one problem: postbacks from web forms don’t work. It took me a while to work out why. It turns out that the System.Web.UI.HtmlControls.HtmlForm class insists on specifying the action attribute for a form. In theory this is a Good Thing: if you’re using a runat=server web form, you want it to post back to the right URL, which is presumably why the HtmlForm ignores the action attribute you specify and generates its own. This would be fine if only it generated the right URL. However, if your URLs don’t have a “.” in them anywhere, it generates the wrong URL.

So if you want to use a WebForm, you have to write your own HtmlForm-derived class that fixes the action attribute. This involves overriding RenderAttributes. Because HtmlForm.RenderAttributes deliberately strips out any action attribute that may be present and replaces it with its own wrong version, you must reimplement RenderAttributes completely, without calling down to the base class. I’m using this:

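protected override void RenderAttributes(HtmlTextWriter writer)
{
    // Write the attributes HtmlForm normally emits...
    writer.WriteAttribute("name", this.Name);
    base.Attributes.Remove("name");
    writer.WriteAttribute("method", this.Method);
    base.Attributes.Remove("method");
    if (base.ID == null)
    {
        // No explicit ID, so emit the generated client ID.
        writer.WriteAttribute("id", base.ClientID);
    }
    // ...but deliberately write no action attribute. In practice, browsers
    // post a form without one back to the page's current (clean) URL.
    // Mustn't call the base class: it would add its own, wrong, action.
    // Side effect: anything else in Attributes (e.g. enctype) isn't rendered.
}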

This is a bit of a hack, but it does the job.
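A general point about why extensionless, hierarchical URLs are unforgiving here: a browser resolves any relative action against the URL in the address bar, so a bare file name lands next to the last path segment rather than at the site root. The sketch below shows this with System.Uri, which follows the standard relative-reference resolution rules. The “blogtemplate.aspx” reference is only an example and is not necessarily what HtmlForm emits; the class name is hypothetical.

```csharp
using System;

// Hypothetical helper: resolves a form's action attribute the way a
// browser would, relative to the URL of the page containing the form.
public static class ActionResolution
{
    public static string Resolve(string pageUrl, string action)
    {
        return new Uri(new Uri(pageUrl), action).AbsoluteUri;
    }
}
```

Resolving "blogtemplate.aspx" against http://www.interact-sw.co.uk/iangblog/2004/01/09/subaquaticlifestyle gives http://www.interact-sw.co.uk/iangblog/2004/01/09/blogtemplate.aspx, a URL that doesn’t exist in this scheme, whereas a root-relative "/blogtemplate.aspx" resolves to http://www.interact-sw.co.uk/blogtemplate.aspx. Leaving the action out altogether sidesteps the question by posting straight back to the clean URL.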

So, I’m done, but it really felt like I was swimming against the stream. I’m not an ASP.NET expert, so this may well be completely the wrong way of doing things, but I couldn’t find any examples of what I wanted to do. I seem to be alone in caring about the look of my URLs. If anyone has any suggestions for a cleaner way to achieve all of this, I’d love to hear them.

IanG on Tap