IanG on Tap

Caller ID for Email

Saturday 28 February, 2004, 04:53 PM

I've just been reading Microsoft's Caller ID for E-Mail Technical Specification. This proposal's goal is to make it easier to filter out spam reliably. It aims to do so by making it possible to discover source address spoofing, i.e., to discover when an email was not in fact sent by the person it purports to have been sent by.

Two questions you might ask about this are: (1) is it possible to detect spoofed emails? (2) Assuming it is possible to detect spoofing, does that make it easier to filter out spam reliably? I think the answers are yes, and no, respectively.

Spoof Detection

Starting with question (1), I think the answer is that if Microsoft's proposed scheme were widely adopted, then yes, it would be possible to detect spoofs. More importantly, it is reasonably practical to adopt: an important feature of Microsoft's proposal is that it doesn't require universal adoption before it starts being useful. Any entity that sends emails can autonomously opt in on a per-domain basis by making it possible for others to discover reliably the set of IP addresses that are used for sending email from that domain. And likewise, individual receivers can autonomously choose to verify what they receive. If both sender and receiver have opted in, spoofs can be detected. If either or both have not opted in, we're exactly where we are today.

The specification simply proposes that domain owners advertise their outgoing email server addresses through the existing DNS infrastructure. Today, a domain advertises a list of incoming email server addresses via DNS. Microsoft's proposal simply suggests that domains also advertise a list of outgoing email server addresses. No new email software is required for this aspect of the plan - just some extra DNS records. The proposal uses an existing record type in a new way rather than introducing a new kind of record, which is important, because it means the proposal will work without requiring the existing DNS infrastructure to be modified at all. (Even so, it won't be trivial for everyone to opt in. For example, my DNS provider only lets me advertise a very limited subset of what the DNS system as a whole is capable of today. I would have to change DNS provider to be able to opt into the proposal for my domains. If you read my last blog entry, you won't be surprised to learn that the prospect of changing providers doesn't upset me greatly, but the fact is that depending on how your domain publishes DNS information today, it's possible that implementing this proposal would require more upheaval than you might think.)

On the other end of the connection, entities that receive email can choose to verify incoming emails against these DNS records. (This would most likely be implemented in the servers that accept incoming SMTP connections for email deliveries, i.e. the mail servers that sit on the internet and accept your incoming mail for you. Although it could be implemented by mail user agents like Eudora or Outlook, that's less good, because those aren't the machines that spammers send emails to. In theory, a user agent could look through the routing history that typically appears in an email's headers, but those are not always complete, and are easy to forge. The reliable place to implement this feature is in the servers, because that is where the spam is sent in the first place.) When an email is received, the server would look at the source email address's domain, and then look up the list of valid servers for that domain in the DNS. If the machine that sent the email is not in the list, it's a spoof. Of course this does require new software. However, it can be adopted piecemeal - there's no need to try and move the whole world over to this system overnight.

This approach categorizes email into three groups:

Spoofed - the email was received from a machine which is not listed as a valid source for emails from the domain in question, so this is a spoofed email.
Not spoofed - the email was received from a machine which is listed as a valid source for the domain in question, so it is likely to be valid. (This ignores the possibility of attacks on the DNS system and also on network level attacks where IP addresses are spoofed, of course.)
Unknown - this is where we are today. If the domain of the source email has not chosen to opt into this system (i.e. the proposed extra information is not in the DNS) or if the receiving email system has not been modified to check for this information, then we will have no way of knowing whether the email is spoofed or not.
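To make the three categories concrete, here is a minimal sketch of the receiving-side check in Python. It is not taken from Microsoft's specification: the dictionary PUBLISHED_OUTBOUND is a hypothetical stand-in for the DNS lookup, the record format itself is not modelled, and the names Verdict and classify are invented for illustration. Like the description above, it ignores attacks on DNS and IP-level spoofing.

```python
from enum import Enum


class Verdict(Enum):
    SPOOFED = "spoofed"
    NOT_SPOOFED = "not spoofed"
    UNKNOWN = "unknown"


# Hypothetical stand-in for DNS: domain -> advertised outbound server IPs.
# A real receiver would query DNS here; this table is illustrative only.
PUBLISHED_OUTBOUND = {
    "example.com": {"192.0.2.10", "192.0.2.11"},
}


def classify(sender_address, connecting_ip, published=PUBLISHED_OUTBOUND):
    """Classify a message by comparing the connecting IP with the
    outbound servers advertised for the sender's domain."""
    # Everything after the last "@" is treated as the domain.
    domain = sender_address.rpartition("@")[2].lower()
    allowed = published.get(domain)
    if allowed is None:
        # The domain has not opted in, so nothing can be concluded.
        return Verdict.UNKNOWN
    if connecting_ip in allowed:
        return Verdict.NOT_SPOOFED
    return Verdict.SPOOFED
```

The point of returning UNKNOWN rather than SPOOFED for domains with no published list is that a receiver can opt in without penalising senders who haven't.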

Of course, if someone launches a successful attack on DNS or successfully spoofs IP source addresses, then they will be able to pass off spoofed emails as real. However, this may be less of a problem than it first appears, because the vast majority of spam today comes from unsecure machines attached to ADSL or cable modem connections. Many of the vast number of email viruses that plague the internet today are designed to deploy back doors on end user machines enabling spammers to send bulk email. This lets them steal bandwidth. More importantly it gives them a high degree of anonymity - back in the early days of spamming, persistent offenders used to be obliged to change from one ISP to the next as they kept getting disconnected for sending spam. But by taking over the machines of random end users over the whole internet, it becomes impossible to pull the plug on them.

It is this vast number of machines all over the internet that Microsoft's proposal aims to address. If it becomes impossible to spam from a machine which does not appear to have one of the right IP addresses for the source domain, then all these unsecure machines all over the internet become useless for spoofing email, because while it is possible to spoof IP addresses, it's not usually practical to do so from a typical end user's broadband connection - you typically need to attack the network from a different vantage point.

But IP spoofing notwithstanding, the proposal still relies on the integrity of DNS. I'm not qualified to comment on how secure the DNS is, but you can be certain that if spammers are forced to compromise the DNS in order to spam, that's where they'll turn their attentions next. The only reason it hasn't been a target yet is that spammers have no reason to attack it. But I'm not sure that this proposal does force them down that path. This leads us onto question (2):

Assuming it is possible to detect spoofing, does that make it easier to filter out spam reliably?

Spam in a Spoof-Proof World

Let's assume for a moment that the system is widely adopted by both senders and receivers. Those receiving email will be able to determine reliably whether the email has come from an IP address listed by the owner of the domain as being a valid source for emails from that domain. But does this make the spammer's job any harder?

All it really does is make it harder to send an email that looks like it came from a domain you don't own. But there's nothing stopping spammers buying their own domains. They can legitimately purchase as many domains as they like, and as the owners of these domains, they are at liberty to add DNS records that list whatever IPs they like as the official set of outbound email servers. Those IP addresses could be those of the unsecure end user machines that are already being used today to send out spam.

So all this proposal really does is give me more confidence that the email was sent by someone who owned the domain it claims to have been sent from. But I just had a quick look through my spam folder, and it comes from a very wide variety of source addresses, many of them in peculiar-looking domains, often from country-specific domains where I don't even recognise the country code. For all I know, these emails were sent by someone who owned the originating domain. You can obtain a domain for very little money, especially if you avoid the better known ones like .com, which are comparatively expensive when you look at the global market. If spammers need to buy lots of domains to keep ahead of any domain-based blacklists, that's what they'll do. I keep hearing that spamming is a multi-million dollar business, so purchasing a few hundred domains every month doesn't break the business model.

So unless you're prepared to whitelist (i.e. only accept email from people you already know) this proposal doesn't seem like it will help all that much. Admittedly, it does improve whitelisting. I quite often get spam spoofed from email addresses I know - some spammers spoof source addresses to be from email addresses in the same domain as yourself, which gives them a good chance of picking the address of someone you know. (Since they already have a large list of email addresses, it's not like they have to guess what other email addresses there are in your domain.) Microsoft's proposal would enable such spoofed emails to be detected, allowing whitelisting to work perfectly. But I don't want to whitelist - I regularly get bona fide email from people I don't know. I don't want to block that email, I just want to block the adverts for all the stuff I don't want.

'Introducing the New Avalon Graphics Model' Up on MSDN

Saturday 28 February, 2004, 05:29 PM

My article on the Avalon Graphics Model just went up on MSDN.

Note to self: next time, send the graphics in as .gif rather than .png, or at least stick to standard web colours in the pictures. My apologies for the speckled appearance of the images.

This is why I don't like extensions in URLs

Saturday 28 February, 2004, 06:36 PM

Kevin Jones illustrates precisely why it is that you really don't want file extensions in your URLs.

Kevin's old RSS feed was JSP-based. The new one is still Java-based, but he's moved over to an MVC approach. Because his URLs reflect these kinds of internal implementation details, it means the URL for his feed changed when the implementation changed - it now uses a .do extension instead of .jsp. As he points out, this leads to versioning problems in the long run. If you make sure your URLs never expose these details in the first place, you can avoid these problems, which is why I went to the great lengths partly described here and here.

If Kevin decides to go down the same route, I hope for his sake that it's easier in Java than it was in ASP.NET!

Since he asks for "any better ideas", and in those previous articles I only described the ASP.NET-specific parts of what I did, I'll briefly outline the more technology-agnostic aspect of how I deal with this here. And rather than just email this to Kevin directly, I thought it might be of broader interest, hence this blog entry.

What I Did

I've written a simple content management system for my site. The content and structure of the site are mainly stored in a SQL Server 2000 database. The only things on the web server's disk are static content (mostly images and downloadable files), the .aspx templates that present the content in HTML form, and the compiled code for the web site.

There's a table in the database describing the hierarchy of the site. Each item corresponds either to a page, or in certain cases a handler that generates a whole collection of similar pages. (My blog is an example of the latter. Because all of my blog entries are essentially the same page but with different content, there is just one entry in the site hierarchy table for the http://www.interact-sw.co.uk/iangblog/ URL and everything under it. This actually maps to an IHttpHandler implementation. (That's the ASP.NET equivalent of a Servlet, Kev.) That handler then examines the remainder of the URL (e.g. "/2004/01/") and decides which of the various .aspx pages to use (item, day, month, year, or recent, depending on how much of the tail is present) and internally forwards the request onto that page, placing the necessary content in the request scope so that the page can then present that content.)
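A rough sketch of that kind of tail-based dispatch, written in Python rather than the ASP.NET the site actually uses. The rule mapping the number of tail segments to a page is assumed for illustration (no segments means recent, one means year, and so on) and is not taken from the real handler; choose_blog_page is a made-up name.

```python
# Page kinds named in the text, ordered by how much of the URL tail is present.
PAGES = ["recent", "year", "month", "day", "item"]


def choose_blog_page(tail):
    """Pick a page kind from the remainder of the URL, e.g. "/2004/01/".

    Assumption: each extra path segment selects a more specific page,
    and anything with four or more segments is a single item.
    """
    # Drop empty strings produced by leading, trailing or doubled slashes.
    segments = [part for part in tail.split("/") if part]
    return PAGES[min(len(segments), len(PAGES) - 1)]
```

With that assumed rule, a tail of "/2004/01/" selects the month page and an empty tail selects the recent page.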

Each item in the site hierarchy table has a foreign key into the templates table. The templates table contains the internal URL. For simple pages, this will be the path to the relevant .aspx template, e.g. /templates/code.aspx for pages containing nothing but source code. (Note that you can't hit these template URLs from the outside - these URLs are entirely internal to the site. You'll get a 404 if you try. Right now I'm not bothering to generate any content for the 404 though, so what you see will depend on your browser. For Internet Explorer, it shows its normal error page. Mozilla Firebird does something a little more eccentric.) The blog's template is just /BlogHandler, and my web.config file maps that internal URL onto the blog HttpHandler.

This makes it easy to change the implementation without changing the public URL. If for some reason I decided to move over to a .ashx file for my blog handler, I would only have to change the entry in the templates table. That would have no impact on the public URL - that's determined by the site hierarchy table.
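The two-table indirection can be sketched in a few lines of Python. The dictionaries below are hypothetical in-memory stand-ins for the site hierarchy and templates tables (the real ones live in the database, and their columns are not shown here); the /samples/hello entry and the resolve function are invented for illustration.

```python
# Hypothetical stand-in for the templates table: template id -> internal URL.
templates = {
    1: "/templates/code.aspx",
    2: "/BlogHandler",
}

# Hypothetical stand-in for the site hierarchy table:
# public path -> template id (the "foreign key").
site_hierarchy = {
    "/samples/hello": 1,
    "/iangblog": 2,
}


def resolve(public_path):
    """Map a public URL path to (internal URL, remaining tail).

    Returns None when nothing in the hierarchy owns the path, which is
    where a 404 would be produced.
    """
    matches = [
        prefix for prefix in site_hierarchy
        if public_path == prefix or public_path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    # The longest matching entry wins, so a handler owns everything under it.
    best = max(matches, key=len)
    return templates[site_hierarchy[best]], public_path[len(best):]
```

Swapping the blog's entry in templates for, say, a .ashx path changes what resolve returns internally, while the keys of site_hierarchy (the public URLs) stay exactly as they were. Internal paths such as /templates/code.aspx resolve to None because they never appear in the hierarchy.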
