Spammers and BGP

Carl Byington
510 Software Group
2008-07-21

Abstract

This paper analyzes the potential usage of BGP announcements with respect to email spam. We demonstrate an algorithm to detect SMTP connections from suspicious short-lived BGP announcements. Contrary to the results published in [1] and popularized in [2], we find no evidence that spammers are currently using short-lived BGP announcements of hijacked address space.

If spammers were using such short-lived BGP announcements, we should be able to detect such suspicious prefixes before the SMTP connection arrives. We could then simply answer such SMTP connections with 452 temporary failure codes, effectively greylisting them.

Introduction

The history of spam has shown that almost any vulnerable network service will be abused by spammers. They have used open relays and web servers running vulnerable scripts, exploited various vulnerabilities in Microsoft desktops to create large farms of zombie machines, and used split-routing to appear to send spam from ip addresses that were supposed to be firewalled from port 25 connections.

Around 2001, spammers discovered an interesting mechanism to hide part of their network connectivity. They obtained connectivity from two different providers, where those links needed some properties that were fairly easy to arrange. One link is a high speed link where the provider does not filter outbound packets with source ip addresses belonging to other folks. Such filtering is known as egress filtering, and there are still many providers that don't do it. The other link can be slower, since it will only be used to receive the ACK packets sent in response to the data packets sent thru the high speed link. This combination makes it appear that the spam arrived from the ip address associated with the slower link. This caused problems for AOL and others since their initial approach to port 25 blocking only looked at the remote port on outbound packets. The widely deployed fix is to block all traffic from or to a remote tcp port 25 (for those links where you are blocking access to remote smtp servers).

One of the common spam control counter-measures to many of these abuses is the dns-based blacklist. This is a list of ip addresses to which we refuse SMTP service. Such lists are commonly distributed via the DNS protocol. The basis for the effectiveness of such lists is that there is a high correlation between the ip addresses that have sent spam in the past, and the ip addresses that will send spam in the future. If spammers could succeed in breaking that correlation, then such lists of banned ip addresses would lose their effectiveness. You could no longer predict the source ip address of future spam delivery attempts. One way that spammers could achieve this is to use essentially arbitrary source ip addresses to send their spam, including ip addresses belonging to other organizations.

Ramachandran and Feamster [1] claim evidence for the statement that some spammers are currently using short-lived bogus BGP announcements to send spam from hijacked parts of the IPv4 address space. Such a spammer would use BGP to announce some address space, then send spam from those addresses, and then withdraw the announcement. This would make it difficult for the recipient of such spam to determine who actually sent it.

Given the statement "relatively high cost of performing longest-prefix match queries" in section 6.2 of [1], it seems possible that their conclusions are incorrect. If someone accidentally advertised 61/8, and you subsequently receive spam from 61/8 (which is not unusual), that means nothing. The question is, did that spam come from some other more stable prefix inside 61/8, or is the longest prefix match of the connecting SMTP client actually that 61/8 prefix? On one of our systems in a one month period, 0.6% of the SMTP connections were from 61/8, and essentially all of those were spam delivery attempts.

The question is, are spammers actually using such short-lived BGP announcements today, or is this just a hypothetical spam tactic that they could use in the future? To help answer that question, and to either confirm or refute that claim in [1], we wrote code to monitor BGP announcements, classify some of them as suspicious, and log instances of SMTP connections from suspicious prefixes.

Prefix Hijack Alert System [3] is another system that attempts to detect address space hijacking, but it is not correlated with SMTP connections or spam attempts.

Internet Alert Registry [4] is another system that attempts to detect address space hijacking, but it is not correlated with SMTP connections or spam attempts. IAR uses methods detailed in PGBGP [5] to detect suspicious routes. PGBGP is primarily looking for hijacks where the attacker actually wants some specific ip address space, either for a denial of service, or to impersonate the actual owner. Our hypothetical spammer does not care about that - they only care about sending spam anonymously. In particular, PGBGP ignores super-prefix hijacks, but it seems likely that that is the preferred method for our hypothetical spammer. However, the PGBGP paper does provide useful data on the required timescale to filter out most of the normal AS origin changes.

BGP prefix selection

Consider the hypothetical case of a spammer who is connected via a provider that does not filter BGP routing announcements. The spammer then has some options to announce ip address space to be used for sending spam. Note that we only consider cases where the spammer simply wants to anonymously use some ip address space. This is very different from the case where the attacker wants to use some specific address space belonging to another organization in order to impersonate some service provided by that other organization.

They can announce a more specific prefix, for example a /24, inside a larger block. For example, consider 169.232.0.0/16. If the spammer pokes around, they can probably find an unused /24 in there. So they announce 169.232.240.0/24 and then send spam from that block. There are two problems with this scheme. First, the announcement of such a smaller block may be filtered out by many BGP routers, reducing their reachability to their spam targets. Second, they may have made a mistake, and that /24 is actually in use by some UCLA service that will notice their hijack.

They can announce a less specific prefix, for example a /16, covering some individual smaller blocks. For example, they could announce 52.129.0.0/16. The spammer could then avoid the four existing announcements inside that block, and instead spam from 52.129.128.0/17. That gives them 32K ip addresses to work with. The advantage here is that their announcement of a large block won't be filtered out by as many (if any) BGP routers, giving them better reachability to their spam targets. And they know they won't interfere with any existing use of that address space, since there was no previous BGP announcement of that /17 or any subset of it.
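The arithmetic behind this scheme can be checked with a minimal sketch using Python's standard ipaddress module. The paper does not list the four existing announcements inside 52.129.0.0/16, so the `existing` list below is a purely hypothetical stand-in, and the helper name `is_unannounced` is ours.

```python
import ipaddress

covering = ipaddress.ip_network("52.129.0.0/16")     # the less specific announcement
candidate = ipaddress.ip_network("52.129.128.0/17")  # the half the spammer would use

# HYPOTHETICAL more-specific announcements, for illustration only.
# The real four prefixes are not given in the text.
existing = [
    ipaddress.ip_network("52.129.0.0/19"),
    ipaddress.ip_network("52.129.64.0/18"),
]

def is_unannounced(block, announced):
    """True if no announced prefix overlaps the given block."""
    return not any(block.overlaps(p) for p in announced)

inside = candidate.subnet_of(covering)    # the /17 lies inside the /16
size = candidate.num_addresses            # 2**15 addresses, the "32K" in the text
free = is_unannounced(candidate, existing)
```

Under these assumptions the /17 is reachable only through the covering /16, which is why the longest prefix match for any address in it would be the spammer's own announcement.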

Or they can simply announce a prefix that is not assigned to anyone. For example, they could simply start announcing 185.10.0.0/16. This has many of the same advantages as the previous scheme, but some BGP routers may be configured to drop such bogon announcements, again potentially reducing their reachability to their spam targets.

Detecting suspicious prefixes

The raw BGP update stream on the current internet is very noisy with moderately high volume. We are seeing about 10 BGP events (prefix announcements or withdrawals) per minute, with an active population of about 256K prefixes. We see cases of /18 prefixes being periodically announced and withdrawn.

May 16 13:44:34 221.128.192.0/18 path 7397 2828 6453 4755 18231
May 16 13:50:38 221.128.192.0/18 withdrawn
May 16 13:53:03 221.128.192.0/18 path 7397 226 2914 6453 4755 18231
May 16 13:53:31 221.128.192.0/18 path 7397 2828 6453 4755 18231
May 16 13:57:17 221.128.192.0/18 withdrawn
May 16 13:58:41 221.128.192.0/18 path 7397 226 2914 4755 4755 4755 4755 4755 18231
May 16 13:59:11 221.128.192.0/18 path 7397 2828 6453 4755 18231
May 16 14:02:23 221.128.192.0/18 path 7397 226 2914 6453 4755 18231
May 16 14:02:51 221.128.192.0/18 withdrawn
May 16 14:06:43 221.128.192.0/18 path 7397 2828 6453 4755 18231

We see cases where an organization has multiple AS numbers, and they announce the same prefix alternately thru two different origins, so the origin of the prefix is flapping.

May 25 05:02:02 131.51.0.0/16 path 7397 22298 19080 3549 27065 27065 27065 444
May 25 22:38:34 131.51.0.0/16 path 7397 22298 19080 3549 27065 27065 27065 450 450 450 450
May 25 22:39:04 131.51.0.0/16 path 7397 22298 19080 3549 27065 27065 27065 444
May 25 22:45:34 131.51.0.0/16 path 7397 22298 19080 3549 27065 27065 27065 450 450 450 450
May 25 22:46:04 131.51.0.0/16 path 7397 22298 19080 3549 27065 27065 27065 444

We see cases of prefixes being alternately announced by a customer AS, and by the provider AS, so again the origin of the prefix is flapping.

Apr  4 21:37:53 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 39386 39386 39386 24731
Apr  4 21:42:40 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 25233 34400 34400 34400
Apr  4 21:44:39 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 39386 39386 39386 24731
Apr  4 21:45:30 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 25233 34400 34400 34400
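Origin flapping of this kind can be picked out of such event lines mechanically. The following is a rough sketch that assumes the display format used in these excerpts (month, day, time, prefix, then either "withdrawn" or "path" followed by AS numbers); it is not the parser used by the routeflapper code, and the function names are ours.

```python
def parse_event(line):
    """Split one event line into (timestamp, prefix, path); path is None on withdrawal."""
    fields = line.split()
    timestamp = " ".join(fields[:3])
    prefix = fields[3]
    if fields[4] == "withdrawn":
        return timestamp, prefix, None
    return timestamp, prefix, [int(asn) for asn in fields[5:]]

def origin_changes(lines):
    """Report every announcement whose origin AS differs from the last one seen."""
    last_origin = {}
    changes = []
    for line in lines:
        timestamp, prefix, path = parse_event(line)
        if path is None:
            continue  # a withdrawal carries no origin; keep the last known one
        origin = path[-1]  # the origin AS is the last AS on the path
        previous = last_origin.get(prefix)
        if previous is not None and previous != origin:
            changes.append((timestamp, prefix, previous, origin))
        last_origin[prefix] = origin
    return changes

sample = [
    "Apr  4 21:37:53 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 39386 39386 39386 24731",
    "Apr  4 21:42:40 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 25233 34400 34400 34400",
    "Apr  4 21:44:39 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 39386 39386 39386 24731",
    "Apr  4 21:45:30 84.23.96.0/19 path 7397 22298 3549 6762 39386 39386 25233 34400 34400 34400",
]
changes = origin_changes(sample)
```

For the four lines above, every announcement after the first flips the origin between AS24731 and AS34400.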

CYMRU shows some samples of such inconsistent BGP origins. We need a mechanism to filter out such noise, while still being able to detect suspicious prefixes injected by our hypothetical spammer. Of course, if we filter out too many BGP events, we might then falsely conclude that there is nothing happening.

BGP updates come in two styles. Suppose we have some prefix that is currently announced. We may see a withdrawal of that announcement, followed by a new announcement, or we may simply see a superseding announcement. Both styles are shown in the previous example of the /18 prefix flapping. In the case of a superseding announcement, if the previous announcement was not suspicious, and the previous and current origin AS are the same, then the superseding announcement will be trusted and therefore cannot be declared suspicious. However, if the prefix is withdrawn and subsequently announced, that subsequent announcement will not be trusted, although it may not be suspicious.
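The trust rule for superseding announcements can be written down as a small predicate. This is a sketch of the rule as stated above, with a data representation of our own choosing: the previous state of a prefix is either None (withdrawn or never seen) or a pair of (origin AS, whether it was suspicious).

```python
def is_trusted(previous, new_origin):
    """Decide whether a new announcement inherits trust from the one it supersedes.

    previous is None if the prefix is not currently announced (it was
    withdrawn, or has never been seen); otherwise it is a tuple
    (origin_as, was_suspicious) describing the announcement being replaced.
    """
    if previous is None:
        return False  # withdrawn then re-announced: never trusted
    previous_origin, was_suspicious = previous
    return (not was_suspicious) and previous_origin == new_origin
```

An untrusted announcement is not automatically suspicious; it merely becomes subject to the counter checks described below.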

We track the history of the AS adjacency graph, by computing the union of all AS adjacent pairs over all the announced prefixes. For example, 137.169.0.0/16 is currently seen here with an AS path of '7397 22298 19080 3549 6517 14981', so we add (7397,22298) (22298,19080) (19080,3549) (3549,6517) and (6517,14981) as valid adjacent AS pairs. We ignore duplicates in the AS path, since it is common for AS paths to end with a string of identical AS numbers. This is path prepending, and is commonly used to balance inbound bandwidth between multiple links. We also track the history of the origin AS for each announced prefix. Both the origin AS and the AS adjacency pairs are tracked via the following algorithm that runs almost every hour.
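Extracting the adjacent AS pairs from a path, while ignoring prepending, can be sketched as follows. The function name is ours and this is an illustration of the description above rather than the routeflapper code itself.

```python
def adjacent_pairs(as_path):
    """Return the adjacent AS pairs on a path, ignoring repeated (prepended) AS numbers."""
    collapsed = []
    for asn in as_path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return list(zip(collapsed, collapsed[1:]))

# The path for 137.169.0.0/16 quoted in the text.
pairs = adjacent_pairs([7397, 22298, 19080, 3549, 6517, 14981])

# A prepended path from the /18 flapping example collapses to four pairs.
prepended = adjacent_pairs([7397, 226, 2914, 4755, 4755, 4755, 4755, 4755, 18231])
```

Collapsing only consecutive duplicates keeps the path order intact, so the origin AS is still the last element.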

For each prefix P and origin AS O, announce(P,O,i) is 1 if prefix P was announced by O i hours ago, and 0 otherwise. For each AS pair O1 and O2, adjacent(O1,O2,i) is 1 if there exists some prefix P that was announced with a path which included O1 and O2 as adjacent AS numbers i hours ago, and 0 otherwise. For each announced prefix P with origin O, and each pair of adjacent AS numbers O1 and O2, we compute

origin_count(P,O) = ∑(r**i * announce(P,O,i)) over i from zero to infinity
adjacent_count(O1,O2) = ∑(r**i * adjacent(O1,O2,i)) over i from zero to infinity.

We use r=0.99, which gives a half life of about 69 hours.

An announced prefix is suspicious if it is not trusted and either the origin_count(prefix,origin) is less than 2.9, or the AS path contains any adjacent AS pair O1,O2 where adjacent_count(O1,O2) is less than 2.9. Note that if we have a stable prefix P, which is continuously announced from the same origin O and with the same path, then the origin_count(P,O) will converge to 1/(1-r) = 100, and the adjacent_count(a,b) for every adjacent pair of AS numbers on that path will converge to 100. A newly announced stable prefix will be suspicious for the first three hours, since its origin_count only reaches 1 + 0.99 + 0.99**2 = 2.9701 at the third hourly update.
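The infinite sums do not need to be stored term by term. A minimal sketch, assuming the counters are maintained incrementally: at each hourly update the old count is multiplied by r and 1 is added if the item was seen, which yields the same sum. The function names are ours, and `trusted` stands for the result of the superseding announcement rule described earlier.

```python
import math

R = 0.99          # hourly decay factor from the text
THRESHOLD = 2.9   # counts below this mark an origin or adjacency as too new

def decay_update(count, seen):
    """One hourly step: age the old count by R, then add 1 if seen this hour."""
    return R * count + (1.0 if seen else 0.0)

def is_suspicious(trusted, origin_count, adjacency_counts):
    """Not trusted, and either the origin or some adjacency is below the threshold."""
    if trusted:
        return False
    return origin_count < THRESHOLD or any(c < THRESHOLD for c in adjacency_counts)

# Half life of a counter in hours: about 68.97.
half_life_hours = math.log(0.5) / math.log(R)

# A newly seen, stable origin: counts after each of the first three updates.
count = 0.0
history = []
for _ in range(3):
    count = decay_update(count, seen=True)
    history.append(count)
# history is approximately [1.0, 1.99, 2.9701]

# Long-run value for something seen every hour: 1 / (1 - R), that is 100.
limit = 1.0 / (1.0 - R)
```

The count crosses the 2.9 threshold only at the third update, which matches the three hour window stated above.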

We suppress all detection of suspicious prefixes until the BGP update stream has been observed for 100 hours, to suitably initialize the origin and adjacency counters.

Random unsynchronization

If we updated the origin and adjacency counters exactly every hour, we could run into unintended synchronization with various BGP errors. It seems likely that a variety of errors could cause BGP route flaps to be synchronized to an hourly timer. Therefore, rather than updating every hour, we update every 59 minutes plus or minus a roughly uniform random time in (-5 minutes, +5 minutes). It seems unlikely that any BGP route flapping is synchronized to a 59 minute cycle.
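A sketch of the jittered schedule, using Python's standard random module. The helper name is ours, and the actual routeflapper code may draw its random offset differently.

```python
import random

BASE_MINUTES = 59
JITTER_MINUTES = 5

def next_update_delay(rng=random):
    """Seconds to wait before the next counter update: 59 minutes plus or minus up to 5."""
    minutes = BASE_MINUTES + rng.uniform(-JITTER_MINUTES, JITTER_MINUTES)
    return minutes * 60.0
```

Each delay falls between 54 and 64 minutes, so successive updates drift relative to any event that repeats on an exact hourly timer.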

SMTP connections

Suppose we have an SMTP connection from some ip address X. We run the standard longest prefix match routing algorithm to find the announced prefix P that will be used to send reply packets to the SMTP client at X. If that prefix is currently suspicious, we log this connection. If spammers were actually using this method, we could simply answer that SMTP connection with a 452 temporary failure.
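The longest prefix match itself can be illustrated with a linear scan over the announced prefixes, using Python's standard ipaddress module. A real routing table would use a trie or similar structure; this sketch only shows the selection rule. The prefixes and client addresses are those of case B in the monitoring results.

```python
import ipaddress

def longest_prefix_match(address, announced):
    """Return the most specific announced prefix containing address, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix in announced:
        if ip in prefix and (best is None or prefix.prefixlen > best.prefixlen):
            best = prefix
    return best

announced = [ipaddress.ip_network(p) for p in (
    "217.164.0.0/15",
    "217.164.0.0/16",
    "217.164.0.0/17",
    "217.164.0.0/19",
    "217.164.128.0/19",
)]

match_a = longest_prefix_match("217.164.150.203", announced)  # 217.164.128.0/19
match_b = longest_prefix_match("217.164.4.90", announced)     # 217.164.0.0/19
```

Both clients are also covered by the /15, /16 and (for the second) /17 announcements, but only the state of the matching /19 decides whether the connection is logged.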

However, at this time, we don't know when (if ever) this prefix will be withdrawn. So we record this triple (P,X,t) with the current time t. When the prefix is eventually withdrawn at time w, we then log every X such that (P,X,t) exists with t less than three hours before w. That is a log of an actual SMTP connection from a short lived suspicious prefix.
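The deferred logging can be sketched as a small table of pending triples that is resolved when a withdrawal arrives. The class and method names are ours, and the prefix, address and times in the usage lines are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=3)  # "short lived" cutoff from the text

class PendingConnections:
    """Remember SMTP connections from suspicious prefixes until the prefix is withdrawn."""

    def __init__(self):
        self.pending = {}  # prefix -> list of (client_ip, connect_time)

    def record(self, prefix, client_ip, when):
        """Store the triple (P, X, t) for a connection from a suspicious prefix."""
        self.pending.setdefault(prefix, []).append((client_ip, when))

    def withdraw(self, prefix, when):
        """Return every (X, t) with t less than three hours before the withdrawal."""
        entries = self.pending.pop(prefix, [])
        return [(ip, t) for ip, t in entries if timedelta(0) <= when - t < WINDOW]

# Hypothetical usage: one old connection, one recent, then a withdrawal.
log = PendingConnections()
log.record("192.0.2.0/24", "192.0.2.10", datetime(2008, 6, 1, 10, 0))
log.record("192.0.2.0/24", "192.0.2.20", datetime(2008, 6, 1, 14, 30))
hits = log.withdraw("192.0.2.0/24", datetime(2008, 6, 1, 15, 0))
```

Only the second connection falls inside the window, so it is the only one reported as coming from a short lived suspicious prefix.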

Implementation

The code at http://hg.five-ten-sg.com/routeflapper assumes that you are already running the BGP routing daemon from the quagga package, with a configuration that includes "debug bgp updates" and "bgp log-neighbor-changes". The routeflapper code assumes that the BGP daemon is only receiving routes from a single BGP neighbor.

The routeflapper code reads the syslog entries generated by both quagga and sendmail. It maintains a shadow copy of the BGP routing table, along with the origin and adjacency counters. For every sendmail connection, it looks up the current best-match BGP prefix for the SMTP client ip address and logs those entries that are suspicious.

Whenever you restart the routeflapper, you need to clear the BGP session to your upstream router. This will force BGP to send the full routing table, which will then be logged by quagga and read by the routeflapper to properly build the shadow copy of the full BGP routing table.

Monitoring results

A.

May 20 21:21:30 195.234.182.0/24 path 7397 2828 5588 5606 34279 31354
May 21 05:09:59 195.234.182.0/24 path 7397 22298 19080 1239 3561 1273 1273 1273 1273 5588 5606 34279 31354
May 21 05:10:59 195.234.182.0/24 path 7397 22298 3491 3549 1239 5588 5606 34279 31354
May 21 05:11:21 195.234.182.0/24 withdrawn
May 21 05:14:29 195.234.182.0/24 path 7397 22298 19080 3549 8708 31354 31354 31354 31354 31354 31354 31354 31354
May 21 05:14:59 195.234.182.0/24 path 7397 2828 3549 8708 31354 31354 31354 31354 31354 31354 31354 31354
May 21 05:20:19 195.234.182.123 smtp connect using 195.234.182.0/24 suspicious adjacency (8708,31354)

The route to 195.234.182.0/24 was stable between May 20 21:21:30 and May 21 05:09:59. That prefix was then withdrawn, and re-announced with a different path including the AS pair (8708, 31354). That AS pair had never been seen in the path for any prefix in the last 100 hours, so that prefix was marked as suspicious. We received an SMTP connection from 195.234.182.123, and the longest prefix match for that address at the time was this suspicious 195.234.182.0/24. At the time, 195.234.182.123 was listed on the XBL, so that mail was not accepted anyway. But this is not a case of a short duration BGP announcement, since 195.234.182.0/24 was continuously announced with essentially the same path until May 21 09:21:30 when the path shifted back to the original (34279, 31354) AS pair.

B.

May 25 20:17:33 217.164.0.0/15   path 7397 2828 6762 8966 5384
May 25 23:25:34 217.164.0.0/16   path 7397 2828 3356 15412 8966 5384
May 26 00:19:34 217.164.128.0/19 path 7397 22298 19080 8966 5384
May 26 00:19:34 217.164.0.0/19   path 7397 22298 19080 8966 5384
May 26 01:08:04 217.164.0.0/17   path 7397 2828 5400 8966 5384
May 26 01:31:30 217.164.150.203  smtp connect using 217.164.128.0/19 suspicious origin 5384
May 26 01:53:32 217.164.4.90     smtp connect using 217.164.0.0/19 suspicious origin 5384

When those smtp connections arrived, that /19 was suspicious, since it had only recently been announced. But this is also not a case of a short duration BGP announcement. That /19 was continuously announced for at least the next nine hours, and there were overlapping /17, /16, and /15 announcements from the same origin. At the time, both 217.164.4.90 and 217.164.150.203 were listed on the XBL, so that mail was not accepted anyway.

C. misc. notes

May 27 11:39:49 78.58.0.0/16 path 7397 226 3356 1299 8764 8764

Email from 78.58.159.50, 78.58.175.232, 78.58.198.240, all on the XBL. This was a simple replacement of two /17 announcements with the enclosing /16.

May 28 03:22:55 80.254.110.0/24 path 7397 226 3356 8342 21479 21479

Email from 80.254.110.75 and 80.254.110.91, both on the XBL. 80.254.110.0/24 was first seen here on May 28 02:41:14, and was still suspicious when those SMTP connections arrived. But it has now been continuously announced for over 60 hours, and AS21479 also announces the enclosing 80.254.96.0/19, which has been continuously seen here.

May 30 03:57:39 91.191.50.0/23 path 7397 22298 19080 3549 6762 5391 35567

Three email attempts from 91.191.51.8 on the XBL. 91.191.50.0/23 was announced for about 15 minutes on May 26th. It next appeared on May 30 01:46:38, and was still suspicious when those three SMTP connections arrived. But it has now been continuously announced for over 20 hours, and AS35567 also announces the enclosing 91.191.0.0/18, which has been continuously seen here.

Jun 3 16:50:44 189.6.4.0/22 path 7397 2828 3549 28573

One email attempt from 189.6.5.237 on the XBL. 189.6.4.0/22 was first seen here on Jun 3 14:08:08. But it has now been continuously announced for over four hours, and AS28573 also announces the enclosing 189.6.0.0/18, which has been continuously seen here.

Jun 5 07:31:32 92.112.64.0/18 path 7397 226 3356 6849

One email attempt from 92.112.86.13 on the XBL. But 92.112.64.0/18 has now been continuously announced for 26 hours.

Jun 5 07:38:44 96.224.0.0/17 path 7397 226 2914 701 19262

One email attempt from 96.224.1.153 on the XBL. But 96.224.0.0/17 has now been continuously announced for 26 hours.

Jun 6 07:13:56 78.36.144.0/20 path 7397 226 3356 8342 8997

Three email attempts from 78.36.148.117 on the XBL. 78.36.144.0/20 was first seen here on Jun 6 06:08:57. But AS8997 also announces the enclosing 78.36.0.0/15 which has been stable here since May 29 02:11:55.

Jun 24 18:17:14 190.69.208.0/20 path 7397 226 3356 1239 12956 3816

One email attempt from 190.69.213.70 on the XBL. But 190.69.208.0/20 has been stable here from June 24th thru June 29th.

Jun 25 12:01:38 189.61.32.0/20 path 7397 22298 19080 3549 4230 28573

Three email attempts from 189.61.35.48 on the XBL. But 189.61.32.0/20 has been stable here from June 25th thru June 29th.

Jun 25 12:06:42 189.101.16.0/20 path 7397 22298 19080 3549 4230 28573

One email attempt from 189.101.25.32 on the XBL. But 189.101.16.0/20 has been stable here from June 25th thru June 29th.

Jun 25 16:47:17 201.240.0.0/18 path 7397 2828 3356 12956 6147

One email attempt from 201.240.37.133 on the XBL, and one attempt from 201.240.53.83 on the XBL. This one at least looks more interesting. 201.240.0.0/18 first shows up here at Jun 25 16:39:15 as "aggregated by 6147", and was withdrawn Jun 25 20:43:45. So it was only announced for about 4 hours. Both of those ip addresses are now announced via the enclosing 201.240.0.0/16 with the same AS path, still aggregated by AS6147 which is "Telefonica del Peru S.A.A.", and which has been assigned both /17 blocks in that /16 by LACNIC. So this looks like a case where AS6147 was changing their BGP aggregation, and temporarily announced smaller parts of the enclosing /16.

Jul 1 15:25:49 216.40.32.0/20 path 7397 2828 7018 15290 36031 15348

Two email attempts from 216.40.42.17 which is forward.hostedemail.com which is the Tucows/Netidentity mail forwarding service. One was spam caught by SpamAssassin, and the other was legitimate mail. That prefix was considered suspicious since we had never seen an announcement with a path containing AS15290 adjacent to AS36031.

Jul 1 12: total 271391 inactive 13749 suspicious 37
Jul 1 13: total 271401 inactive 13810 suspicious 41
Jul 1 14: total 271410 inactive 14017 suspicious 30
Jul 1 15: total 271426 inactive 14119 suspicious 317
Jul 1 16: total 271425 inactive 13945 suspicious 287
Jul 1 17: total 271427 inactive 13852 suspicious 249
Jul 1 18: total 271434 inactive 13836 suspicious 20
Jul 1 19: total 271447 inactive 13858 suspicious 35
Jul 1 20: total 271451 inactive 13877 suspicious 29

The previous snippet from the hourly logging of counts of (total, inactive, suspicious) prefixes shows that some routing change happened around that time. Note that three hours later those prefixes were no longer considered suspicious.

Jul 1 15:30:09 168.144.0.0/16 path 7397 2828 701 14166

One email attempt from 168.144.250.190 which is xsmtp19.mail2web.com. That was greylisted by the DCC since the recipient had not recently received email from that sender. Note that this is in the same period as the previous routing burp. That prefix was considered suspicious since we had never seen an announcement with a path containing AS701 adjacent to AS14166.

Jul 2 01:24:58 88.87.80.0/20 path 7397 226 3356 1299 20485 39435

One email attempt from 88.87.91.194 on the XBL/CBL. This appears to be a new announcement of that /20, first seen here Jul 1 23:56:31, but it has been stable since then. Note that 88.87.91.194 was detected by the CBL at about 2008-07-02 14:00 UTC, which is Jul 2 07:00 local time, or only about 7 hours after ertelecom.ru announced this block.

Jul 9 22:53:58 222.182.0.0/15 path 7397 22298 2828 4134

One email attempt from 222.183.51.12 on the XBL. I have lost the details on that /15 announcement, but there is a stable announcement for the enclosing 222.176.0.0/13 with path 7397 226 4134.

Jul 20 05:39:15 77.126.128.0/17 path 7397 226 2914 6762 9116

One email attempt from 77.126.172.115 on the XBL. That /17 was only announced for 10 minutes, but there is a long term announcement for the enclosing 77.126.0.0/16 with path 7397 2828 3257 9116. And there are other prefixes with paths that end in 6762 9116.

With the current filtering, we seem to be averaging one hit that needs investigation every 3 or 4 days. And almost every SMTP connection from such suspicious prefixes has already been listed on the XBL.

Conclusions

Attempts to statistically correlate spam arrival times with short-lived BGP announcements are flawed unless they consider the longest prefix match to determine the actual BGP prefix used to send the spam.

On the systems which have run this code, we see no evidence that spammers are actually using short-lived BGP announcements to send spam. In particular, over a 40 day test period looking at about 350K SMTP connections, we have never seen a case of an SMTP connection from a prefix that had a short-lived (less than 2 hours) announcement, much less the sort of 15 minute announcements claimed in [1] and [2]. Perhaps spammers are using this technique, but if so they are avoiding our mail servers. Note that of those 350K SMTP connections, we rejected as spam the mail offered from 97% of those connections.

Of course, it is possible that we have simply filtered out too many BGP events, that spammers are using this technique, and we are just not seeing their BGP events because those events were filtered out as noise.

Future research

It is difficult to prove that there are no spammers anywhere that are using this technique. So we fall back on the basic scientific method, that we only believe that for which we have positive evidence. It would be nice if someone could use this code (or something similar), and actually demonstrate a case of an SMTP connection from an ip address whose longest prefix match was a short-lived BGP announcement, and where such short-lived BGP announcement was in some sense not legitimate.

References

[1] A. Ramachandran and N. Feamster, Understanding the Network-Level Behavior of Spammers, http://www-static.cc.gatech.edu/~feamster/publications/p396-ramachandran.pdf, Sept. 2006.

[2] Dave Josephsen, Homeless Vikings, http://www.skeptech.org/wp-content/uploads/2007/03/josephsen.pdf, Jan. 2007.

[3] Prefix Hijack Alert System, http://phas.netsec.colostate.edu/

[4] Internet Alert Registry, http://cs.unm.edu/~karlinjf/IAR/index.php

[5] J. Karlin, S. Forrest, J. Rexford, Pretty Good BGP: Improving BGP by Cautiously Adopting Routes, http://www.cs.unm.edu/~treport/tr/06-06/pgbgp3.pdf