One interesting trend of the last year or two is the rising use of data analytics and ANI (Artificial Narrow Intelligence) in solving network engineering problems. Several ideas (and/or solutions) were presented this year at the IETF meeting in Seoul; this post takes a look at one of these. To lay the groundwork, botnets are often controlled through a set of domain names registered just for this purpose. In the same way, domain names are often registered just to provide a base for sending bulk mail (spam), launching phishing attacks, etc. It might be nice for registrars to make some attempt to remove domains abused for malicious activities, but it’s difficult to know what “normal” activity looks like, or for the registrar to even track the usage of a particular domain to detect malicious activity. One of the papers presented in the Software Defined Networking Research Group (SDNRG) addresses this problem directly.
The first problem is actually collecting enough information to analyze in a useful way. DNS servers, even top-level domain (TLD) servers, collect a huge amount of data, much more than most engineers might suspect. In fact, the DNS is one of those vast sources of information about people and organizations that not many engineers are aware of; there is much you can learn by looking at patterns of DNS queries, even on DNS servers that don’t directly serve individual client machines. To give a sense of the amount of information DNS servers throw off, according to the presentation, the .nl DNS servers produce 50Tb+ of information regularly enough that storing and analyzing the data can become a problem. To solve the problem, SIDN Labs built ENTRADA.
Essentially, ENTRADA is a custom-built streaming data store, with back-end processors, that accepts PCAP files pulled from SIDN’s TLD servers for the .nl domain, and then pushes the data through a series of map/reduce jobs on a Hadoop cluster to discover domains being used for malicious purposes.
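To make the map/reduce idea a little more concrete, here is a minimal sketch in Python. It is not ENTRADA's code. It assumes the PCAP files have already been parsed into hypothetical `(timestamp, queried_name)` records, and it uses invented names (`QUERIES`, `map_query`, `reduce_counts`, `query_counts`) plus a naive "last two labels" rule for finding the registered domain under .nl. The map step turns each query into a count of one for a (domain, hour) pair; the reduce step merges those counts.

```python
from collections import Counter
from functools import reduce

# Hypothetical, already-parsed query records: (unix_timestamp, queried_name).
# Real input would come from PCAP files; parsing them is out of scope here.
QUERIES = [
    (0, "example.nl"),
    (1800, "example.nl"),
    (3700, "example.nl"),
    (3900, "shop.example.nl"),
    (4000, "other.nl"),
]


def registered_domain(name):
    """Naive assumption: the registered domain is the last two labels."""
    return ".".join(name.rstrip(".").lower().split(".")[-2:])


def map_query(record, bucket_seconds=3600):
    """Map step: one query becomes a count of 1 for (domain, time bucket)."""
    timestamp, name = record
    key = (registered_domain(name), timestamp // bucket_seconds)
    return Counter({key: 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial counts by adding them together."""
    return left + right


def query_counts(records, bucket_seconds=3600):
    """Run the map step over every record, then fold the partial results."""
    mapped = (map_query(record, bucket_seconds) for record in records)
    return reduce(reduce_counts, mapped, Counter())


counts = query_counts(QUERIES)
for (domain, bucket), n in sorted(counts.items()):
    print(domain, bucket, n)
```

On a real cluster the map and reduce steps would run in parallel across many machines; the single-process version here only shows the shape of the computation, which ends with a per-domain query count for each hour.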
A map/reduce job is run across the PCAP data on a regular basis, just looking for patterns in the information. What turns up is actually pretty interesting. For instance, their slides show the usage chart (query rate and so on) for two different domain names that show up in the PCAP files.
The usage patterns of these two domains look completely different. As it turns out, in each of the cases SIDN Labs has looked at, the domain on the right is characteristic of a domain being used to source phishing attacks. The domain on the left, however, is what “normal usage” actually looks like. These usage patterns, then, often allow the operator to distinguish between valid and malicious use of a domain name. Domain names that appear to be used for malicious activity can be investigated further by a human, referred to local law enforcement, or simply not renewed when their contract expires. Other things that can be detected through these patterns are spam injection events and other problems.
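As a rough illustration of how a usage pattern might be turned into a signal for a human to review, the sketch below scores how concentrated a domain's queries are in a single hour. This is only an assumption made for the example: the post does not describe what the malicious pattern looks like or how SIDN Labs classifies domains, and the function names, the sample data, and the 0.5 threshold are all invented here.

```python
def peak_share(hourly_counts):
    """Fraction of all queries that fall in the single busiest hour."""
    total = sum(hourly_counts)
    if total == 0:
        return 0.0
    return max(hourly_counts) / total


def looks_bursty(hourly_counts, threshold=0.5):
    """Flag a domain when one hour holds at least `threshold` of its queries.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return peak_share(hourly_counts) >= threshold


# Made-up hourly query counts for two hypothetical domains.
steady = [10, 12, 11, 9, 10, 12, 11, 10]
burst = [0, 1, 0, 90, 5, 1, 0, 0]

for label, series in (("steady", steady), ("burst", burst)):
    print(label, round(peak_share(series), 2), looks_bursty(series))
```

A score like this would only be a first filter; as the post notes, a flagged domain still needs to be investigated by a human before anything is done about it.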
ENTRADA is an interesting way to find and squash domain names being used for malicious activity. If you’re interested in seeing how this works, they have more information, and even a demo, on their web site: http://entrada.sidnlabs.nl. You can not only learn more about the project there, you can even find information on how to contribute to the code.