The processes and tools behind a true APT campaign: Reconnaissance

This article is part of a series about APT campaigns. The topics covered in this series are reconnaissance, weaponization and delivery, command and control, and exfiltration.

In part one of a series on understanding the processes and tools behind an APT-based incident, CSO examines the reconnaissance aspect of an attacker's campaign. This is the first step of many, and often helps the attacker identify who to attack and how.

Personal Information: People are your weakest link

All too often, the information that harms an organization or person the most is something that wasn't viewed as important enough to protect to begin with. This can be anything from telephone or email directory listings, metadata within a document passed around online, to an executive's full name and corporate biography.

Some information can be discovered through public records and Web searches, but sometimes that isn't the case. The information about a person or organization that is found publicly is called Open Source Intelligence (OSINT), because it is freely and publicly available to anyone who knows how to find it. The problem is that for most people, the amount of OSINT available from a single source is usually rather scarce.

Chaining information -- that is, taking smaller bits of data and piecing them together until you have a full profile -- is commonplace for many criminals, because of this scarcity. The hacktivist collective known as Anonymous is legendary for its use of "doxing" to collect personal information on someone, or something, before launching an attack. These "dox," as Anons call them, are nothing more than information chains. Hacktivists and criminals aren't the only ones who do this, however; security professionals, including law enforcement, do the same thing.

Think about how much of the information available to the public via the Web would allow an attacker to get a clearer picture of your organization and its employees. This includes data from business reports, news reports, the organization's website, social media accounts (personal and professional), as well as associative web information from business partners.

Put yourself in the attacker's shoes. And remember that at this point, it doesn't matter if they are working for a nation-state, or for themselves. With the collected information from a wide range of OSINT sources, they'll have a good idea of whom to target and why; and more importantly, they'll know how to approach these people, with little to no additional research or background information needed.

The attacker will have a good indication of the person's hobbies, where they went to school, addresses and other personal information, the types of social groups they belong to, and how they interact with peers. All of this is valuable information. Unfortunately, as previously mentioned, it is also information no one thinks to protect.

Speaking of data no one thinks to protect, let's look at metadata.

Metadata: Hidden keys to your corporate kingdom

In this context, metadata is the embedded information included with documents and images. We're not talking about the metadata that the NSA collects on a minute-to-minute basis. Most people are unaware that many of the pictures they upload to the Web contain not only the location where the image was taken, but also an accurate timestamp, as well as hardware information. When it comes to metadata within documents, from PDF to PowerPoint, and everything between, it is possible to learn software titles and versions, the document author's name, network locations, IP addresses, and more.

Understanding metadata is important, because the first things an attacker is going to collect during the reconnaissance phase are the publicly available documents produced by the target. Documents can easily be harvested and checked for sensitive metadata with a rather handy tool called FOCA (Fingerprinting Organizations with Collected Archives). While attackers will use it for reconnaissance, good guys can use it too, as it's handy when assessing internal risk.
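To make the point concrete for defenders, here is a minimal sketch of the kind of check described above, run against your own published files. It is not how FOCA works internally; it only assumes that modern Office files (.docx, .pptx, .xlsx) are ZIP archives that normally carry their properties in docProps/core.xml and docProps/app.xml. The function name and the report labels are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two property parts of an Office Open XML file.
CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
APP_NS = {
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (archive member, namespace map, {report label: element to look up})
# The labels on the left are arbitrary names chosen for this sketch.
PARTS = [
    ("docProps/core.xml", CORE_NS, {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
    }),
    ("docProps/app.xml", APP_NS, {
        "application": "ep:Application",
        "app_version": "ep:AppVersion",
        "company": "ep:Company",
    }),
]


def read_document_metadata(source):
    """Return the metadata fields found in an Office file as a dict.

    source may be a file path or a binary file-like object.
    Parts or fields that are absent are simply skipped.
    """
    found = {}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for part, namespaces, fields in PARTS:
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for label, tag in fields.items():
                node = root.find(tag, namespaces)
                if node is not None and node.text:
                    found[label] = node.text.strip()
    return found
```

Running this over every document on a public website, and collecting the creator and application values, gives a rough view of what an outsider can learn without ever touching the network. Older binary formats (.doc, .ppt, .xls) and PDF files store their metadata differently and are not covered by this sketch.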

Here's a good example of what metadata can serve up. In 2011, a 1.2 GB Torrent file published by someone representing Anonymous led many to believe the U.S. Chamber of Commerce, the American Legislative Exchange Council (ALEC), and the Michigan-based Mackinac Center for Public Policy, had suffered a data breach. A short time later, it was concluded that the documents were not stolen, but collected using FOCA.

Within the document set belonging to the U.S. Chamber of Commerce, there were 194 Word documents (.doc and .docx), 724 PDF files, 59 PowerPoint files (.ppt and .pptx), and 12 Excel files (.xls and .xlsx).

By examining the metadata within, 293 names were found, a majority of them representing network IDs. In addition, 23 unique email addresses were discovered. Given the exposed network naming conventions, working out the others will offer no challenge to an attacker creating a profile, as many of the people representing the U.S. Chamber of Commerce are easily discovered via OSINT. The data also included folder paths, both internally on the network, as well as local system paths, and webserver paths. The location and name of shared network printers were also identified.

When it comes to software, the data from the U.S. Chamber of Commerce listed more than 100 unique titles. It's true that many of the identified software titles are based on when the document was created. Yet, given that many organizations still keep legacy software in production, knowing that there are older versions of Microsoft Office, Adobe Reader, Acrobat Distiller, or Xerox WorkCentre software on the network is valuable data for an attacker doing reconnaissance.

Also of value is the knowledge of IP addresses, as well as proof that the organization was running Windows XP, Windows Server 2000, and Windows Server 2003 at the time the documents were created. Again, while some of the data is old, the massive amount of information exposed can be used as a starting point when targeting an organization.

While FOCA can help discover metadata, there are plenty of resources available to manage and eliminate it. Solid starting points are the recommendations from Microsoft and Adobe, as well as a technical note from the National Security Agency.

Technical Information: Pwning the infrastructure

While attackers will use OSINT to hunt for prospective marks, they'll also look at applications and scripts used by the target organization's website(s). Attackers will probe a target's entire network for flaws, so applications and scripts are not the only attack surface; they're just the easiest ones to access.

As mentioned, knowing the types of software used by the target is valuable (e.g., Office, Adobe Reader), but so are IP addresses, webserver specs (such as platform versions), webhosting information, and the types of hardware (e.g., routers and servers) used on the network.

Platform version numbers help the attacker identify existing vulnerabilities, but when it comes to hardware, this information can be used to locate default credentials. When it comes to scripts and website development, an attacker will passively scan for logic flaws, Cross-Site Scripting, SQL Injection, and other vulnerabilities.

Another avenue of technical reconnaissance is the supply chain. Many organizations list their business partners publicly, which to an attacker equates to another relationship to exploit. Think: If a reseller's account is compromised, how can that impact your organization?

At this point, it is rather clear that the opening salvo in an APT-based campaign is information gathering. This is a key difference between a targeted attack and the generic attacks that most organizations are subjected to day-to-day. Generic attacks work on volume, so an attacker doesn't really care who clicks a link or opens an attachment.

Sometimes the best summation is a simple checklist. Here's an outline of what an attacker will be looking for when it comes to reconnaissance operations.

OSINT Data (publicly posted data from the target's domains)

Downloadable documents

These offer direct information as well as the chance to collect metadata.

Employee images and corporate event images

These offer direct information as well as the chance to collect metadata.

Staff directories and leadership / management profiles

Used to know who is who, and to establish relationships within the company.

Projects and product data

This data can be helpful when researching attack surfaces and background information.

B2B Relationships

This data is used to establish the supply chain relationships and sales channel for later exploitation if needed.

Employee details

This will include developing profiles with personal and public data from social media sources such as Facebook, LinkedIn, Twitter, and blogs.

Software data

The types of software used within the targeted organization, including OS and third-party software; often collected via metadata.

Building a rounded personal profile

A full profile on a person will include: full name; address (past and present); phone numbers (personal and work); date of birth; Social Security Number; ISP data (IP address, provider); usernames; passwords; public records data (taxes, credit history, legal records); hobbies, favorite eateries, movies, books, and more.

While criminals will attempt to gather all of this data, the amount of profile information needed for a given campaign will be different in each case. However, the more there is, the more leverage an attacker has when they make contact with someone. No piece of information is too small when building a profile.

Building a rounded technical profile

A technical profile for a targeted organization will include network maps, technical details obtained from metadata, IP addresses, available hardware and software information, operating system details, platform development data, and authentication measures such as how network IDs are created.

With this information, the attacker can use the personal profile data and target the helpdesk. By the same token, knowing how IDs are created also helps establish how email addresses are created, making the task of phishing, guessing addresses, or initial communication easier. When it comes to operating systems, third-party software, and platform data, the attacker can hunt for vulnerabilities or default access.
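The link between network IDs and email addresses is easy to underestimate, so here is a small sketch that a security team could use to gauge how predictable its own addresses are. The "first initial plus last name" pattern is purely an assumption for the example, as are the function name and the placeholder fields; a real organization's convention would have to be substituted.

```python
def derive_address(full_name, domain, pattern="{first_initial}{last}"):
    """Build the address a naming convention would produce for a person.

    pattern is a hypothetical convention written with the placeholders
    {first}, {last} and {first_initial}; it is not taken from any real
    organization.
    """
    parts = full_name.lower().split()
    if not parts:
        raise ValueError("full_name must contain at least one word")
    first, last = parts[0], parts[-1]
    local_part = pattern.format(first=first, last=last, first_initial=first[0])
    return f"{local_part}@{domain}"
```

If one leaked ID is enough to derive every other address this way from a public staff directory, then the directory and the naming convention together are part of the attack surface.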

Web applications are checked for common vulnerabilities including Cross-Site Scripting, SQL Injection, Remote or Local File Inclusion flaws, and logic flaws. Likewise, armed with B2B data, channel apps or partner-based apps are also checked for the same flaws. The idea here would be to exploit the supply chain in order to gain access to the targeted organization.

Data gathering resources:

As these sites are used to collect profile information during the reconnaissance phase, each new bit of information exposed leads to more searching, and more information for an attacker to leverage. Social media profiles lead to names and images.

People, no matter how private they are, leave something of themselves behind on the Internet. Many are unknowingly exposed thanks to public record searches, as well as massive indexes of information available for next to nothing via data brokers.

Attackers know where to look. Depending on the target, some attackers are bankrolled and will pay for information, or information services. Please note, however, that this is not an exhaustive list of resources; these are just the ones mentioned to CSO during various conversations.

Google (www.google.com)

This includes all of Google's data points, for example Google Maps, Google Groups, Blogger, YouTube, and Google+. When searching for information, Google should always be the first stop.

People / Business Searches

These sites offer public information searches on people, businesses, and the connections between them. The best results will come from using all of them and creating two information chains. The first will hold all of the common data found on all of the indexes, and the other will be the data that didn't match up. The unmatched data should be checked for authenticity.

Zoom Info (www.zoominfo.com)

PIPL (www.pipl.com)

Intelius (www.intelius.com)

Muckety (www.muckety.com)
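The two information chains described for the people and business search sites boil down to simple set logic. The sketch below assumes each site's results have already been reduced to (field, value) pairs by hand; the function name and the data layout are inventions for this illustration, not something any of these services provide.

```python
def split_chains(results_by_source):
    """Split collected facts into a corroborated chain and an unmatched chain.

    results_by_source maps a source name to an iterable of (field, value)
    pairs. The first returned set holds facts reported by every source,
    the second holds everything else, which still needs to be verified.
    """
    fact_sets = [set(facts) for facts in results_by_source.values()]
    if not fact_sets:
        return set(), set()
    corroborated = set.intersection(*fact_sets)
    everything = set.union(*fact_sets)
    return corroborated, everything - corroborated
```

The same exercise is worth running on your own executives: whatever lands in the corroborated chain is what an outsider can treat as confirmed.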

Other search resources

Web Archive (www.archive.org)

Also known as the Internet Archive, it can be used to discover older copies of comments, articles, websites, and profiles. This is useful for tracking someone or something over time.

GeoIP (www.geoiptool.com)

This is a basic IP Address mapping website, used if the target's IP is known. It can also be used to confirm location based on other collected data points.

Robtex (www.robtex.com)

Robtex is a useful search engine for mapping DNS data, domain information, and hosted route mapping. Often, this site is used to see what websites share the same IP addresses, or nameservers. This is useful when the attacker wants to compromise a domain hosted on the same server as the target's domain (shared hosting / co-lo environments), which gives them indirect access.

KnowEm (www.knowem.com)

A service used to track the various social and media networks associated with a username or company brand. Once the target's social profiles are discovered, personal and business data can be collected, starting with the top three social networks: Facebook, LinkedIn, and Twitter. Images on these networks may also contain metadata; Instagram is an alternative source of metadata. Services such as Foursquare, HootSuite, and GitHub can also be added for additional details.

ImgOps (http://imgops.com)

This site hosts a collection of tools for images, including EXIF data extraction, forensics, image search, and more.

SHODAN (www.shodanhq.com)

This search engine lets you find any device that's connected to the Web. Once the device is located, you can then look for services running on it, or a list of related exploits and vulnerabilities. Full features of the site are unlocked rather cheaply, but the free version will do for most situations.

Organizing the collected data

When it comes to organizing all of the various data collected during the reconnaissance phase, the recommended tool is Maltego.

Maltego is an OSINT tool, one that hacktivists, law enforcement and security professionals, and even professional criminals, use to manage information chains. It offers a visual overview of data, and comes in handy when hunting for links between people, groups, organizations, network information (DNS, IP addresses, URLs), and more.

The free version of Maltego works well for most people, but a well-funded attacker wouldn't think twice about purchasing a legit copy under a false identity if it was needed.
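For readers who have not seen this kind of link analysis, the idea can be sketched in a few lines. This is not Maltego's API or data model; it is a plain breadth-first search over an undirected graph, and the entity names in the usage note are placeholders from reserved example ranges.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Turn (entity, entity) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for left, right in links:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def find_chain(graph, start, goal):
    """Return the shortest chain of linked entities from start to goal.

    Returns None when the two entities are not connected.
    """
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        # sorted() keeps the result deterministic when several chains tie
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None
```

Fed with pairs such as ("example.com", "192.0.2.10") and ("192.0.2.10", "mail.example.com"), find_chain shows how a domain, an IP address, and a mail host hang together, which is the sort of connection the visual overview makes obvious at a glance.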

Common tools and software

When it comes to the tools that attackers use during the reconnaissance phase, as well as other phases in the attack chain, they tend to be easily obtained and simple to work with. Many of the same tools used by professionals are the same ones favored by criminals, because they get the job done.

SQLMap (http://sqlmap.org)

SQLMap automates the process of detecting and exploiting SQL Injection flaws. It has support for every major database on the market, and many SQL Injection techniques. This is a popular tool for professionals and criminals, as it is easy to use.

BackTrack Linux (http://www.backtrack-linux.org)

BackTrack is a go-to tool for professionals and criminals. While some of the tools in this release are too advanced for some attackers, there are plenty of tutorials available online to help them get a solid start.

Professionals love BackTrack because it's easy to use, has a strong community for support and development, and enables access to all of the common penetration tools in one installation. Much of what BackTrack has to offer can be used for all of the phases in an attacker's campaign, including reconnaissance, exploitation, and exfiltration.

Metasploit (http://metasploit.org)

Professionals love it, and so do criminals, with good reason. Metasploit is the best-known exploitation and penetration testing tool in the world. Like BackTrack, it too can be used for many phases of an attacker's campaign. At this stage, you should familiarize yourself with HD Moore's Law.

Wrapping things up:

Preventing reconnaissance is near-impossible. You can mitigate some of the success a potential attacker has, but the nature of the Internet itself means that information in one form or another will exist, somewhere, and it will be found eventually. So when it comes to mitigation, here are some things to consider.

Monitor logs and analytics apps for unusual spikes in downloads of materials that fall outside of normal usage. For example, if downloads of a new sales guide are usually from the U.S., then naturally downloads from places such as Russia, China, Mexico, Taiwan, or India would be suspect, unless they can be reasonably explained. The same can be said for unusual spikes in downloads outside of a normal geographic pattern, such as downloads from California, when the company primarily deals with customers and businesses in Indiana or Ohio.
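As a rough illustration of that kind of check, the sketch below counts downloads per file from countries outside an expected baseline. It assumes the access log has already been parsed and geolocated into (path, country code) pairs by whatever analytics tooling is in place; the function name, the baseline, and the min_hits threshold are all placeholders to tune.

```python
from collections import Counter


def flag_unusual_downloads(records, expected_countries, min_hits=1):
    """Count downloads per (path, country) that fall outside the baseline.

    records is an iterable of (path, country_code) pairs.
    Only combinations seen at least min_hits times are returned.
    """
    counts = Counter(
        (path, country)
        for path, country in records
        if country not in expected_countries
    )
    return {key: hits for key, hits in counts.items() if hits >= min_hits}
```

A flagged entry is a prompt for a closer look, not proof of reconnaissance; a new customer or a traveling employee can produce the same pattern.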

Never allow internal portals (Intranet), documents, or storage centers to be accessed from outside of the network. Manage access to these resources via restricted IP or corporate VPN, as well as ACL policy. Moreover, a good IAM (Identity and Access Management) process will also act as a solid defense, noted Rik Ferguson, the VP Security Research at Trend Micro, during an interview with CSO.

"Enabling multiple-factor authentication, and managing ageing accounts and passwords effectively, should be standard on sensitive data repositories or servers," he said.

Likewise, monitor ICMP traffic on the network, and familiarize yourself with the ways the protocol can be used for reconnaissance efforts. A good primer on this was published by SANS. Further, watch for scans that sweep the network's subnet. This is rare, and rather noisy, but it happens. Probes on seemingly random ports should also be checked.
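Sweeps and port probes both leave a recognizable shape in connection records: one source touching many hosts, or many ports. The sketch below is one simple way to surface that shape, assuming firewall or flow logs have been reduced to (source IP, destination IP, destination port) tuples. It does not inspect ICMP itself, and the thresholds and names are arbitrary starting points rather than recommended values.

```python
from collections import defaultdict


def find_scanners(connections, host_threshold=20, port_threshold=20):
    """Flag sources that touch unusually many hosts or ports.

    connections is an iterable of (src_ip, dst_ip, dst_port) tuples.
    Returns {src_ip: [reasons]} for every source over a threshold.
    """
    hosts_seen = defaultdict(set)
    ports_seen = defaultdict(set)
    for src, dst, port in connections:
        hosts_seen[src].add(dst)
        ports_seen[src].add(port)

    flagged = {}
    for src in hosts_seen:
        reasons = []
        if len(hosts_seen[src]) >= host_threshold:
            reasons.append("host sweep")   # many distinct destinations
        if len(ports_seen[src]) >= port_threshold:
            reasons.append("port probe")   # many distinct ports
        if reasons:
            flagged[src] = reasons
    return flagged
```

In practice the records would be grouped into time windows first, since a slow scan spread over days looks very different from a sweep that finishes in seconds.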

When it comes to OSINT, another defensive technique is to limit the amount of information that is displayed publicly, including phone directories, staff directories, overly specific staff and leadership profiles, project plans, business and channel partnerships, and customer lists.

While such data is viewed as harmless, and often as a key resource for sales initiatives, it allows ties and connections to be made, and offers a wider attack surface. As mentioned previously, filtering metadata is also a key mitigation step, and one that organizations should be in the habit of doing. In both cases, however, the limitation of such data will need to be determined by a robust risk assessment, and such an effort should include all areas of the business.

Be mindful of banner grabbing, which is a common and easily used technique that enables someone doing reconnaissance work to learn a good deal about your organization's technical environment.

"A quick telnet to listening ports of servers will very often reveal product versions and patch levels of external facing mail, Web or FTP servers for instance, and allow the attacker to compromise with selected vulnerabilities or known common misconfigurations," Ferguson explained.
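The quickest way to see what Ferguson means is to check what your own external servers volunteer. The following sketch does the same thing as the telnet test: it connects and reads whatever the service sends first. The function name and defaults are illustrative, and it should only be pointed at systems you are responsible for.

```python
import socket


def grab_banner(host, port, timeout=3.0, max_bytes=1024):
    """Connect to host:port and return the first data the service sends.

    Returns None if the connection fails or nothing arrives in time.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(max_bytes)
    except OSError:  # covers refused connections and timeouts
        return None
    return data.decode("ascii", errors="replace").strip()
```

Mail and FTP servers usually speak first, so their banners show up immediately. Web servers wait for a request, so for HTTP the equivalent check is to look at the Server header in a response instead. If the output names a product and an exact version, that is information worth trimming from the server's configuration.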

Once the attacker has performed reconnaissance, the next step is weaponization and delivery. Part two of this series will examine that aspect, as well as how it can be addressed.


Stories by Steve Ragan
