Open Source Intelligence (OSINT) and the Dark Web
The dark web, the part of the deep web comprising a number of darknets (e.g. Tor, Freenet, and I2P), provides individuals with an anonymous way to connect to the internet and publish information. Although this anonymity is used to facilitate communications for legitimate purposes, it is also exploited to exchange information, services, and goods for illegal purposes. Accordingly, Law Enforcement Agencies (LEAs) are interested in open source intelligence (OSINT) on many darknets, which would allow them to prosecute those involved in terrorist or criminal activities.
Throughout this article, we will look at LEAs’ approaches to OSINT on various darknets and the techniques they currently use to de-anonymize users on the dark web.
What is Open Source Intelligence (OSINT)?
Open source intelligence (OSINT) is intelligence derived from publicly available sources. Across the intelligence community, “open” refers to public, or overt, sources, as opposed to private, or covert, sources. OSINT is not related in any way to public intelligence or to open source software.
OSINT depends on all forms of publicly available sources including:
- Media such as newspapers, radio, and television.
- Internet-based communities such as social media sites, blogs, forums, and video sharing sites.
- Public documents, including official government reports such as budgets, press conferences, demographics, and contract awards.
- Academic sources, including papers, conferences, and symposia.
- Observations and reporting, e.g. by airplane spotters, UFO observers, and radio monitors.
- The deep web
OSINT and the Challenges Faced by LEAs on the Dark Web:
LEAs around the world, including all major UK police forces, are currently investigating darknets, primarily Tor and I2P. Other darknets are known to exist and are also considered to be of investigative interest, but given their popularity, Tor and I2P (and Tor in particular) are believed to be used by the majority of dark web criminals, and therefore offer LEAs the best opportunities for investigating criminal activity on the dark web.
The main challenge faced by LEAs is discovering the darknet nodes involved in illegal activities that would be of investigative interest to them. This is particularly difficult because darknet nodes are unreachable via regular search engines, and existing dark web search engines are still far from supporting efficient searches. Once a dark web node of interest has been discovered, though, the next major challenge is to identify the individual(s) involved in the illegal activities. Unlike ordinary websites, dark web sites do not have an easily identifiable IP address, and resolving a Tor website first to an ISP and then to an individual becomes vastly more difficult because of the complicated way data is relayed across multiple nodes on Tor. To this end, LEAs have focused on identifying the geographical location of such individuals. This has been successful on the surface web, where social posts are often geo-tagged (ranging from around 1.5% on Twitter to around 20% on Instagram and 50% on Flickr), but on the dark web there are no social media platforms that use geo-tagging, and website owners do not normally advertise where they are located. Another method for geo-locating such individuals is to examine the wording of their posts, biographies, or adverts. This has had limited success to date, as users on the dark web tend not to give away such information, and any probing for further information by an investigator can end with them being “marked” as police.
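As a toy illustration of the “examine the wording” approach, the sketch below scans free text for phrases that commonly leak a vendor’s location. The phrase patterns, the capitalized-word heuristic, and the sample post are all illustrative assumptions, not a real investigative tool:

```python
import re

# Phrases that often leak a location; the capitalized-word heuristic
# ([A-Z][A-Za-z]*...) keeps matches to likely place names.
LOCATION_PATTERNS = [
    r"[Ss]hip(?:s|ping)?\s+(?:only\s+)?from\s+(?P<loc>[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*)",
    r"[Bb]ased\s+in\s+(?P<loc>[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*)",
    r"[Dd]omestic\s+(?:to|within)\s+(?P<loc>[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*)",
]

def location_hints(text: str) -> list:
    """Return candidate location strings mentioned in a post."""
    hits = []
    for pattern in LOCATION_PATTERNS:
        for match in re.finditer(pattern, text):
            hits.append(match.group("loc").strip())
    return hits

# Hypothetical vendor post:
post = "High quality product. Ships from Germany, domestic to EU only."
print(location_hints(post))  # ['Germany', 'EU']
```

In practice such output would only ever be a weak hint to be corroborated, which is consistent with the limited success of this method noted above.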
Because geo-locating individuals engaged in criminal activities on the dark web is so problematic, investigators have no way of initially knowing where a person is located, and invariably interact with a number of globally distributed criminals before finding any who are domiciled within their own jurisdictions. Larger LEAs may be able to sustain a significant number of “false positives” before finding a locally based vendor, but smaller LEAs may not, due to resourcing and budgetary constraints, and management may question the viability of continuing this type of work. To date, most executive actions occur when an investigation moves from the digital world to the physical world, e.g. during the delivery phase of an illegal commodity. This typically means either that a Covert Internet Investigator (CII) engages with an online criminal and coaxes them into meeting in real life, or that a criminal attempts to purchase an illegal commodity advertised by an LEA, giving the LEA a real shipping address that affords a surveillance opportunity. However, “de-confliction” appears to be a major issue for all LEAs, particularly in dark web investigations, as many countries have no central control mechanism for ensuring that “blue on blue” incidents do not occur.
Nevertheless, if an LEA were to take the step of advertising illegal items for sale in the hope of attracting criminals to its site, the investigators would need expert knowledge of the non-generic or official names for certain items. For instance, Semtex has a chemical name that only people familiar with explosives would use, yet it is that chemical name buyers will search for on darknets. As customers generally appear to search using very specific terminology, investigators would need to carefully “frame” specific items to attract the attention of the criminally minded. Online chatter between buyers and sellers is also commonplace, with much negotiation over the price of the goods and/or shipping costs, so good communication skills are undoubtedly required. Moreover, CIIs would need to ensure that their online presence looks realistic, e.g. by having a network of “friends” interacting with them.
The dark web imposes major challenges for LEAs. Most OSINT investigators do not have a strong computer science or programming background and are largely self-taught when it comes to investigating darknets. This is a training issue that needs to be addressed, and LEAs may wish to reconsider their recruitment policy for OSINT investigators, particularly with regard to the use of CIIs. To support them in such investigations, several technological solutions are currently being researched and developed, as discussed next.
OSINT Techniques Used on the Dark Web by LEAs:
Discovering, collecting, and monitoring information are the most significant processes in OSINT. Several different techniques (e.g. advanced search engine querying and crawling, social media mining and monitoring, and accessing restricted content via cached results) are applied to the surface web to retrieve content of interest from an intelligence perspective. The distinctive nature of the dark web, however, dictates that these traditional OSINT techniques be adapted to the rules that govern it: the dark web requires special technical configuration to access, and it propagates network traffic differently for the sake of anonymity.
Because the dark web is restricted in nature, requiring special software and/or configuration to access, and because the websites hosted in darknets are volatile (most dark web sites are hosted on machines that do not maintain 24/7 uptime), conventional search engines do not index dark web content. Nevertheless, a small number of dark web search engines exist, as well as directory listings of popular dark websites. The most stable and reliable such search tools are provided for Tor. Specifically, the most popular Tor search engine is DuckDuckGo (accessible via both a normal and an onion URL), which emphasizes user privacy by avoiding tracking and profiling its users. DuckDuckGo may return results hosted on Tor onion sites, as well as results from the surface web based on partnerships with other search engines, such as Yahoo! and Bing.
On the other hand, Ahmia (also accessible via both a normal and an onion URL) is a search engine that returns only Tor-related results (after filtering out child pornography sites); as of April 2016, it indexed more than 5,000 onion websites. Torch, meanwhile, is available only through its onion URL and retrieves results from Tor. Finally, several censorship-resistant directory listings known as hidden wikis (e.g. The Hidden Wiki, Tor Links) contain lists of popular onion URLs and provide the user with an entry point to the world of Tor onion sites.
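To illustrate the special configuration that access requires, the minimal sketch below chooses proxy settings (in the dictionary format used by common HTTP client libraries such as requests) for a given URL: onion addresses must be routed through a local Tor client’s SOCKS proxy, assumed here to listen on the default 127.0.0.1:9050, while ordinary URLs can be fetched directly. The onion address shown is a made-up placeholder, and the helper names are this sketch’s own:

```python
from urllib.parse import urlparse

# Assumption: a local Tor client with its default SOCKS port. The
# "socks5h" scheme makes the proxy (i.e. Tor) resolve the hostname,
# which is essential because .onion names do not exist in ordinary DNS.
TOR_SOCKS = "socks5h://127.0.0.1:9050"

def needs_tor(url: str) -> bool:
    """Return True if the URL points at a Tor onion service."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")

def proxies_for(url: str) -> dict:
    """Proxy settings suitable for e.g. requests.get(url, proxies=...):
    route onion URLs through Tor, fetch everything else directly."""
    if needs_tor(url):
        return {"http": TOR_SOCKS, "https": TOR_SOCKS}
    return {}

print(proxies_for("http://exampleonionxyz.onion/"))  # placeholder onion URL
print(proxies_for("https://ahmia.fi/"))              # surface web: no proxy
```

A crawler built along these lines would simply pass the returned dictionary to its HTTP client for every request, letting the same code index both surface and onion pages.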
Traffic Analysis and De-Anonymization of Users of the Dark Web:
The anonymity and communication privacy provided on the dark web are the most significant incentives attracting users who wish to hide their identity, whether to avoid government tracking and corporate profiling or to carry out criminal activities. It is important for LEAs to monitor the network traffic related to criminal activities and to identify the actual perpetrators of these cybercrimes. LEAs are therefore greatly interested in techniques that allow them to determine, with a high degree of accuracy, the identity of dark web users participating in criminal activities.
The de-anonymization of dark web users is accomplished either by exploiting the unique characteristics of each darknet, or by using data gathered through network traffic analysis of the communication taking place within it. In the former case, the de-anonymization process attempts to take advantage of potential weak points in the darknet; in the latter, the collected data is cross-referenced to identify the anonymous data source. As no existing darknet can guarantee perfect anonymity, several types of “attacks” have been proposed in the literature for de-anonymizing dark web users (especially Tor users), taking advantage of vulnerabilities either inherent in the anonymity networks and the protocols they use, or caused by user behavior. One of the early research studies shows that information leakage and user de-anonymization in Tor can occur either due to the intrinsic design of the HTTP protocol or due to user behavior (i.e. users not following the Tor community’s instructions closely). A more recent study showed that HTTPS is the best countermeasure for preventing de-anonymization of HTTP traffic over Tor.
Furthermore, another work proposes a collection of undetectable attacks on the Tor network based on the throughput of an anonymous data flow. It presents attacks for identifying the Tor relays participating in a connection, and shows that the relationship between two data flows can be uncovered simply by observing their throughput. Recent research efforts have also developed attacks for identifying the originator of a content message by taking advantage of Freenet design decisions. The proposed methodology requires deploying a number of monitoring nodes in Freenet to observe messages passing through them; the main objective is to determine all the nodes that have seen a specific message and, when certain conditions are met, to identify the machine from which it originated. Moreover, a survey paper discussing well-studied attacks on anonymity networks that may compromise user identities presents several mechanisms against user anonymity, either application-based (such as plug-ins able to bypass proxy settings, targeted DNS lookups, URI methods, code injection, and software vulnerabilities) or network-based (such as intersection, timing, fingerprinting, or congestion attacks). The effectiveness of these attacks was examined, also considering the resources they require, and an estimate is provided in each case of how plausible it is for the attack to succeed against modern anonymity networks.
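The throughput-based idea above can be sketched in a few lines: if an observer samples the throughput of a flow entering the network and of a candidate flow leaving it, a high correlation between the two series suggests they belong to the same connection. The per-second throughput samples below are invented purely for illustration, and real attacks are of course far more sophisticated than a single Pearson coefficient:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-second throughput samples (KB/s) observed at two points:
# the entry flow's pattern reappears, slightly perturbed, at the exit.
entry_flow = [120, 80, 200, 150, 90, 210, 60, 170]
exit_flow  = [118, 85, 195, 155, 88, 205, 65, 168]
unrelated  = [140, 130, 100, 180, 90, 160, 150, 110]

print(round(pearson(entry_flow, exit_flow), 3))  # strong: likely same flow
print(round(pearson(entry_flow, unrelated), 3))  # weak: unrelated flows
```

This is also why such observation-only attacks are hard to detect: the adversary injects nothing and merely watches traffic volumes over time.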