Skimming the surface
The “Web”, aka the Internet, can be a dangerous place. In a world full of cybercrime, cybersecurity plays an important role in protecting computers, networks, and software. Data extraction is a powerful tool that enables us to stay up to date with market developments, gain market intelligence, and stay competitive in the industry.
Cybercriminals have been misusing the web as a tool to spread and deploy cyberattacks, which is why the security of the internet, and of the organizations that rely on it, is important.
It is important for organizations to educate their employees about the cybercriminal activities that take place on the different layers of the web, and about the potential risks each layer involves.
Every tech-savvy user has probably come across the terms surface web, deep web, and dark web while exploring the internet. Average users mostly stay on the surface web, whereas cybercriminals use the hidden layers, above all the dark web, to conduct malicious activities and cybercrimes. Read on to find out about the different layers of the internet and how they can be accessed.
What is the Surface Web?
Ever since Tim Berners-Lee invented the first browser in 1990, the Surface Web has been part of the World Wide Web. It is the part of the web you will be most familiar with, as it is anything that can be discovered through your internet browser using any of the main search engines (Google, Bing, Yahoo, etc.). This might include reading the news, buying something on Amazon, or visiting any of your usual daily websites, such as social media sites. It is also the area of the web that is most easily monitored by governments.
According to one study, search engines have indexed at least 5.53 billion pages, which make up only about 4% of the whole web. The Surface Web is also known as the Visible Web, Indexed Web, Lightnet, or Clearnet, and it holds only a fraction of the entire internet, approximately 19 terabytes of information.
The surface layer is what the main search engines, like Google, Bing, and Yahoo, let you discover over the internet.
The surface web contains the indexed pages that appear in the results of typical search engines. This is the web that average users rely on for everything from daily online news to buying something on an e-commerce website.
How to Access the Surface Web?
Unlike the deep web and dark web, the surface web is accessible to any user on the internet. Most of us are immersed in the surface web since it is where our daily online activities take place. The reason we can access it more easily than information on other web layers is that the data available on the surface web is purposely indexed by search engines.
When you type a query into a search engine, it combs through its database. The content in the results has already been indexed and stored in that database, where the information is organized for accurate and easy retrieval. To create such a comprehensive database of all available content, search engines use crawlers to browse and collect the data to be indexed and stored.
This works only for content that is on the surface web and has been indexed.
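To make the crawl-and-index idea concrete, here is a minimal Python sketch of an inverted index, the kind of lookup structure a search engine might build from crawled pages. The page URLs and texts are invented for illustration, and `build_index` and `search` are hypothetical helper names; real search engines are far more elaborate.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages a crawler was able to fetch (assumption for the demo).
crawled_pages = {
    "https://example.com/news": "daily online news",
    "https://example.com/shop": "buy shoes online",
}

index = build_index(crawled_pages)
print(search(index, "online news"))
```

A page that was never crawled, such as one behind a login, has no entry in the index, so no query can return it.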
If the surface web that crawlers reach makes up only about 4% of the web, what happens to the remaining 96%? What about the pages that are not open for crawling?
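One common reason a page is not open for crawling is that the site asks crawlers to stay away through a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler might check such rules before fetching a page. The rules, the example.com URLs, and the helper name `allowed_for_crawling` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a site might publish (assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /internal/
"""


def allowed_for_crawling(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules let a generic crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


print(allowed_for_crawling("https://example.com/news/today"))
print(allowed_for_crawling("https://example.com/account/settings"))
```

Pages excluded this way, together with pages that need a login, are part of what search engines never get to index.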
What is the Deep Web?
An estimated 7,500 terabytes (TB) of data and almost 600 billion documents are believed to be part of the deep web. The deep web is inaccessible to “normal” search engines: web crawlers cannot ‘see’ these pages, which makes them invisible to the search engines. To access such pages you either need to know their exact address or hold the credentials that unlock them. A regular browser is usually enough; a special browser such as Tor is needed only for the dark web portion of the deep web.
The Deep Web is web content that is not included in the Surface Web. It is regarded as the fastest-growing category of new information on the Internet. By some estimates, the deep web is 1,000 to 2,000 times larger than the Surface Web.
It can also be defined as the hidden segment of the internet that conventional search engines cannot reach, whether because of encryption or other means, so it is the aggregate of unindexed websites. The deep web is estimated to cover about 95% of the net, and we spend a lot of time on deep pages without knowing it.
Here are some examples of “deep web” sites:
- Websites accessible with a username and password (email, cloud services, online banking, or paid subscription)
- Video-on-demand services like Netflix, Amazon Prime
- Companies’ internal platforms
- Educational or library websites
- Government-related pages or legal documents
- Medical records
Since your accounts contain personal information that is valuable to criminals, it is recommended to use strong, unique passwords that combine letters, numbers, and special characters in a hard-to-guess way. Accessing your personal accounts over public Wi-Fi carries a high risk, so it is better to use a VPN (Virtual Private Network) to protect your online privacy.
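As a rough illustration of the password advice above, the following Python sketch generates a random password that mixes letters, digits, and special characters using the standard `secrets` module. The function name `generate_password` and the default length of 16 are assumptions made for this example, not a recommendation taken from any standard.

```python
import secrets
import string


def generate_password(length: int = 16) -> str:
    """Return a random password with lower, upper, digit and special chars."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit every character class")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        # secrets.choice draws from a cryptographically strong random source.
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate


print(generate_password())
```

Generating a separate password per account, ideally stored in a password manager, keeps one leaked credential from unlocking the others.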
The Deep web is needed to keep information from being “Googled” at the whim of anybody at any time.
A good example is online banking: you have to set up a PIN and use your personal details to get special access to account information that is stored on the Deep Web.
What is the Dark Web?
A part of the Deep Web that is intentionally hidden from normal search engines is called the Dark Web. Its data is encrypted. Special software, configurations, or authorization, and often a custom protocol, are required to access it. It uses masked IP addresses and is accessible only with a specific web browser.
The Internet consists of billions of pages, and strangely these billions of visible web pages are only a small share of it: estimates of the visible portion range from about 4% to 10%. So where does the rest lie? Most of it is in the Deep Web, and the intentionally hidden part of that is the “Dark Web”.
The dark web is inaccessible to traditional web search engines. It is the web content that exists only on darknets, which are overlay networks. Web experts call the Dark Web a subset of the deep web, the part of the internet that is not indexed by web search engines.
The dark web, or dark net, is often described as the part of the internet where mysterious activities and operations take place. It is a cyber-underworld where, among other things, confidential data is sold.
Without specific configurations and tools, an average user cannot access the information or data present on the dark web. The primary focus here is anonymity and keeping identities secret.
How to Access the Dark Web?
The dark Web is technically also part of the deep Web. Accessing the dark Web is legal in itself, but it hosts various criminal activities and websites that are illegal to visit. Since the dark Web is only accessible through specific software and tools, searching for any material on it is comparatively more complicated than using regular search engines like Google.
This is mainly because the dark Web lacks an index or ranking function that would let users find exactly what they need. Only a few directories and search engines, such as the Uncensored Hidden Wiki, provide guidance for finding content on the dark Web. Searching for anything there can be risky because the dark Web is full of illegal websites.
The dark Web, or dark net, includes sites that are designed to be hidden. Tor websites are not accessible unless specific software is used, as most of the data is encrypted and mostly hosted anonymously.
Dark pages can be accessed only with the help of anonymizing browsers like Tor. Tor encrypts every piece of content or action when accessing dark pages, which makes the user very hard to identify and tracking almost impossible.
The dark net is also a secret communication channel for journalists, human rights activists, and political activists. It is also used by governmental entities and military services to store intelligence reports and political records, and to anonymously exchange other sensitive and confidential data.
So what exactly is on the Dark Web?
The dark Web is a central place for sensitive and illegal material. The following are some of the things that can be found on the Dark Web:
- Marketplaces for various drugs, from mild recreational and borderline substances to hard drugs
- Scanned versions of certain unique books and publications
- Human trafficking
- Marketplaces for various unregistered weapons and ammunition
- Software required for deeper browsing, such as Onion Browser
- Directories that contain lists of other deep web and dark web websites and their links
- A few rare books that are unavailable outside
- Lots of blueprints for 3D printing, from legal things to illegal things
- Building plans that offer undetected access (via secret tunnels) to important buildings
- WikiLeaks documents
- Files containing confidential photos of various celebrities
- Websites with videos showing abuse towards children, animals, war prisoners, etc.
- Child pornography content
- Racist content
- Content promoting violence against minorities
Dark Web and Darknets
In the real world, shady and illegal business takes place in slums and criminal dens, places selected for their limited foot traffic and for not being marked on public maps. The addresses of such places are known only to a limited circle of individuals.
This is how darknets operate: as restricted-access networks. The nodes of each individual dark net, whether servers, computers, or routers, are not visible to search engines and most browsers, because they use nonstandard protocols to transfer data. A normal process like a password login will not be enough to access the Dark Web.
It is a haven for murky players like drug traffickers, extortionists, arms dealers, and sellers of stolen data. Many people are aware that the dark Web exists, but very few know how to get there.
Hackers and criminals aren’t the only people who use the Dark Web. Politicians, free speech activists, whistleblowers helping investigative journalists, and many more people use the dark Web to evade persecution and communicate anonymously online. It may also be used by people for protection from online data collection.
Comparison of Surface Web, Deep Web and Dark Web
The three webs compare as follows:
Brief Overview of Surface vs Deep and Dark Web
| Surface Web | Deep Web | Dark Web |
| --- | --- | --- |
| Openly accessible | Accessible by password, encryption, or certain gateway software | Restricted to special browsers |
| Indexed by search engines | Not indexed by search engines | Not indexed by search engines |
| Little illegal activity | Little illegal activity outside of Dark Web | Large-scale illegal activity |
| Relatively small in size | Huge in size and growing exponentially | Immeasurable due to nature |
Can Deep or Dark Web Data be scraped or extracted?
Most organizations that scrape data from various sites focus only on easily accessible content. Surface data extraction covers only the same domain as search engines, and this level of data extraction is often not enough. If you would like to know more about Web Scraping, please refer to our earlier post.
Advanced data scraping methods and tools can be applied to extract data from the Deep Web and Dark Web. However, their capability still has limits, since they cannot extract data that is password protected or encrypted.
Imagine the Web as an ocean. The top of the ocean is the surface web, the bottom is the dark web, and the deep Web lies in between. Consider the surface web as the part of the ocean that spreads for miles and is easily accessible through search engines.
Below the surface level of the ocean is the deep Web, the unindexed internet. When you swim deeper, to the bottom of the ocean, you reach the dark Web, which is only accessible through special tools and software.
Difference between surface web and dark Web
| Surface Web | Dark Web |
| --- | --- |
| The prime and accessible part of the internet. | The hidden part of the internet that requires specific software or tools to gain access. |
| Accessible using search engines like Google, Bing, Yahoo, etc. | Uses the Tor network and search engines like Torch, notEvil, etc. |
| Has publicly available websites that web search engines can crawl and index to find data. | Houses Tor-encrypted websites that cannot easily be indexed to read website data. |
| About 4% of pages on the internet are indexed by the search engines here. | Belongs to the roughly 96% of the internet that is hidden, which is assumed to be about 500 times larger than the surface web. |
| Consists of almost 19 TB of legal and illegal data on the internet. | Sits within the almost 7,500 TB of hidden data; its content is restricted from normal users on the internet. |
| Only small-scale illegal activity takes place on the surface web compared to the deep Web and dark Web. | A hub for large-scale criminal activity, including weapons, drugs, human trafficking, and cybercrimes. |
Difference between Deep Web and Dark Web
These two terms are sometimes used interchangeably, as if they were more or less the same thing, but nothing could be further from the truth. The deep Web refers to non-indexed pages, while the dark Web refers to pages that are non-indexed, intentionally hidden, and often involved in illegal activities.
The deep Web consists of non-indexed pages that search engines cannot reach or that are not relevant enough to be indexed. The dark Web, by contrast, wants to be hidden, often because it houses illegal activities on purpose. The deep Web is ethically neutral and can be used for both good and bad reasons. The dark Web, on the other hand, is where the morally questionable parts of the economy and society come together.
All dark Web is deep but not all deep Web is dark.
Both deep and dark nets are hidden from search engines. The main difference between the two is that deep pages can be accessed through credentials and authorization, while dark pages require a special browser and software with a decryption key. The data on deep pages is not deliberately concealed, whereas the sole purpose of the dark net is anonymity.
The following table summarizes the differences between the Deep Web and the Dark Web:
| Deep Web | Dark Web |
| --- | --- |
| The deep Web is the segment of the Web that is hidden from conventional search engines. | The dark Web is the part of the deep Web that is purposefully hidden. |
| It requires a password, encryption, or special software to access. | It requires the Tor Browser or a similar browser to access. |
| It is huge in size and growing exponentially. | Though it is a subset of the Deep Web, the size of the Dark Web is immeasurable. |
| Usually used for legitimate purposes that may require anonymity. | Largely used for illegal activities. |
| It includes all unindexed web pages. | It includes only a subset of unindexed web pages inside the deep Web. |
| Can be accessed with a VPN. | Requires a lot of precautions to access. |
Summing Up Surface, Deep, and Dark Web
We have now seen the difference between the surface web, deep Web, and dark Web. Think of the WWW as an iceberg: the smallest part of the network, the one people visit regularly, is on top, while the biggest part lies unseen below.
Surface, dark, and deep Web are related to each other, forming a base for online activities. Operating on the surface is generally secure, but activities on deeper levels can lead to dangerous consequences. Most operations and activities on the dark Web are anonymous and are often carried out for illegal purposes. It is better to avoid the dark Web if it is not related to your specialization.
What Are the Dangers of the Dark Web?
The dark Web is a highly dangerous place. It exposes you to a lot of cybersecurity issues, especially if you’re a non-technical person just looking to satisfy your curiosity. Data is one of the most frequently traded goods on dark web marketplaces.
There are tons and tons of leaked data, credentials, and personal information for sale on the dark Web. This is the place where hackers get their data for credential attacks, identity theft, and other illegal business.
How to stay safe from the Dark Web?
If you are an average internet user, you will most likely never come across the Dark Web. Do not access or visit the Dark Web unless it’s required. Guard your privacy. If you do need to use the dark Web, take precautions by using an encrypted privacy browser (like Tor).
Do not share any personal information about yourself there. Do not make any transactions there. Do not communicate with anyone there. Do not install any software from there. It is a good idea to use a VPN service and a privacy-centred, portable operating system like Tails, Whonix, ZeusGuard, or Qubes.
Be careful what data permissions you give (don’t click ‘yes’ on every pop-up just to get to a website). Protect information such as your accounts and documents so that only you have access to it.
Following are some of the best practices for accessing the Dark Web:
- Use a privacy-centric portable operating system like Tails, Whonix, etc. for accessing the Dark Web.
- Use a VPN service to connect.
- Do not visit any illegal sites.
- Do not access any illegal material.
- Do not download any files or software from the dark Web as this environment may contain malware.
- Limit your search only for serious research and not for simple questions and basic web navigation.
- Trust no one on the Dark Web as there may be dangerous consequences.
- Cover or disconnect your webcam.
- Disable JavaScript.
- Do not use torrents or any similar services while surfing the dark Web.
What are the Applications of the Dark Web?
Here are important applications of the Dark Web:
- Mainly used for intelligence and military purposes
- Scientists use it for research
- Used by cybercriminals and by the police
- Journalists and whistleblowers use it to send sensitive data confidentially and anonymously
- Political protesters and anti-censorship advocacy groups use the Dark Web
- Residents of oppressive political regimes also have a legitimate use for the Dark Web
What is Onion Routing?
Onion routing is an encryption technique used to anonymously communicate over a computer network. Messages are encapsulated in layers of encryption, just like layers of an onion.
Onion (.onion) is a pseudo-top-level domain name that designates an anonymous onion service. Onion services can be reached only by users connected to the Tor network. The encryption and decryption take place across a series of network nodes called onion routers.
The encrypted data is transmitted through this series of onion routers, and each router “peels” away a single layer, uncovering where the data should go next. The message arrives at its destination once the last layer is decrypted.
The main advantage is that, throughout the entire process, the sender remains anonymous, because each intermediary is aware only of the location of the immediately preceding and following nodes.
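The layering described above can be sketched in a few lines of Python. This is a toy model only: the SHA-256 based XOR keystream stands in for real encryption and is not secure, the relay names and keys are invented, and `build_onion` and `peel` are hypothetical helpers. Real Tor negotiates keys with each relay and uses vetted ciphers.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from a key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Apply or remove one layer; XOR with the same stream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(message: bytes, path: list[tuple[str, bytes]], destination: str) -> bytes:
    """Wrap the message in one layer per relay, innermost layer first."""
    next_hops = [name for name, _ in path[1:]] + [destination]
    data = message
    for (_, key), next_hop in zip(reversed(path), reversed(next_hops)):
        # Each layer hides the name of the next hop plus the inner onion.
        data = xor_layer(key, next_hop.encode() + b"\n" + data)
    return data


def peel(key: bytes, data: bytes) -> tuple[str, bytes]:
    """Remove one layer: learn the next hop and the still-wrapped payload."""
    plain = xor_layer(key, data)
    next_hop, _, inner = plain.partition(b"\n")
    return next_hop.decode(), inner


# Invented relays and keys, for illustration only.
path = [("relay-a", b"key-a"), ("relay-b", b"key-b"), ("relay-c", b"key-c")]
onion = build_onion(b"hello", path, "destination")

for name, key in path:
    next_hop, onion = peel(key, onion)
    print(f"{name} forwards to {next_hop}")
print(onion)
```

Each relay in the sketch learns only the name of the next hop, which mirrors the point that no single intermediary sees both the sender and the final destination.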
What is the Tor Project?
Tor is an anonymity network that uses the onion routing method. Here the messages and communication are encapsulated in layers of encryption, like the layers of an onion, making them hard to trace. The Tor Browser is a special browser that uses this network to provide the ability to communicate anonymously.
The Tor Browser, when run on your computer, helps keep you safe on the Internet. It protects people by bouncing their communications around a distributed network of relays, while preventing anyone watching your Internet connection from learning what sites you visit. It also prevents websites from learning your physical location.
Tor directs Internet traffic through a network of more than seven thousand relays. This helps users hide their location and makes traffic analysis or network surveillance far more difficult.