In 2019, bots made up close to 40% of total internet traffic, and more than half of those bots were malicious. So if you own or run a website, a significant share of your traffic most likely comes from bots.
In recent years, bots have gained a notorious reputation because of their involvement in many cybersecurity attacks. However, there are also good bots that can bring significant benefits to both your website and your users, so indiscriminately blocking all bots is typically not a good idea.
Instead, the best practice is to manage each bot according to its activity and reputation, which is easier said than done. This is why a proper bot management practice is required, and in this guide, we will learn how it works.
Three Main Reasons Why Blocking Bots Isn’t a Good Idea
On the surface, simply blocking every bot that makes a request to our site might seem like the most effective and cost-efficient solution. As briefly discussed above, though, it is usually not a good idea, for three reasons:
- We wouldn’t want to accidentally block good bots. Some bots are operated by reputable companies and are very beneficial for our website and its visitors. For instance, unless you don’t want your website to rank on Google, which is highly unlikely, you wouldn’t want to block Googlebot.
- Bots are getting more sophisticated. Today’s bot programmers are highly skilled and quick to adopt the latest technologies, producing bots that can bypass your security measures. These bots can, for example, mimic human behaviors such as visiting several pages before acting or making non-linear mouse movements. Because such bots look so much like people, overly aggressive blocking rules may end up blocking your legitimate human visitors.
- Attackers can turn it against you. When you block a client, especially with an explicit error message, a persistent attacker can use that information to modify the bot so its activity is even harder to detect. The attacker can also send different versions of the bot to probe why they are being blocked and work around your detection method. In such cases, blocking can be counterproductive.
Instead, we should manage the bot activities accordingly, which we will discuss below.
Bot Management: What Is It?
Bot management refers to two core activities: proper detection of bad bots (and differentiating them from good bots and legitimate human users), and managing the identified bot activities accordingly.
Bot management is achieved with a bot management solution, which detects bot activity. Various techniques can be used, but we can generally categorize them into three groups:
- Fingerprinting-based (static) approach: in this approach, the bot management solution analyzes the traffic for known bot ‘fingerprints’ such as particular operating systems, browser types, IP addresses, and so on. This approach is effective in detecting and managing known bots, but the downside is that it cannot detect brand new bots that have no known fingerprint. Thus, this method is more passive or static.
- Challenge-based approach: in this approach, we challenge the client with a test that is designed to be very easy for human users to solve, but very difficult if not impossible for bots. CAPTCHA is the most common form of challenge-based bot management, but with the availability of CAPTCHA farms (services where human workers solve the challenges on behalf of bots), this approach has become far less effective.
- Behavioral-based (dynamic) approach: the bot management solution gathers data about the client’s activities and compares them with a baseline of known behaviors in real time. For example, it can compare the client’s mouse movements with those of known human users. This method typically makes use of AI and machine learning technologies, and bot mitigation services like DataDome can continuously learn and improve to recognize bot behaviors, even for brand new, unknown bots.
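To make the static and behavioral approaches above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular product works: the marker list, the function names, and the thresholds are all hypothetical, and a real solution would combine far more signals than a User-Agent string and request timing.

```python
from statistics import mean, pstdev
from typing import Sequence

# Hypothetical fingerprint list: substrings that appear in the User-Agent
# of some common automation tools. A real list would be much larger.
KNOWN_BOT_UA_MARKERS = ("python-requests", "curl", "headlesschrome", "scrapy")


def matches_known_fingerprint(user_agent: str) -> bool:
    """Static check: does the User-Agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_UA_MARKERS)


def looks_automated(request_times: Sequence[float],
                    min_requests: int = 5,
                    max_cv: float = 0.1) -> bool:
    """Behavioral check: flag near-constant gaps between requests.

    request_times are timestamps in seconds, in ascending order.
    We compute the coefficient of variation (stdev / mean) of the gaps
    between consecutive requests. A very low value means the client is
    requesting pages at an almost perfectly regular pace.
    """
    if len(request_times) < min_requests:
        return False  # not enough data to judge
    gaps = [b - a for a, b in zip(request_times, request_times[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return True  # all requests at the same instant
    return pstdev(gaps) / avg < max_cv


def classify(user_agent: str, request_times: Sequence[float]) -> str:
    """Combine both checks into a coarse label."""
    if matches_known_fingerprint(user_agent):
        return "known_bot"
    if looks_automated(request_times):
        return "suspected_bot"
    return "likely_human"
```

Note that the static check only catches clients that announce themselves, which is exactly the limitation described above. The timing check can flag a client with an unfamiliar User-Agent, but a bot that randomizes its delays would pass it, so this should be read as one weak signal among many.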
After a client has been successfully identified as a bot, we have several options for dealing with it:
- Blocking (blackholing)
Completely block the client from accessing your website and resources. If we are 100% sure that this client is a malicious bot and it’s not coming from a persistent attacker capable of creating more powerful bots, then blocking or blackholing is the most effective and cost-efficient approach. However, as discussed above, blocking can be counterproductive if we are not careful.
- Throttling
Another option besides blocking is to throttle the bot, that is, to significantly slow down the server’s responses to its requests. Bots consume resources, which can be quite expensive, so the hope is that a slowed-down bot becomes unprofitable and its operator moves on to other websites. This approach is effective when the bot performs rapid and persistent attacks, as in brute force attacks. It also limits the damage of a false positive, since a human visitor who is wrongly flagged is only slowed down rather than locked out.
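As a rough illustration of throttling, the sketch below returns a delay that grows as a client exceeds a request budget within a sliding time window. The class name, parameters, and default values are assumptions made for this example, and it keeps its state in memory, which would not be enough for a site running on several servers.

```python
import time
from collections import defaultdict, deque


class Throttler:
    """Adds a growing delay for clients that exceed a request budget."""

    def __init__(self, window_seconds=10.0, free_requests=5,
                 delay_step=0.5, max_delay=8.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # size of the sliding window
        self.free_requests = free_requests    # requests served without delay
        self.delay_step = delay_step          # extra seconds per excess request
        self.max_delay = max_delay            # upper bound on the delay
        self.clock = clock                    # injectable for testing
        self._hits = defaultdict(deque)       # client id -> recent request times

    def delay_for(self, client_id: str) -> float:
        """Record one request and return how long to wait before responding."""
        now = self.clock()
        hits = self._hits[client_id]
        hits.append(now)
        # Drop requests that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        excess = len(hits) - self.free_requests
        if excess <= 0:
            return 0.0
        return min(excess * self.delay_step, self.max_delay)
```

A request handler would call `delay_for()` with some client identifier (an IP address, for instance) and wait that long before sending the response. A normal visitor stays under the budget and never notices, while a client hammering a login form gets slower answers with every attempt.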
- Honey Trap
The main idea of this approach is to feed fake content and/or fake information to the bot so the attacker doesn’t realize that we have identified the bot, while also not allowing the bot to meet its objective.
An alternative to this method is to redirect the bot to a mirror website with modified/reduced content, but the idea remains the same: not allowing the bot to access our original content while letting it waste its resources.
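Here is a small sketch of the fake-content idea, using a hypothetical scenario in which a bot scrapes product prices. The function names, the product fields, and the 10% to 20% markup are all invented for illustration. The point it tries to show is that the flagged bot receives a response with the same shape as the real one, so nothing signals that it has been detected.

```python
import hashlib


def decoy_price(real_price: float, product_id: str, spread: float = 0.2) -> float:
    """Return a plausible but wrong price that is stable per product.

    The price is inflated by a factor between 1 + spread/2 and 1 + spread,
    derived from a hash of the product id so repeated requests agree.
    """
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    factor = 1 + spread * (0.5 + digest[0] / 510)
    return round(real_price * factor, 2)


def render_product(product: dict, is_flagged_bot: bool) -> dict:
    """Build the response payload; flagged bots get the decoy price."""
    payload = {
        "id": product["id"],
        "name": product["name"],
        "price": product["price"],
    }
    if is_flagged_bot:
        # Same fields, same format, different value.
        payload["price"] = decoy_price(product["price"], product["id"])
    return payload
```

Deriving the fake value from a hash rather than a random number is a deliberate choice in this sketch: if the bot requests the same product twice and gets two different prices, the inconsistency itself could give the trap away.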
Since blocking alone is no longer effective in stopping bot activities, and at the same time bots are getting more sophisticated in mimicking human behaviors, a proper bot management solution like DataDome can help you:
- Differentiate between malicious bots and legitimate human visitors
- Differentiate between bad bots and good bots, allowing beneficial good bots to access your website
- Identify the source of all bot traffic and its reputation
- Analyze each bot’s behavior and make decisions on how to manage the bot’s activities depending on its objective and reputation
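To tie the options discussed in this guide together, the last point can be pictured as a small decision function. This is only a sketch of the reasoning in this article, not a description of how DataDome or any other product decides: the `Verdict` fields, the `Action` names, and the 0.9 threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    DECOY = "decoy"
    BLOCK = "block"


@dataclass(frozen=True)
class Verdict:
    is_bot: bool         # detector believes the client is a bot
    is_known_good: bool  # e.g. a verified search engine crawler
    confidence: float    # detector's confidence that this is a bot, 0..1
    persistent: bool     # client keeps returning with modified bots


def choose_action(v: Verdict, sure_threshold: float = 0.9) -> Action:
    """Pick a response following the trade-offs described in this guide."""
    if not v.is_bot or v.is_known_good:
        return Action.ALLOW      # humans and good bots pass through
    if v.confidence < sure_threshold:
        return Action.THROTTLE   # limits harm if this is a false positive
    if v.persistent:
        return Action.DECOY      # avoid telling an adaptive attacker anything
    return Action.BLOCK          # sure it is bad, and not a persistent attacker
```

For example, `choose_action(Verdict(True, False, 0.6, False))` returns `Action.THROTTLE`, because the detector is not confident enough to risk locking out a real visitor.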