Identifying types of bots being blocked, and is Cerber blocking search engine bots?

mediastead · February 20, 2024, 9:57pm

Hello,

I was reading a Cerber/WordFence comparison article and it was mentioned that Cerber doesn’t differentiate between good and bad bots, effectively blocking search engine bots and ultimately wreaking havoc on a site’s ranking in the search engines.

I don’t want to jump to conclusions because I have been using Cerber for a long time and very much like the plugin.

However, is there a way to confirm whether good bots are being blocked?

I did recently run into an issue where I was using MainWP on my websites, and the IP address of the server I have my MainWP on was being blocked by Cerber, and I had to whitelist it. This is making me feel like Cerber indeed doesn’t differentiate and just blocks all bots.

I’d love to get to the bottom of this so I can have some peace of mind.

Additionally, it was said that “There is some spambot prevention, which is a positive feature. However, spambots are not the only type of malicious bots. Thus, this provides only limited protection and can be a problem. So, while WP Cerber does deflect some threats, it doesn’t deflect them all nor does it deflect the right ones.”

I am trying to figure out how to know if Cerber is indeed blocking all malicious bots and traffic, or if it is very limited in the type and number of threats it is identifying.

Thanks so much! I am looking forward to reading some replies.

gioni · February 21, 2024, 12:41pm

First of all, WP Cerber does not block search engine bots, including those from Google, from indexing a website. This is because WP Cerber has no code that prevents a website from being crawled by a bot.

Some people might complain based on the user agent string of a blocked IP they see in the log. I believe they have no idea that a user agent string is generated by the request sender and can hold any value, thus mimicking Google bots. There is no way to verify whether it is a Google bot or just a spammer based on the user agent string alone. However, we have seen legitimate concerns that an authentic Google bot was blocked due to attempts to submit a form or access private parts of a website. Here, I want to ask: What do these requests have to do with indexing your website? It’s more about the problem of Google trying to aggressively index everything they can reach, including private parts and documents on your site. Do website owners really want these in Google’s index? Aren’t there enough scandals related to data leaks enabled by Google’s indexing?
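To illustrate why the user agent string alone proves nothing: Google documents a reverse-then-forward DNS check for confirming that an IP address really belongs to its crawlers. Below is a minimal Python sketch of that idea. It is not part of WP Cerber; the function name, the suffix list, and the injectable lookup parameters are assumptions made for illustration, and the default forward lookup handles IPv4 only.

```python
import socket

# Hostname suffixes assumed to identify Google crawlers (illustrative list;
# check Google's current documentation before relying on it).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if `ip` passes a reverse + forward DNS check.

    `reverse_lookup` and `forward_lookup` are hypothetical hooks so the
    logic can be exercised without network access.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS. A claimed crawler IP must resolve to a Google hostname.
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 2: forward DNS. The hostname must resolve back to the same IP,
    # otherwise the reverse record could have been forged.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A request whose user agent says "Googlebot" but whose IP fails this check is simply an impostor, whatever the log line looks like.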

Another fair question is what makes a bot good and what makes it a bad one. Good bots use the REST API, while bad bots send POST requests to the home URL of a website. The WordPress REST API is well-documented and has been available for years, yet some vendors have not implemented this simple API in their products and continue using an outdated scheme of sending plain POST requests. It’s okay to use this type of request for some internal needs and bots, but if you offer a software product to millions of users worldwide, this is inappropriate and obsolete technology.
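For comparison, here is a rough Python sketch of what a REST API request from a well-behaved integration might look like, as opposed to a plain POST to the home URL. It targets the core `/wp-json/wp/v2/posts` endpoint and uses Basic authentication as with WordPress Application Passwords. The site URL, user name, and password shown in the usage note are placeholders, and the helper name is hypothetical. The request is only built, not sent.

```python
import base64
import json
from urllib.request import Request


def build_create_post_request(site_url, username, app_password, title, content):
    """Build (but do not send) an authenticated request that creates a draft post."""
    # Application Passwords are sent as HTTP Basic credentials.
    credentials = f"{username}:{app_password}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")

    body = json.dumps(
        {"title": title, "content": content, "status": "draft"}
    ).encode("utf-8")

    return Request(
        url=site_url.rstrip("/") + "/wp-json/wp/v2/posts",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + auth,
            "Content-Type": "application/json",
        },
    )
```

For example, `build_create_post_request("https://example.com", "bot-user", "app-password", "Hello", "Body")` yields a request that could be sent with `urllib.request.urlopen`. The point is that the request goes to a documented endpoint and carries credentials, so it can be told apart from anonymous form spam.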

Let’s talk about WP Cerber. When detecting bots, whether good or bad, false positives are unavoidable. Different anti-spam algorithms provide different results and thus different false positive rates. By the way, reCAPTCHA is no exception here. Since WP Cerber’s anti-spam algorithms are stricter than those of an average anti-spam plugin for WordPress, you may see a few more false positives than you might expect. Simply put, WP Cerber can sometimes unintentionally block good bots. There is no technical way to distinguish between good bots and bad ones except the one I mentioned above. Bots are bots. If WP Cerber detects a bot doing quirky things, it blocks the bot. Only a website admin can determine whether a particular bot’s activity is OK for their website. You can view all bad bot events using the Spam Events filter on the Activity tab.

If you come across a situation where WP Cerber blocks a good old-school bot that submits data to your website without using the REST API, you should add an exception for such a bot. In the WP Cerber settings, you can configure several types of exceptions for the anti-spam engine. We recommend using them in the following order of preference: URL-based, HTTP header-based, or IP-based.

Complaints about good bots being blocked are not exclusive to WP Cerber. For example, the Zapier website, in its guidance regarding Wordfence, directly recommends disabling a security plugin in favor of their own product. What a piece of wisdom!

Some people go further, preferring convenience over security, and have no security plugin in place. I would say that improper use of Zapier is a serious issue in terms of privacy and security. People use Zapier to submit personal data indiscriminately without authentication or any form of verification of where and what they send. The correct way to implement such basic security measures is by using authentication tokens in the HTTP header generated by Zapier.
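As a rough illustration of the token approach, the receiving side can reject any request that does not carry the expected secret in a header. The Python sketch below is generic and not tied to Zapier or WP Cerber; the header name `X-Auth-Token` and the function name are assumptions chosen for the example.

```python
import hmac


def is_authorized(headers, expected_token, header_name="X-Auth-Token"):
    """Check a shared-secret token sent in an HTTP header.

    `headers` is a plain dict of request headers; `header_name` is a
    hypothetical header agreed between sender and receiver.
    """
    if not expected_token:
        # Never accept requests when no secret has been configured.
        return False

    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")

    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the token matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))
```

A request without the header, or with the wrong value, is refused before any submitted data is processed.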

mediastead · February 21, 2024, 6:37pm

@gioni Thanks so much for your thoughtful response. This has helped me a great deal in knowing how to distinguish between good and bad bots, and how to deal with them in my Cerber interface. I will be more mindful of this in the future, and perhaps do random “spot checks” occasionally, so I can continue to refine the settings for each of my websites.

I guess that leaves my last question, about the WAF and spam bots vs. other types of bots. I see in the Cerber activity interface that there is a filter that categorizes each attempt. In that list, I don’t see things like SQL injection, code injection, XSS, DDoS, etc. Is there a way to know, within the Cerber interface, whether these attacks are happening?

I recently had a couple of sites hacked that were running Cerber. The hacks happened because the builder we use has a major hole in it and when the vulnerability announcement was made, TONS of people running this builder were hacked within the hour.

Can you talk a little bit about this for me? I am trying to get a bit more educated on the security side of things, specifically Cerber, WAFs in general, and server-side security, and on whether there would have been any way to prevent this type of hack from happening.