In short: a site publishing AdSense ads saw an incredible increase in revenue (from 7 per day to 70 per day), but at payment time most of the earnings were classified as invalid traffic.
One way to track down the IPs generating this traffic is to have Google Analytics connected to AdSense.
In Analytics, go to Behavior/Publisher/Publisher Pages and sort by revenue. You may discover that a single page is producing 90% or more of the site’s earnings.
The picture shows the home page generating over 50% of the revenue, which is completely unnatural.
You might think of searching the server log for the traffic to the home page and trying to identify the bot IP from there. That is rather hard.
A smarter approach is to identify the bot by its user agent (these bad bots use common browser user agents, unlike Googlebot, which identifies itself).
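If you export the Publisher Pages report, the same check can be done in a few lines. This is only an illustrative sketch: it assumes the export has already been reduced to (page, revenue) pairs, and the 50% threshold and the function names `revenue_share_by_page` and `suspicious_pages` are hypothetical choices, not something Analytics provides.

```python
from collections import defaultdict

def revenue_share_by_page(rows):
    """Return each page's fraction of the total revenue.

    `rows` is an iterable of (page_path, revenue) pairs, e.g. taken from an
    exported Publisher Pages report (the export format is an assumption).
    """
    totals = defaultdict(float)
    for page, revenue in rows:
        totals[page] += revenue
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {page: amount / grand_total for page, amount in totals.items()}

def suspicious_pages(rows, threshold=0.5):
    """Pages earning more than `threshold` of the site revenue, largest share first."""
    shares = revenue_share_by_page(rows)
    flagged = [page for page, share in shares.items() if share > threshold]
    return sorted(flagged, key=lambda page: shares[page], reverse=True)

if __name__ == "__main__":
    # Illustrative numbers only, not the site's real data.
    sample = [("/", 40.0), ("/article-a", 5.0), ("/article-b", 3.0), ("/", 12.0)]
    for page, share in revenue_share_by_page(sample).items():
        print(f"{page}: {share:.0%}")
    print("suspicious:", suspicious_pages(sample))
```

The threshold is arbitrary; on a healthy site with many articles, even 20% on one page may already deserve a closer look.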
In our case we opened the home page details (just click on / in the Publisher Pages report) and then added a secondary dimension, Browser Version (you can also drill down by Browser to get both the browser name and version).
What we got is:
Bingo: Chrome 77.0.3865.120, a fairly old version, is “clicking” all the ads.
Now, using the Apache or NGINX access log (provided it records the user agent), you can find the IPs from which this browser is visiting the site.
Once you have the IPs, you can use the MaxMind service to discover who owns them. Almost certainly you’ll find a hosting provider rather than an ISP, which supports the conclusion that the “browser” is actually a bot.
You now have two ways to block it: create a filter on the user agent string, or deny access to the identified IPs.
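A minimal sketch of that search, assuming the log uses the “combined” format (the NGINX default and a common Apache setting), where the user agent is the last quoted field. The function name `ips_for_user_agent` is hypothetical, and matching on the substring `Chrome/77.0.3865.120` assumes the bot sends a standard Chrome user agent string.

```python
import re
import sys
from collections import Counter

# "combined" format: ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def ips_for_user_agent(lines, needle):
    """Count requests per client IP whose user agent contains `needle`."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        # Lines in other formats are skipped instead of raising an error.
        if match and needle in match.group("agent"):
            counts[match.group("ip")] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hits = ips_for_user_agent(log, "Chrome/77.0.3865.120")
    for ip, count in hits.most_common(20):
        print(ip, count)
```

Run it as `python find_bot_ips.py /var/log/nginx/access.log` (the script name and path are examples; adjust them to your server). The IPs at the top of the list are the candidates to check in the next step.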
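If the site runs on NGINX, both options map to standard directives: `deny` for addresses (single IPs or CIDR ranges) and an `if ($http_user_agent ~ ...)` block returning 403 for the user agent filter. The helpers below are a hypothetical sketch that turns your findings into those snippets; they only build text, and where you paste it (typically inside the `server` block) and the 403 status are assumptions to adapt to your setup. Apache has equivalents that are not shown here.

```python
import ipaddress
import re

def nginx_deny_rules(ips):
    """Build NGINX `deny` lines; each entry may be a single address or a CIDR range."""
    rules = []
    for entry in ips:
        # Raises ValueError on typos; strict=False turns "198.51.100.5/24" into its /24 network.
        network = ipaddress.ip_network(entry, strict=False)
        # Write single hosts without the /32 (or /128) suffix.
        target = str(network.network_address) if network.num_addresses == 1 else str(network)
        rules.append(f"deny {target};")
    return "\n".join(rules)

def nginx_user_agent_block(agent_fragment, status=403):
    """Build an `if` block rejecting requests whose User-Agent contains the fragment."""
    if '"' in agent_fragment:
        raise ValueError("fragment must not contain double quotes")
    pattern = re.escape(agent_fragment)  # "." becomes "\." so it matches literally
    return (
        f'if ($http_user_agent ~ "{pattern}") {{\n'
        f"    return {status};\n"
        "}"
    )

if __name__ == "__main__":
    # Example values only (documentation ranges); use the IPs found in your own logs.
    print(nginx_deny_rules(["203.0.113.7", "198.51.100.0/24"]))
    print(nginx_user_agent_block("Chrome/77.0.3865.120"))
```

Keep in mind that a user agent filter also blocks any real visitor still running that exact browser version, while the IP rules only affect the hosts you identified.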
The picture below shows the effect of blocking the bad bot IPs (home page revenue):
The site’s overall (real) revenue did not change after the IP blocks were put in place.
Using the browser language
Another trick is to check the browser language. In my example, the site was in Italian and 99% of the visitors came from Italy, yet most of the revenue was produced by browsers set to “en-us”.
That helps to understand the nature of the traffic, but the language is not usually recorded in Apache or NGINX logs, so identifying the source is more complicated (a custom script can be added to the site to log the IP back to the server, or to add a custom variable to Google Analytics).
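A minimal sketch of the server side of the “log back the IP” idea, assuming a hypothetical `/lang-beacon` endpoint that your page calls with the browser’s `navigator.language`, e.g. `/lang-beacon?lang=en-US`. It is a plain WSGI application using only the Python standard library; the endpoint, the `lang` parameter, and the tab-separated log line are illustrative choices. Behind a reverse proxy, `REMOTE_ADDR` is the proxy’s address, so you would need the forwarded client IP instead.

```python
import logging
from urllib.parse import parse_qs

logger = logging.getLogger("lang_beacon")

def beacon_record(environ):
    """Build one log line: client IP, reported browser language, user agent."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # Keep only the first value and cap its length, since the client controls it.
    lang = params.get("lang", ["unknown"])[0][:35]
    ip = environ.get("REMOTE_ADDR", "-")
    agent = environ.get("HTTP_USER_AGENT", "-")
    return f"{ip}\t{lang}\t{agent}"

def lang_beacon_app(environ, start_response):
    """WSGI app for the hypothetical /lang-beacon endpoint: log the record, return 204."""
    logger.info("%s", beacon_record(environ))
    start_response("204 No Content", [("Content-Length", "0")])
    return []
```

For a quick local try, call `logging.basicConfig(level=logging.INFO)` and then `wsgiref.simple_server.make_server("", 8000, lang_beacon_app).serve_forever()`. The IPs that report “en-us” can then be looked up in the access log and with MaxMind as described above.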