SOLVED

Large number of "page not found" on Entry Page report

Level 1

I'm running the Entry Page report for our site and getting a very large number of "page not found" entries as the entry page. Looking into it further, 98%+ of those are Typed/Bookmarked. I know from past experience that none of our file downloads come anywhere close to the number reported for "page not found". The number has increased somewhat over the last couple of months but has been fairly consistent overall for the last year.

 

Almost all of them (90%+) are from the US, whereas only about 60% of our overall traffic is from the US.

 

Does anyone have any idea? I'm thinking it is some kind of bot doing a scan (we do have one that runs daily against the site).

 

Thanks

1 Accepted Solution

Correct answer by
Level 8

@brettv52917362 - It definitely could be bot activity, especially if there are no referrers associated with the bulk of the traffic. If you haven't already, you might check bounce stats for these pages. If there's anywhere close to a 1:1 ratio of visits:bounces, it's probably safe to assume bot activity.
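
As a rough illustration of the bounce check, here is a minimal Python sketch. It assumes the Entry Page report has been exported into rows with the hypothetical field names `page`, `visits`, and `bounces`. The 0.95 cutoff is an arbitrary value chosen for the example, not an Adobe recommendation.

```python
def bounce_ratio(visits, bounces):
    """Return bounces / visits, or 0.0 when there are no visits."""
    return bounces / visits if visits else 0.0


def likely_bot_pages(rows, threshold=0.95):
    """Return entry pages whose bounce ratio is at or above the threshold.

    rows: iterable of dicts with hypothetical keys 'page', 'visits', 'bounces'.
    threshold: arbitrary cutoff chosen for illustration only.
    """
    return [r["page"] for r in rows
            if bounce_ratio(r["visits"], r["bounces"]) >= threshold]


# Made-up numbers, purely for illustration.
sample = [
    {"page": "page not found", "visits": 1000, "bounces": 990},
    {"page": "home", "visits": 800, "bounces": 320},
]
print(likely_bot_pages(sample))
```

A ratio near 1.0 is only a hint, so it is worth combining with the domain and geo checks before excluding anything.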

 

A couple of things I always check in these scenarios are the domain and geo reports. With bot traffic, you'll find it often comes from the same network and/or geographic location. (I recently looked at something similar and found most of the traffic was coming from googlebot.com and was centered in the same US state.) If you find that it's your own scans/bots that are to blame, you'll have to decide what you want to do...
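
To make the "same network and/or geographic location" check concrete, here is a small Python sketch that measures how concentrated the suspicious hits are. The field names `domain` and `region` and all of the sample values are hypothetical; real exports will look different.

```python
from collections import Counter


def top_shares(hits, key, n=3):
    """Return the n most common values of `key` with their share of all hits."""
    counts = Counter(h[key] for h in hits)
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common(n)]


# Made-up hits; 'domain' and 'region' are hypothetical field names.
hits = (
    [{"domain": "example-cloud.com", "region": "Virginia"}] * 8
    + [{"domain": "isp-one.net", "region": "Ohio"}] * 1
    + [{"domain": "isp-two.net", "region": "Texas"}] * 1
)
print(top_shares(hits, "domain", n=1))
print(top_shares(hits, "region", n=1))
```

If one or two domains or regions account for most of the "page not found" entries, that points toward automated traffic rather than real visitors.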

 

  • If you want to see the scan data in Analytics, you'll have to change the scans to make sure they're hitting real pages (not 404 pages)
  • If you do NOT want to see the scan data in Analytics, you might try one of the following:
    • Use Analytics' IP exclusion capabilities
    • Make sure the bot uses a recognizable user-agent string that can be added as a custom bot filter
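
Both exclusion options come down to matching hits on an IP range or a user-agent marker. The Python sketch below only illustrates that matching logic and is not how Analytics implements its filters. The network `203.0.113.0/24` and the marker `ExampleCorp-SiteScanner` are hypothetical placeholders.

```python
import ipaddress

# Hypothetical values: replace with your scanner's real network and user agent.
SCANNER_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
SCANNER_UA_MARKER = "ExampleCorp-SiteScanner"


def is_internal_scan(ip, user_agent):
    """True if a hit matches the scanner's IP range or user-agent marker."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in SCANNER_NETWORKS)
    return in_range or SCANNER_UA_MARKER in user_agent


print(is_internal_scan("203.0.113.7", "Mozilla/5.0"))
print(is_internal_scan("198.51.100.4", "ExampleCorp-SiteScanner/1.0"))
print(is_internal_scan("198.51.100.4", "Mozilla/5.0"))
```

A distinctive user-agent marker tends to be the more robust choice when the scanner's IP addresses can change, for example when it runs from a cloud provider.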

 

2 Replies

Level 1

Thanks for the quick answer.

 

I checked the domains, and the two largest by far are Microsoft.com and Amazonaws.com. Our company uses both clouds for some tools, but not for anything related to our site. Clicking around in Analytics, I saw that the vast majority of these are for "downloads".