Large number of "page not found" on Entry Page report | Community
July 15, 2021
Solved

Large number of "page not found" on Entry Page report

  • July 15, 2021
  • 2 replies
  • 868 views

I'm running the Entry Page report for our site and getting a very large number of "page not found" entries as the entry page. Looking into it further, 98%+ of those are Typed/Bookmarked. I know from past experience that none of our files have download counts anywhere close to the number of "page not found" entries reported. The number has increased somewhat over the last couple of months but has been fairly consistent overall for the last year.

Almost all of these entries (90%+) are from the US, whereas only about 60% of our overall traffic is from the US.

Does anyone have any idea? I'm thinking it is some kind of bot doing a scan (we do have a daily scan of the site).

Thanks

Best answer by Brian_Johnson_

@brettv52917362 - It definitely could be bot activity, especially if there are no referrers associated with the bulk of the traffic. If you haven't already, you might check bounce stats for these pages. If there's anywhere close to a 1:1 ratio of visits:bounces, it's probably safe to assume bot activity.
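As a rough illustration of the visits:bounces check, here is a minimal Python sketch that works on rows exported from an Entry Page report. The row layout, the sample figures, the 0.95 threshold, and the `min_visits` cutoff are all assumptions made for the example; none of them comes from Analytics itself.

```python
from typing import NamedTuple


class EntryPageRow(NamedTuple):
    """One exported report row (hypothetical layout)."""
    page: str
    visits: int
    bounces: int


def bounce_ratio(row: EntryPageRow) -> float:
    """Bounces divided by visits; 0.0 when there are no visits."""
    return row.bounces / row.visits if row.visits else 0.0


def flag_bot_like(rows, threshold=0.95, min_visits=100):
    """Return pages whose bounce ratio is close to 1:1.

    threshold and min_visits are arbitrary illustrative cutoffs.
    """
    return [
        r.page
        for r in rows
        if r.visits >= min_visits and bounce_ratio(r) >= threshold
    ]


# Made-up sample data, not real report output.
rows = [
    EntryPageRow("page not found", 12000, 11850),
    EntryPageRow("home", 8000, 3100),
    EntryPageRow("pricing", 50, 50),
]
print(flag_bot_like(rows))  # ['page not found']
```

The `min_visits` cutoff is there so that low-traffic pages with a handful of bounces are not flagged by accident.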

 

A couple of things I always check in these scenarios are the domain and geo reports. With bot traffic, you'll find it often comes from the same network and/or geographic location. (I recently looked at something similar and found most of the traffic was coming from googlebot.com and was centered in the same US state.) If you find that it's your own scans/bots that are to blame, you'll have to decide what you want to do...
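To make the "same network and/or geographic location" check concrete, the sketch below computes what share of the suspect visits each domain (or region) accounts for. It assumes you have exported visit counts keyed by domain; the domain names and counts are placeholders, not real data.

```python
from collections import Counter


def top_shares(visits_by_key, n=3):
    """Return the n largest keys with their share of total visits."""
    total = sum(visits_by_key.values())
    if total == 0:
        return []
    return [(key, count / total)
            for key, count in Counter(visits_by_key).most_common(n)]


# Placeholder counts for the "page not found" entries, keyed by domain.
by_domain = {
    "example-cloud-a.test": 600,
    "example-cloud-b.test": 300,
    "isp-one.test": 60,
    "isp-two.test": 40,
}

for domain, share in top_shares(by_domain, n=2):
    print(f"{domain}: {share:.0%}")
```

If one or two keys account for most of the visits, that concentration is consistent with automated traffic, although it is not proof on its own. The same function works for a geo breakdown by passing counts keyed by state or country.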

 

  • If you want to see the scan data in Analytics, you'll have to change the scans to make sure they're hitting real pages (not 404 pages).
  • If you do NOT want to see the scan data in Analytics, you might try one of the following:
    • Use Analytics' IP exclusion capabilities.
    • Make sure the bot uses a recognizable user-agent string that can be added as a custom bot filter.
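
The two exclusion options boil down to matching each hit against an IP range or a user-agent token. The Python sketch below shows that matching logic only, for example to pre-check your own server logs before setting up the rules. It is not the Analytics configuration or API; the network range (a reserved documentation range) and the `ExampleSiteScanner` token are hypothetical placeholders.

```python
import ipaddress

# Placeholder values: replace with your scanner's real range and UA token.
EXCLUDED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BOT_UA_TOKENS = ["ExampleSiteScanner"]


def is_internal_scan(ip: str, user_agent: str) -> bool:
    """True if the hit matches an excluded IP range or a known bot UA token."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in EXCLUDED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BOT_UA_TOKENS)


print(is_internal_scan("203.0.113.7", "Mozilla/5.0"))              # True
print(is_internal_scan("198.51.100.1", "ExampleSiteScanner/1.0"))  # True
print(is_internal_scan("198.51.100.1", "Mozilla/5.0"))             # False
```

Matching on the user agent only helps if the scanner sends a distinctive string, which is why the second option asks you to make the bot's user agent recognizable first.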

 

2 replies

July 15, 2021

Thanks for the quick answer.

 

I checked the domains, and the two largest by far are Microsoft.com and Amazonaws.com. Our company uses both clouds for some tools, but not for anything related to our site. While clicking around in Analytics, I saw that the vast majority of these hits are for "downloads".