I'm fairly new to Splunk. I have a working local Splunk Enterprise installation and have installed the Splunk App for Web Analytics. However, under Audience, the data by country doesn't appear to agree with what we know about the web server logs we've imported.
Under Splunk App for Web Analytics Audience - we see the attached country order
However, when we run a custom report with our own filters / extractions for detecting bots (and our AWS Route53 health monitor), we get something like the following...
(This is not limited to page views.) The country data here is still not what we would expect and varies significantly from Google Analytics, this time with France at the top. However, the ratios here align with Google Analytics if we treat the Google Analytics data as correct (and ignore the country names in our own report).
1 Thailand 5,213(53.79%)
2 Myanmar (Burma) 658(6.79%)
3 Indonesia 553(5.71%)
4 Cambodia 518(5.35%)
5 United States 478(4.93%)
6 Nepal 223(2.30%)
7 Vietnam 197(2.03%)
8 Laos 196(2.02%)
9 India 163(1.68%)
10 Japan 144(1.49%)
This last set aligns more with what we know about our data.
So the question is - how could we unpick this in order to determine why the country report is so different from what we expect?
Any thoughts or suggestions greatly appreciated.
The first panel from the Web Analytics app shows sessions. These are very different from entries in the log files, which is what your second screenshot shows.
A session is defined as one or more interactions linked to the same user/device within a specific time period. There could be hundreds of pageviews in one session. A session also needs to include at least one pageview, i.e. we exclude hotlinking to resources and robots that only look at the robots.txt file. The Web Analytics app defines sessions similarly to Google Analytics, but the two won't be 100% identical. Also bear in mind that Google Analytics may sample its data, so the figures there can be estimates.
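To make the difference between sessions and log entries concrete, here is a minimal Python sketch of one way to sessionize hits. It is not the app's actual implementation: the 30-minute inactivity timeout, the `count_sessions` function, and the sample visitors are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from collections import defaultdict

# Assumed inactivity timeout; the app's real setting may differ.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(events, timeout=SESSION_TIMEOUT):
    """Count sessions and raw log entries per visitor.

    events: iterable of (visitor_id, timestamp, is_pageview) tuples.
    A gap longer than `timeout` between hits starts a new session, and a
    session is only counted if it contains at least one pageview.
    """
    hits_by_visitor = defaultdict(list)
    for visitor, ts, is_pageview in events:
        hits_by_visitor[visitor].append((ts, is_pageview))

    sessions, entries = {}, {}
    for visitor, hits in hits_by_visitor.items():
        hits.sort(key=lambda h: h[0])
        entries[visitor] = len(hits)
        counted = 0
        last_ts = None
        has_pageview = False
        for ts, is_pageview in hits:
            if last_ts is not None and ts - last_ts > timeout:
                counted += int(has_pageview)  # close the previous session
                has_pageview = False
            has_pageview = has_pageview or bool(is_pageview)
            last_ts = ts
        counted += int(has_pageview)  # close the final session
        sessions[visitor] = counted
    return sessions, entries


# Hypothetical sample data.
t0 = datetime(2024, 1, 1, 12, 0)
events = [
    ("reader", t0, True),
    ("reader", t0 + timedelta(minutes=5), True),
    ("reader", t0 + timedelta(minutes=10), True),
    ("crawler", t0, True),
    ("crawler", t0 + timedelta(hours=2), True),
    ("crawler", t0 + timedelta(hours=4), True),
    ("robot", t0, False),  # e.g. a robots.txt request only
]
sessions, entries = count_sessions(events)
print(sessions)
print(entries)
```

In this sample, "reader" and "crawler" each produce three log entries, but "crawler" produces three sessions where "reader" produces one, and "robot" produces none. A country whose traffic looks like "crawler" would rank high by sessions and low by log entries.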
My interpretation of Russia being on top is that there are many sessions but each session might only contain one or just a few actual page views. Hence it is much lower in the log entry ranking.
You can check these ratios under the Troubleshooting dashboards, which show pageviews vs. sessions per IP. They will likely show some bots at the top.
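If you export per-IP counts, the same comparison can be sketched outside Splunk. The Python below is only an illustration: the `pageviews_per_session` function, the input dictionaries, and the IP addresses (taken from documentation-reserved ranges) are hypothetical, and the ordering is just one plausible way to surface likely bots.

```python
def pageviews_per_session(pageviews, sessions):
    """Return (ip, ratio, sessions) tuples, lowest ratio first.

    pageviews and sessions are dicts keyed by IP address. IPs with no
    sessions are skipped to avoid dividing by zero. Ties on the ratio
    are broken by putting the IP with more sessions first.
    """
    rows = []
    for ip, n_sessions in sessions.items():
        if n_sessions > 0:
            ratio = pageviews.get(ip, 0) / n_sessions
            rows.append((ip, ratio, n_sessions))
    return sorted(rows, key=lambda row: (row[1], -row[2]))


# Hypothetical per-IP counts.
pageviews = {"203.0.113.10": 400, "198.51.100.7": 120, "192.0.2.55": 90}
sessions = {"203.0.113.10": 400, "198.51.100.7": 10, "192.0.2.55": 6}

ranked = pageviews_per_session(pageviews, sessions)
for ip, ratio, n_sessions in ranked:
    print(f"{ip}: {ratio:.1f} pageviews/session over {n_sessions} sessions")
```

An IP with many sessions and roughly one pageview per session, like the first address here, is the kind of entry worth checking against your bot filters.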
In the Top Countries panel in Splunk App for Web Analytics Audience there is a magnifying glass icon. Click on it to open the panel's search in a new tab. This lets you examine the search used by the panel so you can compare it to your own search and, one hopes, explain the discrepancy.