Splunk Search

How to create a report to get session based user-agent statistics on a per site basis (from Tomcat logs)?

Communicator

Hey there,

I've been learning how to use the search features in Splunk and trying to find a way to get some user-agent metrics from all of our aggregated Tomcat log data. There are 3 key fields that I'm looking at to determine how many browsers are hitting different websites that we run:

site (this is the domain URL from the logs, consider the format to be blahblah.companyname.com)
tr_cookie (a tracking cookie with a unique id per session)
http_user_agent (the user agent we pull from each log entry)

What I need to do is generate a report that counts how many http_user_agents exist using the tr_cookie as a source (this cookie will never have two different http_user_agent values) and then break that down into statistics on a per 'site' basis. I have two searches so far:

index=tomcat_logs  "GET /company/index.php " | stats dc(tr_cookie) as sessions by http_user_agent

The above gives me the overall breakdown of how many unique sessions can be attributed to different http_user_agents. So it will say that 100 people are using ie8. 50 people are using Firefox and etc, ex:

Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko    541
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   198
Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.94 Safari/537.36  172

But what I really need is something like this:

somecompany.companyname.com:
     Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko   300
     Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   100
     Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.94 Safari/537.36 75

someothercompany.companyname.com:
     Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko   200
     Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   110
     Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.94 Safari/537.36 63

The closest that I have gotten is this search, but it just lists the statistics line by line with the "site" line repeating over and over:

 index=tomcat_logs  "GET /company/index.php " | stats dc(SKP_TS) as sessions by http_user_agent, site

Example:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)  XXXXX.companyname.com   1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)  XXXXX.companyname.com   1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) XXXXX.companyname.com   1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)   XXXXX.companyname.com   2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; InfoPath.3; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729) YYYYY.companyname.com   1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; Touch; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; Tablet PC 2.0)   ZZZZZ.companyname.com   1

Any idea how to make this happen?

0 Karma
1 Solution

Motivator

This works.

| search .....
| stats dc(SKP_TS) as sessions by site, http_user_agent
| streamstats current=false last(site) as previousSite 
| eval site=if(match(site,previousSite),"",site)
| fields site, http_user_agent, sessions

I used sideview's answer from https://answers.splunk.com/answers/25102/question-regarding-grouping-of-results-into-a-table.html

View solution in original post

0 Karma

Motivator

This works.

| search .....
| stats dc(SKP_TS) as sessions by site, http_user_agent
| streamstats current=false last(site) as previousSite 
| eval site=if(match(site,previousSite),"",site)
| fields site, http_user_agent, sessions

I used sideview's answer from https://answers.splunk.com/answers/25102/question-regarding-grouping-of-results-into-a-table.html

View solution in original post

0 Karma

Communicator

That does indeed the give the effect I'm looking for.

0 Karma

Esteemed Legend

Like this:

... | stats dc(SKP_TS) AS sessions BY site http_user_agent
0 Karma

Communicator

Also close but it looks like this:

 Site:                        User-Agent:            Sessions: 
 blah.companyname.com                 Internet Explorer 8        3
 blah.companyname.com                       Internet Explorer 9            2
 blah.companyname.com                       Internet Explorer 10            5

 junk.companyname.com                 Internet Explorer 8        2
 junk.companyname.com                        Internet Explorer 9            6
 junk.companyname.com                         Internet Explorer 10            5
0 Karma

Legend

Try this

... | stats values(http_user_agent) as UA dc(SKP_TS) as sessions by site
0 Karma

Communicator

That's somewhat closer but what that returns is the total count of unique SKP_TS sessions for each site, along side each unique value of the user-agent. What I need is the unique count of SKP_TS sessions per user-agent per site.

So instead of:

Site:                       User-Agent:         Sessions: 
blah.companyname.com    Internet Explorer 8     10
                        Internet Explorer 9
                        Internet Explorer 10

junk.companyname.com    Internet Explorer 8     13
                        Internet Explorer 9
                        Internet Explorer 10

It should be:

Site:                       User-Agent:         Sessions: 
blah.companyname.com    Internet Explorer 8     3
                        Internet Explorer 9         2
                        Internet Explorer 10            5

junk.companyname.com    Internet Explorer 8     2
                        Internet Explorer 9         6
                        Internet Explorer 10            5
0 Karma

Communicator

I know the syntax doesn't make sense but in my mind it's like this:

index=tomcat_logs  "GET /company/index.php " | (stats dc(tr_cookie) as sessions by http_user_agent) by site

Essentially using by twice.

0 Karma