Time on websites (total session times)

Path Finder

Hi there,

We have, as you would expect, a bunch of firewall / content keeper logs in our Splunk instance, and our Splunk guys wish to report on the time a user spends on each website (domain).

Basically, I am trying to see if there is any "easy"...ish way of determining a "session" for each domain and then adding them up to display the total time a user spends on each domain (roughly).

Let's say we start with a generic search against my firewall logs and a specific user,
leaving us with an output of a single user's requests in chronological order.

ANY help you could provide would be very very appreciated.

Thanks,
Aaron.

Legend

As you've already discussed it's hard to get really meaningful stats for the reasons cmeo outlines. But, it's certainly possible to create the stats based on the rules you suggested.

If using the firewall logs for this, I don't know exactly what fields are at your disposal - but let's say you have at least a source IP, a destination IP and a destination port. Our unique identifier for a certain web session could be based on these fields. In that case it's possible to build a transaction that joins separate events together to a new combined event (a transaction) based on rules that you specify. Upon creating a transaction, Splunk will write the time difference between its first and last event into a field called duration. What you do is create this transaction saying "join events having the same source IP, destination IP and port, but only if it's less than 30 minutes between one event and the next". Translated to a search, this would look something like:

<yourbasesearch>
| transaction src_ip dest_ip dest_port maxpause=30m

OK, now you have a bunch of transactions with corresponding duration fields that you need to sum together for each "session" to create a grand total. Use stats for this.

<yourbasesearch>
| transaction src_ip dest_ip dest_port maxpause=30m
| stats sum(duration) AS session_time by src_ip,dest_ip,dest_port

This will give you a table with a list of "total session times" for each src_ip/dest_ip/dest_port combination that was found in your search, according to the rules you specified.

Engager

You could split it by service, say port 80 or 443,
but the maxpause value will still be an issue.

Path Finder

Hmmm... I appear to have something happening that's not quite what I'm after.
Technically, the total time on a single domain should not be able to exceed the time period of the logs specified.
I.e. if I have a base search containing 3 days of logs, I can't be on the site "google.com" for more than 3 days in total.
However, with this search... I am... about 27 days, in fact.
Is there no way of calculating this the way I mentioned earlier?
Basically so that the "period" spent on a site is calculated by an actual "timeout" value, rather than just assigning a period of time for every "hit".
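
As a rough illustration of how a total can exceed the log window: summing per-connection durations counts the same wall-clock time more than once whenever transactions overlap, for example if one domain is reached through several destination IPs at the same moment. That cause is an assumption here, not something confirmed from these logs. The sketch below is plain Python rather than SPL, and the `(start, end)` tuples in seconds are made-up values, not real firewall data.

```python
def summed_duration(intervals):
    """Add up every interval's length, the way a plain sum(duration) would."""
    return sum(end - start for start, end in intervals)


def wall_clock_duration(intervals):
    """Merge overlapping intervals first, so shared time is counted only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if this one ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical transactions to one domain via three destination IPs (seconds).
transactions = [(0, 600), (0, 600), (300, 900)]
naive = summed_duration(transactions)
merged = wall_clock_duration(transactions)
print(naive, merged)
```

Here the naive sum is twice the merged wall-clock figure, even though all the activity fits inside a 900-second window.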

SplunkTrust

Another thing that would be useful is if webapp session cookies were logged when they are used (like J2EE JSESSIONID) -- then you could identify distinct user sessions according to the activity presented by that session ID

Path Finder

You have answered (and explained) absolutely everything I wanted!
Thank you so, so much!

I can now generate exactly what they're after.
Thank you!

Path Finder

The key would be the session timeout; in other words, let's say we make it a "magical" 30 minutes.

So, said user connects to a site, then 10 minutes later they connect again... another 5 minutes goes on and they connect once more... then three days later they reconnect and again 60 seconds later... that's it for the month.

This means they spent a total of 10 + 5 + 1 = 16 minutes on that site.

There's no way of even contemplating such a thing...?
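
For what it's worth, the timeout rule described above can be sketched outside Splunk to check the arithmetic. This is a minimal Python sketch, not an SPL search: the function name `total_session_time`, the start date, and the choice to treat a gap of exactly 30 minutes as still inside the session are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def total_session_time(hits, timeout=timedelta(minutes=30)):
    """Sum the gaps between consecutive hits that are no longer than `timeout`.

    A longer gap ends the session, so that gap is not counted.
    """
    total = timedelta(0)
    ordered = sorted(hits)
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap <= timeout:
            total += gap
    return total


# The example from the post: hits at 0, +10 and +15 minutes,
# then another hit three days later followed by one 60 seconds after that.
first = datetime(2024, 1, 1, 9, 0, 0)  # arbitrary start time
hits = [
    first,
    first + timedelta(minutes=10),
    first + timedelta(minutes=15),
    first + timedelta(days=3),
    first + timedelta(days=3, seconds=60),
]
total = total_session_time(hits)
print(total)  # 0:16:00
```

The three-day gap is longer than the timeout, so it is dropped and only 10 + 5 + 1 minutes are counted.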

Contributor

I have had this same discussion with a customer some months ago. Here is what I sent them:


The problem I thought of with this is--what exactly are you measuring?
HTTP is stateless, so there isn't exactly a start and end of a
session to track...

I came up with some scenarios:

  1. User is interacting with a travel booking site. For the duration of
    their activities, there will be a stream of http traffic, puts and gets
    etc. No problem here.

  2. User opens a newspaper or mag and reads a long article. You might have one set
    of interactions as they get the page; they might sit there reading it
    for half an hour. You won't know anything until they browse the next web
    site. Alternatively, they might skim it in a minute and leave it open
    for half an hour in background. What, then, is the duration of their
    stay at the site?

  3. User opens multiple bookmarks in tabs but doesn't read any of them.
    Any traffic information here might be highly misleading; they might not
    in fact interact with any, but they could be open on the screen all day.

I don't think what you want to do can be done in a meaningful way--not with Splunk anyway.

Path Finder

I completely agree and that's what I told the group in the first place.

However, they are keen to at least have some stats that can look shiny... no matter how pointless they truly are.
