Splunk Search

Determining time spent browsing by user from proxy logs

JovanMilosevic
Path Finder

I have some proxy logs in the squid format. Some entries do not have the user, though most do. I can create a transaction by source ip, and that transaction will give me the duration. However, where the logs don't have a user, there will be 2 user Id's - the user, and "-". Below is an example leaving out non relevant fields...

1373936631.040 .. 192.168.0.227 ... GET http://.. bill%20jones NONE/- text/html
1373936633.436 .. 192.168.0.227 ... GET http://.. - NONE/- text/html
1373936651.746 .. 192.168.0.227 ... GET http://.. bill%20jones DIRECT/122.160.120.150 video/x-flv

From this, I would like to output the duration, source Ip, and the user as bill%20jones

20.706 192.168.0.227 bill%20jones

Is there a way to iterate throught the multi value user names, ignoring the user "-", and coming back with the first value that isn't "-" ?

Thanks.

1 Solution

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

View solution in original post

0 Karma

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
0 Karma

JovanMilosevic
Path Finder

I did eventually get there (nearly).

sourcetype=squid 
| transaction clientip maxpause=1m mvlist=t 
| eval username=mvfilter(match(username, "[^\-]")) 
| eval username=mvindex(username,0) 
| eval clientip=mvindex(clientip,0) 
| where username!="" 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

This appears to work so long as there's at least one non "-" user in the transaction. If there isn't, then the transaction gets dropped.

0 Karma

JovanMilosevic
Path Finder

Thanks. This stops the transactions getting dropped, and ensures that any daily stats produced on download volumes are now correct.

0 Karma

dart
Splunk Employee
Splunk Employee

If you change your where username!="" to eval username=coalesce(username, "NONE") does that give you your final desired result?

0 Karma

dart
Splunk Employee
Splunk Employee

Does a search like this do the trick sourcetype=squid | rex field=cs_user "(?<user>[^-].*)" | stats range(_time) as duration first(user) as user by c_ip?

(I'm assuming your existing user field is cs_user and the source ip is c_ip)

JovanMilosevic
Path Finder

Thanks for taking the time to respond. I'm afraid this won't crack it. If someone browses for 2 minutes at 09:00, then does nothing until another 2 minutes at 17:00, this search will have them as browsing all day.

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Community Content Calendar, September edition

Welcome to another insightful post from our Community Content Calendar! We're thrilled to continue bringing ...

Splunkbase Unveils New App Listing Management Public Preview

Splunkbase Unveils New App Listing Management Public PreviewWe're thrilled to announce the public preview of ...

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Are you leveraging automation to its fullest potential in your threat detection strategy?Our upcoming Security ...