Splunk Search

Determining time spent browsing by user from proxy logs

JovanMilosevic
Path Finder

I have some proxy logs in the squid format. Some entries do not have the user, though most do. I can create a transaction by source ip, and that transaction will give me the duration. However, where the logs don't have a user, there will be 2 user Id's - the user, and "-". Below is an example leaving out non relevant fields...

1373936631.040 .. 192.168.0.227 ... GET http://.. bill%20jones NONE/- text/html
1373936633.436 .. 192.168.0.227 ... GET http://.. - NONE/- text/html
1373936651.746 .. 192.168.0.227 ... GET http://.. bill%20jones DIRECT/122.160.120.150 video/x-flv

From this, I would like to output the duration, source Ip, and the user as bill%20jones

20.706 192.168.0.227 bill%20jones

Is there a way to iterate throught the multi value user names, ignoring the user "-", and coming back with the first value that isn't "-" ?

Thanks.

1 Solution

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

View solution in original post

0 Karma

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

View solution in original post

0 Karma

JovanMilosevic
Path Finder

I did eventually get there (nearly).

sourcetype=squid 
| transaction clientip maxpause=1m mvlist=t 
| eval username=mvfilter(match(username, "[^\-]")) 
| eval username=mvindex(username,0) 
| eval clientip=mvindex(clientip,0) 
| where username!="" 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

This appears to work so long as there's at least one non "-" user in the transaction. If there isn't, then the transaction gets dropped.

0 Karma

JovanMilosevic
Path Finder

Thanks. This stops the transactions getting dropped, and ensures that any daily stats produced on download volumes are now correct.

0 Karma

dart
Splunk Employee
Splunk Employee

If you change your where username!="" to eval username=coalesce(username, "NONE") does that give you your final desired result?

0 Karma

dart
Splunk Employee
Splunk Employee

Does a search like this do the trick sourcetype=squid | rex field=cs_user "(?<user>[^-].*)" | stats range(_time) as duration first(user) as user by c_ip?

(I'm assuming your existing user field is cs_user and the source ip is c_ip)

JovanMilosevic
Path Finder

Thanks for taking the time to respond. I'm afraid this won't crack it. If someone browses for 2 minutes at 09:00, then does nothing until another 2 minutes at 17:00, this search will have them as browsing all day.

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.