Splunk Search

Determining time spent browsing by user from proxy logs

JovanMilosevic
Path Finder

I have some proxy logs in the squid format. Some entries do not have the user, though most do. I can create a transaction by source ip, and that transaction will give me the duration. However, where the logs don't have a user, there will be 2 user Id's - the user, and "-". Below is an example leaving out non relevant fields...

1373936631.040 .. 192.168.0.227 ... GET http://.. bill%20jones NONE/- text/html
1373936633.436 .. 192.168.0.227 ... GET http://.. - NONE/- text/html
1373936651.746 .. 192.168.0.227 ... GET http://.. bill%20jones DIRECT/122.160.120.150 video/x-flv

From this, I would like to output the duration, source Ip, and the user as bill%20jones

20.706 192.168.0.227 bill%20jones

Is there a way to iterate throught the multi value user names, ignoring the user "-", and coming back with the first value that isn't "-" ?

Thanks.

1 Solution

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

View solution in original post

0 Karma

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
0 Karma

JovanMilosevic
Path Finder

I did eventually get there (nearly).

sourcetype=squid 
| transaction clientip maxpause=1m mvlist=t 
| eval username=mvfilter(match(username, "[^\-]")) 
| eval username=mvindex(username,0) 
| eval clientip=mvindex(clientip,0) 
| where username!="" 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip

This appears to work so long as there's at least one non "-" user in the transaction. If there isn't, then the transaction gets dropped.

0 Karma

JovanMilosevic
Path Finder

Thanks. This stops the transactions getting dropped, and ensures that any daily stats produced on download volumes are now correct.

0 Karma

dart
Splunk Employee
Splunk Employee

If you change your where username!="" to eval username=coalesce(username, "NONE") does that give you your final desired result?

0 Karma

dart
Splunk Employee
Splunk Employee

Does a search like this do the trick sourcetype=squid | rex field=cs_user "(?<user>[^-].*)" | stats range(_time) as duration first(user) as user by c_ip?

(I'm assuming your existing user field is cs_user and the source ip is c_ip)

JovanMilosevic
Path Finder

Thanks for taking the time to respond. I'm afraid this won't crack it. If someone browses for 2 minutes at 09:00, then does nothing until another 2 minutes at 17:00, this search will have them as browsing all day.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...