I have some proxy logs in the squid format. Some entries do not have the user, though most do. I can create a transaction by source ip, and that transaction will give me the duration. However, where the logs don't have a user, there will be 2 user Id's - the user, and "-". Below is an example leaving out non relevant fields...
1373936631.040 .. 192.168.0.227 ... GET http://.. bill%20jones NONE/- text/html
1373936633.436 .. 192.168.0.227 ... GET http://.. - NONE/- text/html
1373936651.746 .. 192.168.0.227 ... GET http://.. bill%20jones DIRECT/122.160.120.150 video/x-flv
From this, I would like to output the duration, source Ip, and the user as bill%20jones
20.706 192.168.0.227 bill%20jones
Is there a way to iterate throught the multi value user names, ignoring the user "-", and coming back with the first value that isn't "-" ?
Thanks.
I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.
First I created a time based lookup...
[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S
First pass populates the lookup like so..
sourcetype=squid username!="-"
| stats min(_time) as from by username clientip
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from)
| outputlookup UserLookup
Second pass gives me the results I need...
index=proxy
| lookup UserLookup clientip as clientip
| transaction clientip maxpause=1m
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.
First I created a time based lookup...
[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S
First pass populates the lookup like so..
sourcetype=squid username!="-"
| stats min(_time) as from by username clientip
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from)
| outputlookup UserLookup
Second pass gives me the results I need...
index=proxy
| lookup UserLookup clientip as clientip
| transaction clientip maxpause=1m
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
I did eventually get there (nearly).
sourcetype=squid
| transaction clientip maxpause=1m mvlist=t
| eval username=mvfilter(match(username, "[^\-]"))
| eval username=mvindex(username,0)
| eval clientip=mvindex(clientip,0)
| where username!=""
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
This appears to work so long as there's at least one non "-" user in the transaction. If there isn't, then the transaction gets dropped.
Thanks. This stops the transactions getting dropped, and ensures that any daily stats produced on download volumes are now correct.
If you change your where username!=""
to eval username=coalesce(username, "NONE")
does that give you your final desired result?
Does a search like this do the trick sourcetype=squid | rex field=cs_user "(?<user>[^-].*)" | stats range(_time) as duration first(user) as user by c_ip
?
(I'm assuming your existing user field is cs_user and the source ip is c_ip)
Thanks for taking the time to respond. I'm afraid this won't crack it. If someone browses for 2 minutes at 09:00, then does nothing until another 2 minutes at 17:00, this search will have them as browsing all day.