- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have some proxy logs in the squid format. Some entries do not have the user, though most do. I can create a transaction by source ip, and that transaction will give me the duration. However, where the logs don't have a user, there will be 2 user Id's - the user, and "-". Below is an example leaving out non relevant fields...
1373936631.040 .. 192.168.0.227 ... GET http://.. bill%20jones NONE/- text/html
1373936633.436 .. 192.168.0.227 ... GET http://.. - NONE/- text/html
1373936651.746 .. 192.168.0.227 ... GET http://.. bill%20jones DIRECT/122.160.120.150 video/x-flv
From this, I would like to output the duration, source Ip, and the user as bill%20jones
20.706 192.168.0.227 bill%20jones
Is there a way to iterate throught the multi value user names, ignoring the user "-", and coming back with the first value that isn't "-" ?
Thanks.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.
First I created a time based lookup...
[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S
First pass populates the lookup like so..
sourcetype=squid username!="-"
| stats min(_time) as from by username clientip
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from)
| outputlookup UserLookup
Second pass gives me the results I need...
index=proxy
| lookup UserLookup clientip as clientip
| transaction clientip maxpause=1m
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.
First I created a time based lookup...
[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S
First pass populates the lookup like so..
sourcetype=squid username!="-"
| stats min(_time) as from by username clientip
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from)
| outputlookup UserLookup
Second pass gives me the results I need...
index=proxy
| lookup UserLookup clientip as clientip
| transaction clientip maxpause=1m
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I did eventually get there (nearly).
sourcetype=squid
| transaction clientip maxpause=1m mvlist=t
| eval username=mvfilter(match(username, "[^\-]"))
| eval username=mvindex(username,0)
| eval clientip=mvindex(clientip,0)
| where username!=""
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
This appears to work so long as there's at least one non "-" user in the transaction. If there isn't, then the transaction gets dropped.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. This stops the transactions getting dropped, and ensures that any daily stats produced on download volumes are now correct.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

If you change your where username!=""
to eval username=coalesce(username, "NONE")
does that give you your final desired result?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Does a search like this do the trick sourcetype=squid | rex field=cs_user "(?<user>[^-].*)" | stats range(_time) as duration first(user) as user by c_ip
?
(I'm assuming your existing user field is cs_user and the source ip is c_ip)
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for taking the time to respond. I'm afraid this won't crack it. If someone browses for 2 minutes at 09:00, then does nothing until another 2 minutes at 17:00, this search will have them as browsing all day.
