Hi all,
I'm trying to modify the SplunkforSquid app to read my custom Squid log format correctly. In squid.conf it is defined as:
logformat test %ts.%03tu %6tr %>a %Ss/%03Hs 0 %03Hs %st %rm %ru %un %<A
Log format codes (trimmed):
# >a Client source IP address
# <A Server IP address or peer name
# ts Seconds since epoch
# tu subsecond time (milliseconds)
# tr Response time (milliseconds)
# un User name
# Hs HTTP status code
# Ss Squid request status (TCP_MISS etc)
# rm Request method (GET/POST etc)
# ru Request URL
# st Request+Reply size including HTTP headers
I've tried a few things here. Creating field extractions in Splunk was working OK until I got to the username field: the username is often just "-", and the regex creator in Splunk would not detect it. My regex knowledge is nowhere near enough to debug this, so some help would be greatly appreciated.
UPDATE
Attempting to use delimExtractions:
props.conf-
[squid]
REPORT-main=delimExtractions
SHOULD_LINEMERGE=false
TIME_FORMAT=%+ #log format time is in epoch. not sure if this is right
MAX_TIMESTAMP_LOOKAHEAD=19
KV_MODE = none
transforms.conf-
[delimExtractions]
DELIMS=" "
FIELDS="timestamp","responsetime","clientip","not_needed","zero","http_status","total_size","method","uri","username","server_ip"
Fields such as 'responsetime' and 'clientip' are not showing in the search tab, but 'not_needed', 'http_status' and a few others are.
I removed the other field extraction entries, thinking I only needed the delimExtractions one.
Sample squid logs:
1302571599.112 32 10.10.10.10 TCP_DENIED/407 0 407 2581 CONNECT armmf.adobe.com:443 - -
1302571599.112 465 10.10.10.10 TCP_MISS/200 0 200 13314 GET http://www.ebay.com.au/ username 203.5.76.11
1302571599.115 0 10.10.10.10 TCP_DENIED/407 0 407 2415 CONNECT armmf.adobe.com:443 - -
1302571599.115 17 10.10.10.10 TCP_IMS_HIT/304 0 304 1302 GET http://vtr.elections.nsw.gov.au/images/eGlooApp.gif username -
1302571599.118 195 10.10.10.10 TCP_MISS/200 0 200 1729 GET http://toolbarqueries.google.com.au/tbr? username 10.10.10.10
1302571599.119 19 10.10.10.10 TCP_NEGATIVE_HIT/404 0 404 2459 GET http://vtr.elections.nsw.gov.au/css/mysource_files/arrow.png username -
1302571599.119 796 10.10.10.10 TCP_MISS/200 0 200 1734 GET http://t.adcloud.net/t.gif? username 10.10.10.10
1302571599.122 148 10.10.10.10 TCP_MISS/200 0 200 5050 GET http://someurl.net username 10.10.10.10
1302571599.122 22 10.10.10.10 TCP_IMS_HIT/304 0 304 1321 GET http://vtr.elections.nsw.gov.au/images/panel-sprite.png username -
I'd really like to just change the squid log format back to default, but we have a few apps using this weird format for some reason... I mean, really, why include the '0' and log the status code twice? 😕
You won't need any regex-fu. Your logs will be space delimited and quote validated (since values like the user agent string have spaces), which is supported using the DELIMS=" " directive.
Anything "null" will be reported as "-", since a null value would break the format.
In .../-appdir-/local/props.conf,
assuming your squid logs are sourcetyped as "squid":
[squid]
REPORT-main=delimExtractions
SHOULD_LINEMERGE=false
# You need to verify TIME_FORMAT and match it up with what you have
TIME_FORMAT=%Y-%m-%d %T
MAX_TIMESTAMP_LOOKAHEAD=19
KV_MODE = none
And in .../-appdir-/local/transforms.conf
[delimExtractions]
DELIMS=" "
# etc etc: continue with one name per remaining column
FIELDS="date","time","field0","field1","field2"
I used to know all those Squid fields by heart, but not any longer since I'm heavy into ELFF now 😞 and I'm too lazy to dig up the doc and map the fields for you 😉
I can give you a definitive parser if you can post a snippet of your logs (obfuscated is fine).
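In the meantime, here is a rough Python sketch (not Splunk config) of what a space-delimited extraction amounts to, run against one of the sample lines from the question. The field names are copied from the FIELDS list in the question's transforms.conf, and `split_squid_line` is a hypothetical helper name. It is only meant to show that a "-" username is just another token, so it needs no special handling.

```python
# Field names copied from the FIELDS list in the question's transforms.conf.
FIELDS = [
    "timestamp", "responsetime", "clientip", "not_needed", "zero",
    "http_status", "total_size", "method", "uri", "username", "server_ip",
]

def split_squid_line(line):
    """Split one log line on runs of whitespace and pair tokens with FIELDS."""
    tokens = line.split()  # split() with no argument collapses repeated spaces
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} tokens, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))

sample = ("1302571599.112 32 10.10.10.10 TCP_DENIED/407 0 407 2581 "
          "CONNECT armmf.adobe.com:443 - -")
record = split_squid_line(sample)
print(record["responsetime"], record["clientip"], record["username"])
```

One thing worth checking in the raw file: `%6tr` in the logformat sets a minimum width of six characters for the response time, so the real lines may contain runs of spaces before that column. The sketch above tolerates that; whether a DELIMS-based extraction does is something to verify against your own data.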
Thanks for the help! I've done some fooling around but haven't managed to get the fields right. For some reason some of my fields are not showing up in the 'search' field in the SplunkforSquid app. I'll update the post with a sample log and transforms/props file.
I don't know that I can help you directly, but a site I use for interactive regex'ing is www.regexr.com
You can paste your log into it, and build your regex with trial and error. That should help you narrow down your problem.
Side note, are you using NTLM authentication? If your username is '-' then your hit is TCP_DENIED. There are two TCP_DENIEDs for every auth request because of the way NTLM works. I would just discard the TCP_DENIEDs, and save yourself significant index room!
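To get a feel for how much would be dropped before changing anything on the indexing side, a quick Python sketch like the one below can count the TCP_DENIED lines in a log excerpt. It is only an offline check, not the Splunk-side filtering itself; `keep_line` is a hypothetical helper, and the column position is an assumption based on the logformat posted in the question.

```python
def keep_line(line):
    """Return False for TCP_DENIED lines (and for lines too short to parse)."""
    tokens = line.split()
    # The fourth token is "%Ss/%03Hs", e.g. "TCP_DENIED/407",
    # assuming the logformat posted in the question.
    return len(tokens) > 3 and not tokens[3].startswith("TCP_DENIED/")

lines = [
    "1302571599.112 32 10.10.10.10 TCP_DENIED/407 0 407 2581 CONNECT armmf.adobe.com:443 - -",
    "1302571599.112 465 10.10.10.10 TCP_MISS/200 0 200 13314 GET http://www.ebay.com.au/ username 203.5.76.11",
    "1302571599.115 0 10.10.10.10 TCP_DENIED/407 0 407 2415 CONNECT armmf.adobe.com:443 - -",
]
kept = [l for l in lines if keep_line(l)]
print(f"kept {len(kept)} of {len(lines)} lines")
```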
Will you paste an actual excerpt of the log?
How would one match the second last 'column' of the log file - I can't find any reference on how to use regexes to distinguish using a space delimiter.
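One way to do this is to anchor the pattern at the end of the line and count columns from the right: `\S+` matches one run of non-space characters, and `\s+` matches the gap between columns. A minimal Python sketch to test the idea against one of the sample lines (the name `SECOND_LAST` is just for illustration):

```python
import re

# \S+ is one column (a run of non-space characters); \s+ is the gap between columns.
# Anchoring at the end of the line ($) lets us count columns from the right:
# capture one column, then skip exactly one more column before the end.
SECOND_LAST = re.compile(r"(\S+)\s+\S+\s*$")

line = ("1302571599.115 17 10.10.10.10 TCP_IMS_HIT/304 0 304 1302 GET "
        "http://vtr.elections.nsw.gov.au/images/eGlooApp.gif username -")
match = SECOND_LAST.search(line)
print(match.group(1))
```

Because `\S+` matches any non-space run, a username of "-" is captured the same way as a real one. The same pattern should carry over to a Splunk field extraction with a named capture group, but test it there against your own events first.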