Custom Squid log format

anstoitsec
Explorer

Hi all,

I'm trying to modify the SplunkforSquid app so that it reads my custom Squid log format correctly. As defined in squid.conf, the format is:

logformat test %ts.%03tu %6tr %>a %Ss/%03Hs 0 %03Hs %st %rm %ru %un %<A

Log format codes (trimmed):

#               >a      Client source IP address
#               <A      Server IP address or peer name
#               ts      Seconds since epoch
#               tu      subsecond time (milliseconds)
#               tr      Response time (milliseconds)
#               un      User name
#               Hs      HTTP status code
#               Ss      Squid request status (TCP_MISS etc)
#               rm      Request method (GET/POST etc)
#               ru      Request URL
#               st      Request+Reply size including HTTP headers

I've tried a few things. Creating field extractions in Splunk worked until I got to the username field: the username is often just "-", and Splunk's regex generator would not detect it. My regex knowledge is nowhere near enough to debug this, so some help would be greatly appreciated.

UPDATE

Attempting to use delimExtractions:

props.conf:

[squid]
REPORT-main=delimExtractions
SHOULD_LINEMERGE=false
TIME_FORMAT=%+                  #log format time is in epoch. not sure if this is right
MAX_TIMESTAMP_LOOKAHEAD=19
KV_MODE = none

transforms.conf:

[delimExtractions]
DELIMS=" "
FIELDS="timestamp","responsetime","clientip","not_needed","zero","http_status","total_size","method","uri","username","server_ip"

Fields such as 'responsetime' and 'clientip' do not show up in the search tab, but 'not_needed', 'http_status' and a few others do.

I removed the other field extraction entries, thinking I only needed the delimExtractions transform.

Sample squid logs:

1302571599.112     32 10.10.10.10 TCP_DENIED/407 0 407 2581 CONNECT armmf.adobe.com:443 - -

1302571599.112    465 10.10.10.10 TCP_MISS/200 0 200 13314 GET http://www.ebay.com.au/ username 203.5.76.11

1302571599.115      0 10.10.10.10 TCP_DENIED/407 0 407 2415 CONNECT armmf.adobe.com:443 - -

1302571599.115     17 10.10.10.10 TCP_IMS_HIT/304 0 304 1302 GET http://vtr.elections.nsw.gov.au/images/eGlooApp.gif username -

1302571599.118    195 10.10.10.10 TCP_MISS/200 0 200 1729 GET http://toolbarqueries.google.com.au/tbr? username 10.10.10.10

1302571599.119     19 10.10.10.10 TCP_NEGATIVE_HIT/404 0 404 2459 GET http://vtr.elections.nsw.gov.au/css/mysource_files/arrow.png username -

1302571599.119    796 10.10.10.10 TCP_MISS/200 0 200 1734 GET http://t.adcloud.net/t.gif? username 10.10.10.10

1302571599.122    148 10.10.10.10 TCP_MISS/200 0 200 5050 GET http://someurl.net username 10.10.10.10

1302571599.122     22 10.10.10.10 TCP_IMS_HIT/304 0 304 1321 GET http://vtr.elections.nsw.gov.au/images/panel-sprite.png username -

I'd really like to just change the Squid log format back to the default, but we have a few apps using this weird format for some reason... I mean, really, why include the '0' and log the status code twice? 😕

1 Solution

rshoward
Path Finder

You won't need any regex-fu. Your logs are space-delimited and quote-validated, which the DELIMS=" " directive supports (the quoting matters because values like the user agent string contain spaces).

Anything "null" will be reported as "-", since a null value would break the format.

In .../-appdir-/local/props.conf, assuming your Squid logs are sourcetyped as "squid":

[squid]
REPORT-main=delimExtractions
SHOULD_LINEMERGE=false
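# You need to verify TIME_FORMAT and match it up with the timestamps you actually have.
# (Comment kept on its own line: Splunk .conf files do not support trailing inline comments.)
TIME_FORMAT=%Y-%m-%d %T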
MAX_TIMESTAMP_LOOKAHEAD=19
KV_MODE = none

And in .../-appdir-/local/transforms.conf

[delimExtractions]
DELIMS=" "
# ...and so on, one name per column
FIELDS="date","time","field0","field1","field2"

I used to know all those Squid fields by heart, but not any longer since I'm heavy into ELFF now 😞 and I'm too lazy to dig up the doc and map the fields for you 😉

I can give you a definitive parser if you can post a snippet of your logs (obfuscated is fine).
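
For reference, the column mapping can be sketched outside Splunk using the sample lines and the field names from the updated question. This is only an illustrative Python sketch, not the SplunkforSquid app's own parser; the function name parse_squid_line is hypothetical. Two things are worth noticing in the samples: %6tr pads the response time to six characters, so columns are separated by runs of spaces rather than a single space, and the first column (%ts.%03tu) is epoch seconds with milliseconds. In Splunk the latter would usually correspond to a TIME_FORMAT along the lines of %s.%3N, which is worth verifying against the props.conf documentation.

```python
from datetime import datetime, timezone

# Field names copied from the FIELDS list in the question's transforms.conf.
FIELDS = [
    "timestamp", "responsetime", "clientip", "not_needed", "zero",
    "http_status", "total_size", "method", "uri", "username", "server_ip",
]

def parse_squid_line(line):
    """Split one log line on runs of whitespace and pair the columns with FIELDS."""
    columns = line.split()  # no argument: collapses the padding added by %6tr
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
    return dict(zip(FIELDS, columns))

sample = ("1302571599.112    465 10.10.10.10 TCP_MISS/200 0 200 13314 "
          "GET http://www.ebay.com.au/ username 203.5.76.11")
record = parse_squid_line(sample)

# The first column is epoch seconds with a millisecond fraction.
event_time = datetime.fromtimestamp(float(record["timestamp"]), tz=timezone.utc)

print(record["responsetime"], record["clientip"], record["username"])
print(event_time.isoformat())
```

A "-" in the username or server column comes through as an ordinary value here, so no special handling is needed for it.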

anstoitsec
Explorer

Thanks for the help! I've done some fooling around but haven't managed to get the fields right. For some reason some of my fields are not showing up in the 'search' field in the SplunkforSquid app. I'll update the post with a sample log and transforms/props file.

jgauthier
Contributor

I don't know that I can help you directly, but a site I use for interactive regex'ing is www.regexr.com

You can paste your log into it, and build your regex with trial and error. That should help you narrow down your problem.

jgauthier
Contributor

Side note, are you using NTLM authentication? If your username is '-' then your hit is TCP_DENIED. There are two TCP_DENIEDs for every auth request because of the way NTLM works. I would just discard the TCP_DENIEDs, and save yourself significant index room!

Will you paste an actual excerpt of the log?
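
The filter condition for the TCP_DENIED suggestion can be sketched in Python as below. This is only an illustration of the test itself, with hypothetical function names; in Splunk, discarding events before indexing is typically done by routing them to nullQueue in props.conf and transforms.conf, not with a script. The sketch assumes the Squid status is the fourth whitespace-separated column, as in the sample lines in the question.

```python
def is_denied(line):
    """True when the fourth column (Squid status/HTTP code) starts with TCP_DENIED."""
    columns = line.split()
    return len(columns) > 3 and columns[3].startswith("TCP_DENIED")

def drop_denied(lines):
    """Keep only the lines that are not TCP_DENIED hits."""
    return [line for line in lines if not is_denied(line)]

samples = [
    "1302571599.112     32 10.10.10.10 TCP_DENIED/407 0 407 2581 CONNECT armmf.adobe.com:443 - -",
    "1302571599.112    465 10.10.10.10 TCP_MISS/200 0 200 13314 GET http://www.ebay.com.au/ username 203.5.76.11",
    "1302571599.115      0 10.10.10.10 TCP_DENIED/407 0 407 2415 CONNECT armmf.adobe.com:443 - -",
]
kept = drop_denied(samples)
print(len(kept))
```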

anstoitsec
Explorer

How would one match the second-to-last 'column' of the log file? I can't find any reference on how to use a regex to pick out fields separated by a space delimiter.
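
One way to approach this, sketched in Python under the assumption that the lines look like the samples in the question: \S+ matches a run of non-space characters (including a lone "-"), \s+ matches the spaces between columns, and anchoring the pattern at the end of the line with $ pins the last two columns. The names SECOND_LAST and second_last_column are hypothetical. Python writes named groups as (?P<name>...); Splunk's PCRE-based extractions use (?<name>...), so the pattern would need that small change there.

```python
import re

# <username> <server_ip> at the very end of the line.
# \S+ also matches a lone "-", so empty usernames are captured like any other value.
SECOND_LAST = re.compile(r"(?P<username>\S+)\s+(?P<server_ip>\S+)\s*$")

def second_last_column(line):
    """Return the second-to-last whitespace-separated column, or None if absent."""
    match = SECOND_LAST.search(line)
    return match.group("username") if match else None

with_user = "1302571599.122    148 10.10.10.10 TCP_MISS/200 0 200 5050 GET http://someurl.net username 10.10.10.10"
without_user = "1302571599.115      0 10.10.10.10 TCP_DENIED/407 0 407 2415 CONNECT armmf.adobe.com:443 - -"

print(second_last_column(with_user))
print(second_last_column(without_user))
```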
