We're having a problem with field extraction with this TA. We have not changed the format of Squid's logs; it is all default.
We're running Squid 3.5.20-12 on RHEL 7, which was compiled in October 2017. The TA was last updated in late 2015. What are the chances that the log format has changed but the TA hasn't been updated to match? Can someone please confirm that this TA works properly with a modern version of Squid?
Our problems include:
1. The time-since-epoch field is reported as a number (not correctly as a time/date)
2. The User Agent field is not extracted properly
3. Any fields after User Agent are simply appended to the User Agent field (this includes \r\n characters)
Thanks!
So it turns out this was likely our problem. We actually did change the format of Squid's logs, by enabling two directives for extra log detail:
1. strip_query_terms on (this is a business-use only network)
2. log_mime_hdrs on (we want detail)
Apparently the time-since-epoch field should be left as part of the event, and we suspect Splunk is correctly using it for the source time.
With respect to item 1, this is standard behaviour. Splunk doesn't change the timestamp representation in the _raw event, so the timestamp appears simply as the number of seconds since the epoch. The _time field, however, should be displayed in a human-readable format.
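To illustrate that the number in _raw is just a timestamp, here is a quick Python sketch that converts the leading field of a native-format Squid line (seconds.milliseconds since the epoch) to a UTC datetime. The sample line and the helper name `squid_epoch_to_utc` are made up for illustration; this is not how Splunk itself parses the event.

```python
from datetime import datetime, timezone


def squid_epoch_to_utc(raw_line: str) -> datetime:
    """Convert the first field of a native Squid access.log line
    (seconds.milliseconds since the epoch) to an aware UTC datetime."""
    epoch_field = raw_line.split(None, 1)[0]
    return datetime.fromtimestamp(float(epoch_field), tz=timezone.utc)


# Hypothetical sample line in Squid's default (native) format.
sample = (
    "1509999999.123    245 192.0.2.10 TCP_MISS/200 1520 GET "
    "http://example.com/ - HIER_DIRECT/203.0.113.5 text/html"
)

print(squid_epoch_to_utc(sample).isoformat())
```

The raw event keeps `1509999999.123`; only the parsed value (here, a date in November 2017) is what you'd expect to see rendered for _time.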
With respect to your items 2 and 3, I suspect that the Squid proxy in question has the log_mime_hdrs directive enabled. Setting "log_mime_hdrs on" causes the request and response MIME headers to be appended to the access_log entries, and the TA's field extraction erroneously puts all of this data into the http_content_type field. This, IMHO, is an error in the "current" Add-On: the regex that pulls out http_content_type should stop at either the end of the line or the first space, and as written it does not allow for the optional MIME header fields.
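A rough sketch of the kind of extraction I mean, written in Python because its `(?P<name>...)` named groups are also accepted by PCRE. This is not the TA's actual regex: the field names, the sample line, and the `parse_mime_block` helper are my own illustrations. It assumes the two header blocks appear in square brackets with literal `\r\n` separators (as in your item 3) and that no header value contains a `]`.

```python
import re
from typing import Dict

# Hypothetical pattern for Squid's default native logformat, optionally
# followed by the two bracketed MIME header blocks from "log_mime_hdrs on".
SQUID_NATIVE = re.compile(
    r"^(?P<timestamp>\d+\.\d+)\s+"
    r"(?P<duration>\d+)\s+"
    r"(?P<src>\S+)\s+"
    r"(?P<action>[^/\s]+)/(?P<status>\d{3})\s+"
    r"(?P<bytes>\d+)\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<url>\S+)\s+"
    r"(?P<user>\S+)\s+"
    r"(?P<hierarchy>[^/\s]+)/(?P<dest>\S+)\s+"
    r"(?P<http_content_type>\S+)"            # stop at the first whitespace
    r"(?:\s+\[(?P<request_headers>[^\]]*)\]"  # optional request headers
    r"\s+\[(?P<reply_headers>[^\]]*)\])?"     # optional reply headers
    r"\s*$"
)


def parse_mime_block(block: str) -> Dict[str, str]:
    """Split a logged header block on literal backslash-r backslash-n
    sequences and return lower-cased header names mapped to values.
    Lines without a colon (e.g. the HTTP status line) are skipped."""
    headers: Dict[str, str] = {}
    for line in block.split(r"\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


# Made-up sample line with log_mime_hdrs enabled.
sample = (
    "1509999999.123    245 192.0.2.10 TCP_MISS/200 1520 GET "
    "http://example.com/ - HIER_DIRECT/203.0.113.5 text/html "
    r"[Host: example.com\r\nUser-Agent: curl/7.29.0\r\nAccept: */*\r\n] "
    r"[HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n]"
)

match = SQUID_NATIVE.match(sample)
if match:
    print(match.group("http_content_type"))
    print(parse_mime_block(match.group("request_headers")).get("user-agent"))
```

Because the header group is optional, the same pattern still matches lines logged with log_mime_hdrs off; the User-Agent then has to come from the request header block rather than from whatever follows the content type.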