Parsing syslog data for httpproxy data in Splunk For Squid

jminihane
New Member

I have just set up Splunk and am trying to get my HTTP proxy (Astaro) data into Splunk for Squid. Astaro does use Squid, but its syslog data isn't in the standard Squid log format. I can get the syslog data into Splunk through a new UDP:514 input and see it there, but I'm having trouble making the data visible in Splunk for Squid.
Here is a typical syslog entry:

Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="10.10.30.101" dstip="67.195.186.237" user="" statuscode="200" cached="0" profile="PROFILENAMEHERE" filteraction="DefaultHTTPAction" size="403" request="0x7cd39000" url="<URLISHERE>" exceptions="" error="" country="United States" category="122,157" reputation="neutral" categoryname="Instant Messaging,Web Phone" content-type="application/json"

Here are my props.conf contents.

[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = none
SHOULD_LINEMERGE = false
REPORT-squid = squid
[source::udp:514]
TRANSFORMS-sqsourcetype= sq_sourcetyper

and transforms.conf...

[squid]
REGEX =^\d+.\d+\s+(\d+)\s+([0-9.])\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:])://)?([^/:]+):?(\d+)?(/?[^]))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.)$
FORMAT = id::$1 severity::$2 sys::$3 sub::$4 name::$5 action::$6 method::$7 srcip::$8 dstip::$9 user::$10 statuscode::$11 cached::$12 profile::$13 filteraction::$14 size::$15 request::$16 url::$17 exceptions::$18 error::$19 country::$20 category::$21 reputation::$22 categoryname::$23 content-type::$24

[sq_sourcetyper] 
SOURCE_KEY = MetaData:Host 
REGEX = httpproxy
DEST_KEY = MetaData:Sourcetype 
FORMAT= sourcetype::squid

When adding a data source I can't see the "squid" sourcetype anywhere.

I'm guessing that my transforms.conf REGEX is wrong, but how do I get the data to show up in Splunk For Squid?
Markdown may have messed up the formatting.

gkanapathy
Splunk Employee

Yes. Almost certainly, your host (MetaData:Host) value is not httpproxy but 10.10.40.10, so a REGEX of httpproxy applied to SOURCE_KEY = MetaData:Host never matches. Unfortunately, this kind of chaining of timestamps and hostnames is an inherent problem with using syslog, which doesn't specify the host in the data itself. You can try matching on that host value (10.10.40.10) in the REGEX. If that's undesirable, you can try instead:

[sq_sourcetyper] 
REGEX = ^(?:\S+\s+){5}httpproxy
DEST_KEY = MetaData:Sourcetype 
FORMAT= sourcetype::squid

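which drops SOURCE_KEY, so the REGEX runs against the raw event text (the default) and looks for httpproxy after the five leading whitespace-separated fields.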
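The difference between the two rules can be checked offline. The following is only a rough sketch in Python, whose `re` module behaves like Splunk's regex engine for these simple patterns. The `host` value is an assumption about what Splunk stored for this input, and the event is the sample from the question, shortened.

```python
import re

# Sample event from the question, shortened after the first few fields.
raw = ('Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: '
       'id="0001" severity="info" sys="SecureWeb" sub="http"')
host = "10.10.40.10"  # assumed host value for events from this input

# Original rule: plain "httpproxy" tested against the host value.
host_rule = re.compile(r"httpproxy")
# Suggested rule: skip five whitespace-delimited tokens, then expect httpproxy.
raw_rule = re.compile(r"^(?:\S+\s+){5}httpproxy")

print(bool(host_rule.search(host)))  # False: the host is an IP address
print(bool(raw_rule.search(raw)))    # True: httpproxy is the sixth token
```

If your events carry a different number of leading fields, the `{5}` count would need to change accordingly.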


Also, your squid rule is unnecessarily complicated and inflexible. Instead, use this in props.conf:

[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = auto
SHOULD_LINEMERGE = false

The REGEX is slower and more complicated, and you don't need it: KV_MODE = auto already extracts the name="value" pairs at search time. If that doesn't work for you for some reason, you could try keeping your original props.conf, but changing the transforms.conf to:

[squid]
DELIMS = " ", "="

but it should work with the simpler config.
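To see why the positional REGEX is unnecessary for this log format, here is a minimal Python sketch of key="value" extraction. It only approximates the idea behind automatic key/value extraction and is not Splunk's actual implementation. The `PAIR` pattern and the `extract_pairs` helper are hypothetical names used for illustration.

```python
import re

# Sample event from the question, with some fields left out for brevity.
raw = ('Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: '
       'id="0001" severity="info" action="pass" method="GET" '
       'srcip="10.10.30.101" user="" statuscode="200" '
       'categoryname="Instant Messaging,Web Phone" content-type="application/json"')

# key="value" pairs; values may be empty or contain spaces and commas.
PAIR = re.compile(r'([\w-]+)="([^"]*)"')

def extract_pairs(event: str) -> dict:
    """Return every key="value" pair found in the event as a dict."""
    return dict(PAIR.findall(event))

fields = extract_pairs(raw)
print(fields["action"], fields["statuscode"])  # pass 200
print(fields["categoryname"])                  # Instant Messaging,Web Phone
```

Because the pairs are self-describing, the extraction does not depend on field order, which is what makes it more flexible than a single positional regex.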

Another option to consider would be to modify your Splunk input config (inputs.conf):

[udp:514]
no_appending_timestamp = true

which will prevent Splunk from adding the extra timestamp and host to the data. If you do this, you should modify your raw matching regex to ^\S+\s+httpproxy, since you don't need to match on the extra components.
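As a quick sanity check of the shorter regex, here is a Python sketch. The event shape is an assumption: it takes the sample from the question and removes the leading "Mar 31 20:26:00 10.10.40.10" that Splunk would otherwise prepend. Verify against your own raw events before relying on it.

```python
import re

# Assumed shape of the event once Splunk no longer prepends its own
# timestamp and host: only the device's own header remains.
raw = '2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info"'

short_rule = re.compile(r"^\S+\s+httpproxy")
long_rule = re.compile(r"^(?:\S+\s+){5}httpproxy")

print(bool(short_rule.search(raw)))  # True: one token, then httpproxy
print(bool(long_rule.search(raw)))   # False: the five-token prefix is gone
```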
