Parsing syslog data for httpproxy data in Splunk For Squid

New Member

I have just set up Splunk and am trying to get my http proxy (Astaro) data into Splunk for Squid. Astaro does use squid but the syslog data isn't in the standard squid format. I can get the syslog data into Splunk and see it via a new UDP:514 input, but I'm having trouble with getting the data visible in Splunk For Squid.
Here is a typical syslogd entry:

Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="10.10.30.101" dstip="67.195.186.237" user="" statuscode="200" cached="0" profile="PROFILENAMEHERE" filteraction="DefaultHTTPAction" size="403" request="0x7cd39000" url="<URLISHERE>" exceptions="" error="" country="United States" category="122,157" reputation="neutral" categoryname="Instant Messaging,Web Phone" content-type="application/json"

Here are my props.conf contents.

[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = none
SHOULD_LINEMERGE = false
REPORT-squid = squid

[source::udp:514]
TRANSFORMS-sqsourcetype = sq_sourcetyper

and transforms.conf...

[squid]
REGEX = ^\d+\.\d+\s+(\d+)\s+([0-9\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
FORMAT = id::$1 severity::$2 sys::$3 sub::$4 name::$5 action::$6 method::$7 srcip::$8 dstip::$9 user::$10 statuscode::$11 cached::$12 profile::$13 filteraction::$14 size::$15 request::$16 url::$17 exceptions::$18 error::$19 country::$20 category::$21 reputation::$22 categoryname::$23 content-type::$24

[sq_sourcetyper] 
SOURCE_KEY = MetaData:Host 
REGEX = httpproxy
DEST_KEY = MetaData:Sourcetype 
FORMAT= sourcetype::squid

When adding a data source I can't see the "squid" sourcetype anywhere.

I'm guessing that my transforms.conf REGEX is wrong, but how do I get the data to show up in Splunk For Squid?
Markdown may have messed up the formatting.

Re: Parsing syslog data for httpproxy data in Splunk For Squid

Splunk Employee

Yes, almost certainly. Your host (MetaData:Host) value is not httpproxy but 10.10.40.10, so the sq_sourcetyper REGEX never matches. Unfortunately, this kind of chaining of timestamps and hostnames is an inherent problem with using syslog, which doesn't specify the host in the data itself. You can try matching on that host value instead. If that's undesirable, you can try this:

[sq_sourcetyper] 
REGEX = ^(?:\S+\s+){5}httpproxy
DEST_KEY = MetaData:Sourcetype 
FORMAT= sourcetype::squid

With no SOURCE_KEY set, this matches against the raw event text, so it looks for httpproxy in the data itself rather than in the host value.
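As a quick sanity check outside Splunk, the two approaches can be compared in Python. This is only a sketch: the event is the sample from the question (truncated), the host value is assumed to be the sending IP, and Python's `re` stands in for Splunk's regex engine.

```python
import re

# Sample event from the question, truncated after the first few fields.
raw = ('Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: '
       'id="0001" severity="info" sys="SecureWeb" sub="http"')

# Assumed host value for this event (the sending IP, not "httpproxy").
host = "10.10.40.10"

# Original transform: REGEX = httpproxy, applied to the host value.
host_match = re.search(r"httpproxy", host) is not None

# Suggested transform: skip five whitespace-separated tokens in the raw
# event (month, day, time, host, Astaro timestamp), then expect httpproxy.
raw_pattern = re.compile(r"^(?:\S+\s+){5}httpproxy")
raw_match = raw_pattern.search(raw) is not None

print(host_match, raw_match)  # False True
```

The five skipped tokens are the timestamp and host that Splunk prepends plus Astaro's own timestamp, which is why the count depends on the input settings.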


Also, your squid rule is unnecessarily complicated and inflexible. Instead, use this in props.conf:

[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = auto
SHOULD_LINEMERGE = false

The REGEX is slower and more complicated than it needs to be, and KV_MODE = auto extracts the name="value" pairs on its own. If that doesn't work for you for some reason, you could try keeping your original props.conf, but changing the transforms.conf to:

[squid]
DELIMS = " ", "="

but it should work with the simpler config.
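To get a feel for what key/value extraction yields on this data, here is a rough Python approximation. It is not Splunk's implementation: the `PAIR` pattern and the `extract_pairs` helper are made up for illustration, and the event is a shortened version of the sample from the question.

```python
import re

# Shortened sample event; the real one carries more key="value" pairs.
raw = ('2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info" '
       'action="pass" method="GET" srcip="10.10.30.101" user="" '
       'statuscode="200" categoryname="Instant Messaging,Web Phone" '
       'content-type="application/json"')

# Hypothetical pattern: a key (word characters or hyphens), an equals sign,
# then a double-quoted value that may be empty or contain spaces.
PAIR = re.compile(r'([\w-]+)="([^"]*)"')

def extract_pairs(event: str) -> dict:
    """Return every key="value" pair found in the event as a dict."""
    return dict(PAIR.findall(event))

fields = extract_pairs(raw)
print(fields["srcip"])
print(fields["categoryname"])
```

Because every field in the Astaro line is already a quoted name/value pair, none of the positional capture groups from the original REGEX are needed to pull them out.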

Another option to consider is modifying your Splunk input config:

[udp:514]
no_appending_timestamp = true

which will prevent Splunk from adding the extra timestamp and host to the data. If you do this, you should modify your raw matching regex to ^\S+\s+httpproxy, since you don't need to match on the extra components.
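A small Python check of the shorter pattern, as a sketch only. It assumes that with no_appending_timestamp set the raw event starts directly with Astaro's own timestamp; if your device sends anything else in front of it, the pattern would need adjusting.

```python
import re

# Assumed event shape once Splunk stops prepending its timestamp and host.
bare = '2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info"'

# The same event as it arrives with the prepended timestamp and host.
prefixed = 'Mar 31 20:26:00 10.10.40.10 ' + bare

short_pattern = re.compile(r"^\S+\s+httpproxy")

bare_match = short_pattern.search(bare) is not None
prefixed_match = short_pattern.search(prefixed) is not None

print(bare_match, prefixed_match)  # True False
```

The shorter regex only fits the un-prefixed form, so the input setting and the transform need to be changed together.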