I have just set up Splunk and am trying to get my http proxy (Astaro) data into Splunk for Squid. Astaro does use squid but the syslog data isn't in the standard squid format. I can get the syslog data into Splunk and see it via a new UDP:514 input, but I'm having trouble with getting the data visible in Splunk For Squid.
Here is a typical syslogd entry:
Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="10.10.30.101" dstip="67.195.186.237" user="" statuscode="200" cached="0" profile="PROFILENAMEHERE" filteraction="DefaultHTTPAction" size="403" request="0x7cd39000" url="<URLISHERE>" exceptions="" error="" country="United States" category="122,157" reputation="neutral" categoryname="Instant Messaging,Web Phone" content-type="application/json"
Here are my props.conf contents.
[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = none
SHOULD_LINEMERGE = false
REPORT-squid = squid
[source::udp:514]
TRANSFORMS-sqsourcetype= sq_sourcetyper
and transforms.conf...
[squid]
REGEX =^d+.d+s+(d+)s+([0-9.])s+([^/]+)/(d+)s+(d+)s+(w+)s+((?:([^:])://)?([^/:]+):?(d+)?(/?[^]))s+(S+)s+([^/]+)/([^ ]+)s+(.)$
FORMAT = id::$1 severity::$2 sys::$3 sub::$4 name::$5 action::$6 method::$7 srcip::$8 dstip::$9 user::$10 statuscode::$11 cached::$12 profile::$13 filteraction::$14 size::$15 request::$16 url::$17 exceptions::$18 error::$19 country::$20 category::$21 reputation::$22 categoryname::$23 content-type::$24
[sq_sourcetyper]
SOURCE_KEY = MetaData:Host
REGEX = httpproxy
DEST_KEY = MetaData:Sourcetype
FORMAT= sourcetype::squid
When adding a data source I can't see the "squid" sourcetype anywhere.
I'm guessing that my transforms.conf REGEX is wrong, but how do I get the data to show up in Splunk For Squid?
Markdown may have messed up the formatting.
Yes. Almost certainly, your host (or MetaData:Host) value is not httpproxy, but instead 10.10.40.10. Unfortunately, this kind of chaining of timestamps and hostnames is an inherent problem with using syslog, which doesn't specify the host in the data itself. You can try putting that value (10.10.40.10) in the REGEX in place of httpproxy. If that's undesirable, you can try instead:
[sq_sourcetyper]
REGEX = ^(?:\S+\s+){5}httpproxy
DEST_KEY = MetaData:Sourcetype
FORMAT= sourcetype::squid
which, with SOURCE_KEY removed, is applied to the raw event text (the default) and looks for httpproxy there instead of in the host field.
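As a quick sanity check, the pattern can be tried against the sample event outside Splunk. This is only a sketch: Splunk uses PCRE, and Python's `re` module is used here as a stand-in, which should behave the same for a pattern this simple. The `event` string is the sample line from the question, truncated after the first two fields.

```python
import re

# Sample event from the question, truncated after the first two key/value pairs.
event = ('Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 '
         'httpproxy[14016]: id="0001" severity="info"')

# Same pattern as in the sq_sourcetyper stanza: five whitespace-separated
# tokens (month, day, time, host, device timestamp), then "httpproxy".
pattern = re.compile(r'^(?:\S+\s+){5}httpproxy')

matched = pattern.search(event) is not None

# A line from some other process (hypothetical) should not match.
other = 'Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 sshd[100]: session opened'
other_matched = pattern.search(other) is not None

print(matched, other_matched)
```

If the number of leading tokens in your events differs (for example, a single-digit day padded with two spaces still counts as one separator), the `{5}` count is the part to adjust.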
Also, your squid rule is unnecessarily complicated and inflexible. Instead, use this in props.conf:
[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = auto
SHOULD_LINEMERGE = false
The REGEX-based extraction is slower and more complicated, and it isn't needed here: KV_MODE = auto already extracts the name="value" pairs, so the REPORT-squid line can be dropped. If that doesn't work for you for some reason, you could try keeping your original props.conf, but changing the [squid] stanza in transforms.conf to:
[squid]
DELIMS = " ", "="
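To see roughly which fields either approach should produce from the Astaro format, here is a small Python approximation of key="value" extraction. It is not Splunk's implementation, only an illustration; the `extract_fields` helper and the `PAIR` pattern are made up for this sketch, and the event is a shortened version of the sample line.

```python
import re

# Shortened version of the sample event from the question.
event = ('httpproxy[14016]: id="0001" severity="info" action="pass" '
         'method="GET" srcip="10.10.30.101" user="" statuscode="200" '
         'categoryname="Instant Messaging,Web Phone" '
         'content-type="application/json"')

# key="value" pairs; values may be empty or contain spaces and commas.
PAIR = re.compile(r'([\w-]+)="([^"]*)"')

def extract_fields(raw):
    """Return a dict of field name -> value for every key="value" pair."""
    return dict(PAIR.findall(raw))

fields = extract_fields(event)
for name, value in fields.items():
    print(f'{name} = {value!r}')
```

Note that values such as categoryname contain spaces inside the quotes, which is one reason a positional regex like the original squid one does not fit this log format.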
Another option to consider would be to modify your Splunk input config (inputs.conf):
[udp:514]
no_appending_timestamp = true
which will prevent Splunk from adding the extra timestamp and host to the data. If you do this, you should modify your raw matching regex to ^\S+\s+httpproxy, since you don't need to match on the extra components.
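A quick check of the shorter pattern, again using Python's `re` as a stand-in for Splunk's PCRE. The `event` string below is an assumption: it is what the sample line would look like if it began directly with the device's own timestamp once the prepended timestamp and host are gone. Verify against your actual raw events before relying on it.

```python
import re

# Assumed shape of the raw event without the prepended timestamp and host.
event = '2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info"'

short_pattern = re.compile(r'^\S+\s+httpproxy')
long_pattern = re.compile(r'^(?:\S+\s+){5}httpproxy')

short_matched = short_pattern.search(event) is not None
# The five-token pattern no longer fits, which is why the regex must change.
long_matched = long_pattern.search(event) is not None

print(short_matched, long_matched)
```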