Ha, I answered my own question. Here's what I came up with:
This is to accommodate a slightly altered Squid log format when processing it with the SplunkforSquid add-on app for Splunk. Normally the client IP field holds an actual IP address. I told Squid to log the client as an FQDN instead, which makes it do a lookup against /etc/hosts and substitute friendly names for the IP addresses. However, the app's REGEX only accepts digits and dots in the 2nd capture group (client IP), so a hostname like "laptop" never matches. Note that in the Squid output the client IP is the 3rd field from a space-delimited perspective (see the sample log entries below), but in the REGEX it's the second capture group, because the leading timestamp is matched without being captured. The original REGEX doesn't find any results, so I had to change it as outlined below:
Sample Squid log output (original logformat, out of the box):
1400639582.187 14 192.168.1.210 TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json
Sample Squid log output (modified to be more human-friendly, with the client hostname in place of its IP):
1400639582.187 14 laptop TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json
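To make the "3rd field vs. 2nd capture group" point concrete, here's a small Python sketch (not part of the add-on) that pulls the client out of both sample lines two ways: by splitting on whitespace, and with a regex. PREFIX is a hypothetical, cut-down version of the transforms.conf REGEX that only goes as far as the client field, and it uses \S+ for the client purely for illustration.

```python
import re

# The two sample lines from above: default format and the FQDN variant.
DEFAULT_LINE = "1400639582.187 14 192.168.1.210 TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json"
FQDN_LINE = "1400639582.187 14 laptop TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json"

# Hypothetical simplified prefix of the REGEX: the timestamp (^\d+.\d+) is
# matched but not captured, so duration is group 1 and the client is group 2.
PREFIX = re.compile(r"^\d+.\d+\s+(\d+)\s+(\S+)\s+")


def client_by_split(line: str) -> str:
    """Client as the 3rd whitespace-separated field (index 2)."""
    return line.split()[2]


def client_by_group(line: str) -> str:
    """Client as capture group 2 of the simplified prefix regex."""
    m = PREFIX.match(line)
    if m is None:
        raise ValueError("line does not look like a Squid native log entry")
    return m.group(2)


if __name__ == "__main__":
    for line in (DEFAULT_LINE, FQDN_LINE):
        print(client_by_split(line), client_by_group(line))
```

Both approaches land on the same value (192.168.1.210 for the first line, laptop for the second); they just count differently because the timestamp is field 1 but not a capture group.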
Original REGEX in /opt/splunk/etc/apps/SplunkforSquid/default/transforms.conf:
REGEX = ^\d+.\d+\s+(\d+)\s+([0-9.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
New REGEX (only the second capture group, the client IP, changed):
REGEX = ^\d+.\d+\s+(\d+)\s+([^/]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
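If you want to sanity-check a REGEX change like this outside Splunk, a quick Python sketch works, since Python's re module understands everything these patterns use. One assumption: the patterns below restore the * quantifiers (in ([0-9.]*), ([^/]*), ([^:]*), [^ ]* and (.*)) that look like they were eaten by the forum's formatting. This only tests the pattern; it isn't how Splunk applies the transform.

```python
import re

# Assumption: the post's two REGEX values with the stripped '*' quantifiers restored.
ORIGINAL_REGEX = r"^\d+.\d+\s+(\d+)\s+([0-9.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$"
NEW_REGEX = r"^\d+.\d+\s+(\d+)\s+([^/]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$"

SAMPLES = {
    "default": "1400639582.187 14 192.168.1.210 TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json",
    "fqdn": "1400639582.187 14 laptop TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json",
}


def client_ip(pattern: str, line: str):
    """Return capture group 2 (clientip), or None if the whole line doesn't match."""
    m = re.match(pattern, line)
    return m.group(2) if m else None


if __name__ == "__main__":
    for sample_name, line in SAMPLES.items():
        for label, pattern in (("original", ORIGINAL_REGEX), ("new", NEW_REGEX)):
            print(f"{label:8} {sample_name:8} clientip={client_ip(pattern, line)}")
```

With these patterns the original REGEX matches only the default line and returns None for the hostname line (that's the "no results" symptom), while the new REGEX matches both. ([^/]*) is greedy, but it backtracks to just "laptop" because a \s+ has to follow it.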
Field format identifiers (the FORMAT line, which maps each capture group to a field name):
FORMAT = duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 content_type::$15
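To see how the FORMAT line pairs up with the capture groups, here's a rough Python sketch that parses the name::$n tokens and builds a field dictionary from a match of the new REGEX (again assuming the restored * quantifiers). The helper names parse_format and extract are made up for this example. The sketch simply drops groups that didn't participate in the match, such as proto when the URL has no scheme; whether Splunk does exactly the same isn't something this example shows.

```python
import re

# Assumption: the new REGEX with the stripped '*' quantifiers restored.
NEW_REGEX = re.compile(r"^\d+.\d+\s+(\d+)\s+([^/]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$")

FORMAT = ("duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 "
          "uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 "
          "hierarchy::$13 server_ip::$14 content_type::$15")

FQDN_LINE = "1400639582.187 14 laptop TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json"


def parse_format(fmt: str) -> list[tuple[str, int]]:
    """Hypothetical helper: turn 'name::$n' tokens into (name, group number) pairs."""
    pairs = []
    for token in fmt.split():
        name, ref = token.split("::")
        pairs.append((name, int(ref.lstrip("$"))))
    return pairs


def extract(line: str) -> dict[str, str]:
    """Hypothetical helper: apply NEW_REGEX and name the groups per FORMAT."""
    m = NEW_REGEX.match(line)
    if m is None:
        return {}
    # Skip groups that did not take part in the match (their value is None).
    return {name: m.group(n) for name, n in parse_format(FORMAT) if m.group(n) is not None}


if __name__ == "__main__":
    for field, value in extract(FQDN_LINE).items():
        print(f"{field} = {value}")
```

For the hostname sample this yields clientip = laptop, action = TCP_MISS, http_status = 200, uri_port = 8000 and so on, with no proto field because the logged URL has no scheme.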
I hope this helps some other newbs like myself. I've just started to use Splunk, so I'm still getting used to the structure.