Squid log format slightly changed: would it cause my problem?

ericsteed
Engager

I am running Squid 3.1 with an almost stock logformat (I modified it to log the fully qualified name of the client instead of its IP address). Here is the logformat directive in my squid.conf file:

logformat squid %ts.%03tu %6tr %>A %Ss/%03>Hs %<st %rm %ru %[un %Sh/%<A %mt

Note the %>A and %<A, where the stock format uses %>a and %<a.

I previously had this working with the stock format, but I didn't like the IP addresses showing up in the dashboard. I wanted to assign "names" via entries in /etc/hosts on the Squid server so that clients would show up in the dashboard with more meaningful tags. Now the dashboard says it can't find a single entry in my log, even though I have over 100,000 of them! Where should I start looking?


ericsteed
Engager

Ha, I answered my own question. Here's what I came up with:

This accommodates a slightly altered Squid log format when the logs are processed by the SplunkforSquid add-on app for Splunk. Normally the client field is an actual IP address. I told Squid to log the FQDN instead, which makes it do a lookup (against /etc/hosts in my case) and substitute friendly names for the IP addresses. However, the app's REGEX expects only digits and dots in its second capture group (clientip). Note that in the Squid output the client is the 3rd field from a space-delimited perspective (see the sample log entries), but the REGEX does not capture the timestamp, so the client is its second capture group. The original REGEX finds no results once that field contains a name, so I had to change it as outlined below:

Sample squid log output (original logformat out of the box):
1400639582.187 14 192.168.1.210 TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json

Sample squid log output (modified to be more human friendly):
1400639582.187 14 laptop TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json
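
To make the field counting concrete, here is a minimal Python sketch (Python is only used for illustration here; Splunk itself does not split the line this way). It splits the modified sample entry on whitespace and shows that the client name is the third token, with the timestamp as the first.

```python
# Sample entry copied from the modified log output above.
line = (
    "1400639582.187 14 laptop TCP_MISS/200 2497 GET "
    "192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? "
    "- DIRECT/192.168.1.10 application/json"
)

# Split on runs of whitespace, the same way one would count fields by eye.
fields = line.split()

timestamp = fields[0]  # matched by the REGEX but not captured
duration = fields[1]   # first capture group in the REGEX
client = fields[2]     # third token, but the second capture group

print(timestamp, duration, client)
```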

Original REGEX in /opt/splunk/etc/apps/SplunkforSquid/default/transforms.conf:
REGEX = ^\d+.\d+\s+(\d+)\s+([0-9.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$

New REGEX:
REGEX = ^\d+.\d+\s+(\d+)\s+([^/]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$

Field format identifiers:
FORMAT = duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 content_type::$15
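
For anyone who wants to check a change like this before touching transforms.conf, here is a rough Python sketch that runs both patterns against the two sample entries. It is only an approximation: Splunk uses PCRE rather than Python's `re` module (these particular patterns use syntax common to both), and the `extract` helper and `FIELD_NAMES` list are names made up for this example. The patterns are written with `*` quantifiers on the character classes, which they need in order to match multi-character fields.

```python
import re

# Shared tail of both patterns: everything after the client field.
TAIL = (
    r"\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+"
    r"((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))"
    r"\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$"
)
HEAD = r"^\d+.\d+\s+(\d+)\s+"

ORIGINAL = re.compile(HEAD + r"([0-9.]*)" + TAIL)  # client must be digits and dots
MODIFIED = re.compile(HEAD + r"([^/]*)" + TAIL)    # client may be a name

# Field names in the order given by the FORMAT line ($1 .. $15).
FIELD_NAMES = [
    "duration", "clientip", "action", "http_status", "bytes", "method",
    "uri", "proto", "uri_host", "uri_port", "uri_path", "username",
    "hierarchy", "server_ip", "content_type",
]

REST = (
    " TCP_MISS/200 2497 GET "
    "192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? "
    "- DIRECT/192.168.1.10 application/json"
)
IP_LINE = "1400639582.187 14 192.168.1.210" + REST
NAME_LINE = "1400639582.187 14 laptop" + REST


def extract(pattern, line):
    """Return a dict of field name -> value, or None if the line does not match."""
    m = pattern.match(line)
    if m is None:
        return None
    return dict(zip(FIELD_NAMES, m.groups()))


for label, pattern in (("original", ORIGINAL), ("modified", MODIFIED)):
    for line in (IP_LINE, NAME_LINE):
        result = extract(pattern, line)
        print(label, "->", None if result is None else result["clientip"])
```

With these inputs the original pattern should match only the entry with the IP address, while the modified pattern should match both. Note that `[^/]*` is quite permissive; a tighter class such as `\S+` would probably also work for the client field, but that is untested against the app.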

I hope this helps some other newbs like myself. I've just started to use Splunk, so I'm still getting used to the structure.

