
Easy newbie question: How to make a modified version of Splunk's built-in knowledge about Apache access logs

New Member

I have a pile of Apache access logs where the format is just slightly modified from the default. Is there any way I can leverage Splunk's existing knowledge of the "apache-common" sourcetype to get more intelligent parsing of my slightly modified format?

Here's the original 'common' format definition:
LogFormat "%h %l %u %t \"%r\" %>s %b" common

...and here's the modified version:
LogFormat "%{X-Forwarded-For}i %h %D %l %u %t \"%r\" %>s %b" common

Basically we prepended the contents of the X-Forwarded-For header (a comma-and-space-separated list of IP addresses or "-") and then shifted around the other fields.
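
To make the layout concrete, here is a rough Python sketch of how one line in the modified format breaks down. The group names (`xff`, `clientip`, `duration_us`, and so on) and the sample line are made up for illustration, and this is plain Python rather than anything Splunk ships; it also assumes the request line contains no embedded double quotes.

```python
import re

# Hypothetical pattern for the modified format:
#   "%{X-Forwarded-For}i %h %D %l %u %t \"%r\" %>s %b"
# Group names are illustrative, not Splunk's built-in field names.
MODIFIED_COMMON = re.compile(
    r'^(?P<xff>-|[^,\s]+(?:, [^,\s]+)*) '   # X-Forwarded-For list, or "-"
    r'(?P<clientip>\S+) '                   # %h
    r'(?P<duration_us>\d+) '                # %D
    r'(?P<ident>\S+) '                      # %l
    r'(?P<user>\S+) '                       # %u
    r'\[(?P<req_time>[^\]]+)\] '            # %t
    r'"(?P<request>[^"]*)" '                # \"%r\"
    r'(?P<status>\d{3}) '                   # %>s
    r'(?P<bytes>\d+|-)$'                    # %b
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = MODIFIED_COMMON.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # Turn the comma-and-space-separated header value into a list.
    fields["xff"] = [] if fields["xff"] == "-" else fields["xff"].split(", ")
    return fields


# Made-up sample line in the modified format.
sample = ('203.0.113.7, 10.0.0.2 10.0.0.1 1532 - frank '
          '[10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_line(sample))
```

The only tricky part is the first field: because the X-Forwarded-For list itself contains spaces, the pattern has to treat ", " as a continuation of the list and a bare space as the end of it.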

Clearly there's no way Splunk is going to automagically figure that out -- but I'm stumped on where to start with telling it about the new format.

So I am hoping there's some way in which I can look at what tells Splunk how to understand the default format, just as a starting point for building my new version.

Seems like this must be a basic newbie question -- any tips would be appreciated.


Re: Easy newbie question: How to make a modified version of Splunk's built-in knowledge about Apache access logs

New Member

This is what I found; I hope it helps. It is untested but should be functional.

Reference document: http://httpd.apache.org/docs/1.3/logs.html

I would personally shorten the field names, but I altered the format slightly so that it emits name=value pairs. Here is the altered format string:

"xforwarder=%{X-Forwarded-For}i IP=%h userid=%u time=%t request=\"%r\" responseCode=%>s responseSize=%b"

I removed %D and %l: %D is not described on the referenced page, and %l is usually just filler.

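  • %l => remote logname; usually just "-"
  • %D => not on the referenced page, which covers Apache 1.3 (in Apache 2.x, %D is the time taken to serve the request, in microseconds)
  • %h => remote host (IP address)
  • \"%{Referer}i\" => referrer; I didn't use it but could see value in adding it
  • \"%{User-agent}i\" => user agent
  • %r => the request line from the client, given in double quotes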
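
Like the format string itself, the following is only a sketch: a small Python function that splits a line written in the name=value format above back into fields. The sample line is made up, and the `PAIR` pattern and `parse_pairs` name are my own. It assumes keys are plain words and that only the request is quoted and only the time is bracketed. It says nothing about how Splunk itself extracts the pairs.

```python
import re

# A value is a quoted string, a bracketed timestamp, or anything up to the
# next " key=" boundary (the X-Forwarded-For list contains spaces).
PAIR = re.compile(
    r'(?P<key>\w+)='
    r'(?P<value>"[^"]*"|\[[^\]]*\]|.*?)'
    r'(?=\s+\w+=|\s*$)'
)


def parse_pairs(line):
    """Split a name=value access log line into a dict of strings."""
    fields = {}
    for m in PAIR.finditer(line.strip()):
        # Drop the surrounding quotes or brackets, if any.
        fields[m.group("key")] = m.group("value").strip('"[]')
    return fields


# Made-up sample line in the name=value format.
sample_kv = ('xforwarder=203.0.113.7, 10.0.0.2 IP=10.0.0.1 userid=frank '
             'time=[10/Oct/2000:13:55:36 -0700] '
             'request="GET /apache_pb.gif HTTP/1.0" '
             'responseCode=200 responseSize=2326')
print(parse_pairs(sample_kv))
```

Because the xforwarder and time values contain spaces, a naive split on whitespace would cut them short, which is why the pattern looks ahead for the next " key=" instead.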