Getting Data In

Why does Splunk not recognize standard fields in my Apache data forwarded by syslog?

stefanlasiewski
Contributor

I have over 100 Apache webservers that forward their logs to a syslog-ng server, which then forwards the data to a TCP data input on Splunk, as well as to other non-Splunk log-analysis servers.

In Splunk Search, the data looks like this:

Dec 16 10:29:59 192.168.99.100 httpd[10583]: site1.example.org 10.4.5.6 - - [16/Dec/2014:10:29:59 -0800] "GET /rest/somepath/12345" HTTP/1.1" 200 105066 "-" "-"
Dec 16 10:29:59 192.168.99.101 httpd[22404]: site2.example.org 4.4.12.15 - someuser [16/Dec/2014:10:29:59 -0800] "GET /wiki/javascript/foo.js" HTTP/1.1" 304 - "https://site2.example.org/wiki/somepage.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
Dec 16 10:29:59 192.168.6.100 httpd[6380]: site3.example.org 172.16.43.41 - - [16/Dec/2014:10:29:59 -0800] "GET /project/projectA/somescript.cgi?username=spiderman" 200 9048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

However, Splunk recognizes only a few default fields in this data. It recognizes host, process, source, sourcetype, date_hour, etc. It does not recognize Apache-specific fields like clientip, status, method, etc., which are mentioned in the Splunk tutorial. It doesn't even recognize a string like 4.4.12.15 as an IP address.

As a result, I need to create a whole bunch of custom field extractions in order to do many useful tasks in Splunk.

Why does Splunk not recognize fields in my Apache data? How can I transform the data so that Splunk will recognize the data correctly?

Second question: Would it help if I used a Splunk Forwarder on our syslog server instead of using TCP for data input?

neelamssantosh
Contributor

Use the KV_MODE attribute in props.conf to specify the field/value extraction mode for your data.

auto: Extracts field/value pairs that are separated by equal signs. This is the default field extraction behavior if you do not include this attribute in your field extraction stanza.

KV_MODE = auto

Hope this works.

chanfoli
Builder

The original question has nothing to do with key/value equal-sign extractions.

stefanlasiewski
Contributor

I'm not sure what KV_MODE has to do with my problem. Can you explain?

neelamssantosh
Contributor

It extracts fields automatically. In your case, Splunk should be able to extract clientip and status on its own, and they would then appear under Interesting Fields. On indexers and search heads, KV_MODE is often set to none to reduce load, so please check this option.

stefanlasiewski
Contributor

However, my data does not normally use key=value pairs, nor is it XML or JSON based, and KV_MODE=auto is already the default. My log data is standard, Unix-type syslog data.

chanfoli
Builder

The first part of each line in these events looks like syslog data, so this data is likely being seen as a syslog sourcetype. The client IP field above is where it actually starts looking like combined Apache access data. Events consisting of a mishmash of two different sourcetypes are obviously not going to work with the built-ins, so you need to either remove the parts of the events that are not part of the pretrained Apache access log sourcetype before input, or implement a transform that trims all the syslog content before the clientip. Another way would be to customize either of the two extraction transforms to use bits from the other, at which point you will have created your own syslog-httpd-access sourcetype. I wanted to share a little background on why it is not working, but instead of doing all the work yourself, you might want to look at this:

http://wiki.splunk.com/Community:StripSyslog
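
The trimming transform described above can be sketched roughly as follows. This is a minimal sketch and not the contents of the linked wiki page: the stanza name strip_syslog_apache_header and the regex are assumptions written against the sample events in the question, and the regex also drops the virtual-host name (for example site1.example.org) that precedes the client IP. Index-time transforms like this run on the indexer or a heavy forwarder and only affect data indexed after the change.

```
# props.conf
[syslog]
# "strip_apache" is an arbitrary class name chosen for this sketch
TRANSFORMS-strip_apache = strip_syslog_apache_header

# transforms.conf
[strip_syslog_apache_header]
# Match: syslog timestamp, sending host, httpd[pid]:, virtual host.
# Capture everything after that and write it back as the raw event.
REGEX = ^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+httpd\[\d+\]:\s+\S+\s+(.*)$
FORMAT = $1
DEST_KEY = _raw
```

Events that do not match the regex, such as non-Apache syslog traffic, are left unchanged.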

stefanlasiewski
Contributor

Thanks for the help. I made progress, but I'm still not there yet.

I used transforms.conf and props.conf as described on that page to transform data from the old format:

Dec 16 10:29:59 192.168.99.100 httpd[10583]: site1.example.org 10.4.5.6 - - [16/Dec/2014:10:29:59 -0800] "GET /rest/somepath/12345" HTTP/1.1" 200 105066 "-" "-"

To the new format:

10.4.5.6 - - [16/Dec/2014:10:29:59 -0800] "GET /rest/somepath/12345" HTTP/1.1" 200 105066 "-" "-"

Splunk still doesn't recognize any of the Apache-specific fields such as clientip or status. Any ideas?

chanfoli
Builder

What sourcetype is the data getting indexed as? The sourcetype on this input might be set to something other than access_common. IIRC, Splunk determines pretrained sourcetypes based on some of the first data in the input, so you may need to set the sourcetype of the input to access_common.
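
Setting the sourcetype on the input might look like the sketch below. The port number 5514 is a placeholder for whatever port the existing TCP input uses. Because the sample events carry referer and user-agent fields, the pretrained access_combined sourcetype may be a closer match than access_common; that choice is an assumption here, not something confirmed in this thread.

```
# inputs.conf on the Splunk instance that receives the TCP stream
# 5514 is a placeholder; use the port of the existing TCP input
[tcp://5514]
sourcetype = access_combined
```

This assigns the sourcetype to every event arriving on that input, so it only fits an input that carries nothing but Apache access data. Any index-time header stripping would then need to be attached to this sourcetype's props.conf stanza rather than to [syslog].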

stefanlasiewski
Contributor

The sourcetype is still set to syslog. I'm not sure if or how I can change that.

chanfoli
Builder

Try adding the reference to the correct extraction to the syslog sourcetype. If you have other types of data coming in as syslog, they might be affected. The correct way to address this requires either breaking out different sourcetypes from your syslog data or doing something more advanced using an event-based override, as described here:
http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Advancedsourcetypeoverrides

If you have only Apache data here, you may be able to add this to the syslog sourcetype stanza in props.conf and have it work, but it may break, or fail to properly transform, other events:
REPORT-access = access-extractions

This is what actually tells Splunk which extraction definition to use.
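
The event-based override mentioned above could be sketched like this. The stanza name set_sourcetype_apache_access, the regex, and the target sourcetype access_combined are assumptions based on the sample events, in which Apache lines carry the syslog tag httpd[pid]:.

```
# props.conf
[syslog]
# "apache_sourcetype" is an arbitrary class name chosen for this sketch
TRANSFORMS-apache_sourcetype = set_sourcetype_apache_access

# transforms.conf
[set_sourcetype_apache_access]
# Only events whose syslog tag is httpd[pid]: are re-typed
REGEX = \shttpd\[\d+\]:\s
FORMAT = sourcetype::access_combined
DEST_KEY = MetaData:Sourcetype
```

Other syslog events keep the syslog sourcetype, so the search-time REPORT extractions of the Apache sourcetype apply only to the re-typed events. The regex is tested against the raw event as it exists when the transform runs, so if a header-stripping transform is also configured, this one has to run first, for example by listing it earlier in the same comma-separated TRANSFORMS setting. The built-in access extractions likely expect the event to begin with the client IP, so the syslog header would probably still need to be stripped.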

stefanlasiewski
Contributor

@chanfoli, do you think this would be better if I put a Splunk Forwarder on my syslog server instead? I imagine that this way, the data won't automatically get tagged with the syslog sourcetype and the fields might get extracted correctly. I would probably need to strip the Syslog header on the Splunk Forwarder, but I am not sure if that is possible.
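
A forwarder-based input might be sketched as below, assuming syslog-ng on the syslog server is configured to write the Apache traffic to its own file. The path /var/log/remote/apache/access.log is hypothetical.

```
# inputs.conf on the universal forwarder installed on the syslog server
# The monitored path is hypothetical; point it at the file syslog-ng writes
[monitor:///var/log/remote/apache/access.log]
sourcetype = access_combined
disabled = false
```

A universal forwarder does not apply TRANSFORMS to event data, so stripping the syslog header would still have to happen on the indexer or on a heavy forwarder, or be avoided by having syslog-ng write the file without the header.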

stefanlasiewski
Contributor

Thanks. These syslogs contain data from thousands of systems and contain more than just Apache log data. I'll take a look at your suggestion.
