I have over 100 Apache web servers that forward their logs to a syslog-ng server. The syslog-ng server then forwards the data to a TCP data input on Splunk, as well as to other non-Splunk log-analysis servers.
In Splunk Search, the data looks like this:
Dec 16 10:29:59 192.168.99.100 httpd[10583]: site1.example.org 10.4.5.6 - - [16/Dec/2014:10:29:59 -0800] "GET /rest/somepath/12345" HTTP/1.1" 200 105066 "-" "-"
Dec 16 10:29:59 192.168.99.101 httpd[22404]: site2.example.org 4.4.12.15 - someuser [16/Dec/2014:10:29:59 -0800] "GET /wiki/javascript/foo.js" HTTP/1.1" 304 - "https://site2.example.org/wiki/somepage.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
Dec 16 10:29:59 192.168.6.100 httpd[6380]: site3.example.org 172.16.43.41 - - [16/Dec/2014:10:29:59 -0800] "GET /project/projectA/somescript.cgi?username=spiderman" 200 9048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
However, Splunk recognizes only a few default fields in this data, such as host, process, source, sourcetype, and date_hour. It does not recognize Apache-specific fields such as clientip, status, and method, which are mentioned in the Splunk tutorial. It doesn't even recognize a string like 4.4.12.15 as an IP address.
As a result, I need to create a whole bunch of custom field extractions in order to do many useful tasks in Splunk.
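To make the problem concrete, here is a minimal Python sketch of what such an extraction has to do. It assumes every line has the layout of the samples above (syslog header, then virtual host, then an Apache access-log entry). The field names clientip, status, and method follow the Splunk tutorial; syslog_ts, pid, vhost, user, uri, and bytes are names I made up for illustration. This is not Splunk configuration, only a way to show which parts of the line have to be picked apart.

```python
import re

# Assumed layout, taken from the sample lines:
#   <syslog timestamp> <host> <process>[<pid>]: <vhost> <clientip> - <user>
#   [<apache timestamp>] "<method> <uri>" [HTTP/x.y"] <status> <bytes> ...
LINE_RE = re.compile(
    r'^(?P<syslog_ts>\w{3} +\d+ [\d:]{8}) '       # e.g. Dec 16 10:29:59
    r'(?P<host>\S+) '                             # forwarding web server
    r'(?P<process>\w+)\[(?P<pid>\d+)\]: '         # e.g. httpd[10583]:
    r'(?P<vhost>\S+) '                            # e.g. site1.example.org
    r'(?P<clientip>\S+) \S+ (?P<user>\S+) '       # client IP, ident, user
    r'\[(?P<timestamp>[^\]]+)\] '                 # Apache request time
    r'"(?P<method>[A-Z]+) (?P<uri>[^"\s]+)"'      # request method and path
    r'(?: HTTP/[\d.]+")? '                        # protocol, absent in some lines
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'         # status code and response size
)


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        'Dec 16 10:29:59 192.168.6.100 httpd[6380]: site3.example.org '
        '172.16.43.41 - - [16/Dec/2014:10:29:59 -0800] '
        '"GET /project/projectA/somescript.cgi?username=spiderman" 200 9048 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
    )
    fields = parse_line(sample)
    print(fields["clientip"], fields["method"], fields["status"])
```

The syslog header and the virtual host name in front of the client IP are the parts that a plain Apache access-log parser would not expect.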
Why does Splunk not recognize fields in my Apache data? How can I transform the data so that Splunk will recognize the data correctly?
Second question: Would it help if I used a Splunk Forwarder on our syslog server instead of using TCP for data input?