We use a custom format for our Apache access logs. Long ago, I put together a regex to extract the fields from the custom format. At that time, I set it up as a field extraction on the indexer.
The problem is that the field extraction is applied at search time. If I search more than a couple of hours' worth of Apache access logs, the search head starts complaining that the extraction is taking an excessively long time, and the search itself takes forever.
To improve performance on the search head, I want the fields extracted at index time rather than at search time.
I tried setting up transforms.conf and props.conf on the forwarder (which is where I thought it SHOULD go for this), but it never seemed to get applied to the data. I saw in this thread that when you're using the Universal Forwarder (which I am), you need to configure the transforms and props on the indexer. However, after doing so, I find that the extraction is being applied at search time, not at index time.
How do I get it to extract the fields at index time?
props.conf content:
[Apache_access]
REPORT-Apache_access = Apache_access_log
transforms.conf content:
[Apache_access_log]
CLEAN_KEYS = 0
REGEX = (?<site_client_dir>\S+?) (?<remote_host>\S+?) (?<remote_logname>\S+?) (?<remote_user>\S+?) \[(?<request_start_time>\S+?) (?<request_time_offset>\S+?)\] \"(?<request_method>.*?) (?<request>.*?) (?<request_http_version>.*?)\" (?<response_status>\S+?) (?<response_size>\S+?) \"(?<referrer>.*?)\" \"(?<user_agent>.*?)\" (?<cookie>\S+?) (?<response_time>\S+[\r\n]?)
 
Did you go through the following document? I think you might be missing a few things like a fields.conf, and some more info in your props.conf pointing to your transform. I've never configured this, but I would read through the following document to see if anything jumps out at you.
http://docs.splunk.com/Documentation/Splunk/6.1.3/Data/Configureindex-timefieldextraction
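For reference, the pieces that document describes look roughly like the following. Treat this as a minimal sketch under assumptions, not a tested config: the transform name Apache_access_indexed is made up for illustration, and the regex is deliberately shortened to capture only the first two fields of the format above. The key differences from the search-time setup are TRANSFORMS- instead of REPORT- in props.conf, numbered capture groups mapped through FORMAT with WRITE_META = true in transforms.conf, and an INDEXED = true stanza per field in fields.conf.

```
# transforms.conf (on the indexer, since a Universal Forwarder does not parse events)
[Apache_access_indexed]
# Shortened regex for illustration only: captures just the first two fields
REGEX = ^(\S+)\s(\S+)\s
# Map each numbered capture group to an indexed field name
FORMAT = site_client_dir::$1 remote_host::$2
# Write the extracted fields into the index alongside the event
WRITE_META = true

# props.conf (same instance as transforms.conf)
[Apache_access]
# TRANSFORMS- is applied at index time; REPORT- is applied at search time
TRANSFORMS-apache_access_indexed = Apache_access_indexed

# fields.conf (also needed on the search head if that is a separate instance)
[site_client_dir]
INDEXED = true

[remote_host]
INDEXED = true
```

Index-time extraction only affects events indexed after the change is in place and Splunk has been restarted; events that are already indexed keep relying on the search-time extraction.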
 
How's your scripting? I bet you could quickly set up a script that sends syslog to Splunk in the same format Apache uses. You could just hard-code a string and spam it at the server.
Ah, I see. Yes, that would be what I'm missing.
Having read some more "help" threads on Answers and reviewed additional similar documentation, I've concluded that maybe this isn't the way to go. My test environment, unfortunately, doesn't get enough data to be able to test the performance enhancements I might get, so I'll have to do it in the live system; I'll try this out in that environment in a more controlled fashion at a later date.
Thanks for the help!
