Splunk Search

How to configure props.conf and transforms.conf for field extractions to occur at index-time, not search-time?

redc
Builder

We use a custom format for our Apache access logs. Long ago, I put together a regex to extract the fields from the custom format. At that time, I set it up as a field extraction on the indexer.

The problem is that this field extraction is applied at search time. If I search more than a couple of hours' worth of Apache access logs, the search head starts complaining that the extraction is taking an excessively long time, and the search itself takes forever.

For performance reasons on the search head, I want to have it extract the fields at index time, not at search time.

I tried setting up transforms.conf and props.conf on the forwarder (which is where I thought it SHOULD go for this), but it never seemed to get applied to the data. I saw in this thread that when you're using the Universal Forwarder (which I am), you need to configure the transforms and props on the indexer. However, after doing so, I find that the extraction is being applied at search time, not at index time.

How do I get it to extract the fields at index time?

props.conf content:

[Apache_access]
REPORT-Apache_access = Apache_access_log

transforms.conf content:

[Apache_access_log]
CLEAN_KEYS = 0
REGEX = (?<site_client_dir>\S+?) (?<remote_host>\S+?) (?<remote_logname>\S+?) (?<remote_user>\S+?) \[(?<request_start_time>\S+?) (?<request_time_offset>\S+?)\] \"(?<request_method>.*?) (?<request>.*?) (?<request_http_version>.*?)\" (?<response_status>\S+?) (?<response_size>\S+?) \"(?<referrer>.*?)\" \"(?<user_agent>.*?)\" (?<cookie>\S+?) (?<response_time>\S+[\r\n]?)

hortonew
Builder

Did you go through the following document? I think you might be missing a few things like a fields.conf, and some more info in your props.conf pointing to your transform. I've never configured this, but I would read through the following document to see if anything jumps out at you.

http://docs.splunk.com/Documentation/Splunk/6.1.3/Data/Configureindex-timefieldextraction
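
For anyone skimming, here is a minimal sketch of the three pieces that document describes, indexing a single field. Treat it as an untested illustration rather than a verified configuration: the stanza name `apache_access_indexed_status`, the class name `apache_indexed`, and the simplified regex are assumptions made up for this example. Only the `[Apache_access]` sourcetype and the `response_status` field name come from the question.

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
# TRANSFORMS- is applied at index time; REPORT- (used in the question)
# is applied at search time.
[Apache_access]
TRANSFORMS-apache_indexed = apache_access_indexed_status

# transforms.conf (same instance as props.conf)
# Hypothetical stanza. The regex is a simplified guess at the status code
# that follows the quoted request, not the full pattern from the question.
[apache_access_indexed_status]
REGEX = \"\s(\d{3})\s
FORMAT = response_status::$1
WRITE_META = true

# fields.conf (on the search head)
# Marks response_status as a field that is stored in the index.
[response_status]
INDEXED = true
```

Index-time extractions only apply to events indexed after the change takes effect, so existing Apache events would still rely on the search-time extraction. Each indexed field also adds to the size of the index, which is worth weighing against the search-time savings.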

hortonew
Builder

How's your scripting? I bet you could quickly set up a script that sends syslog to Splunk in the same format Apache uses. You could just hard-code a string and spam it at the server.

redc
Builder

Ah, I see. Yes, that would be what I'm missing.

Having read some more "help" threads on Answers and reviewed additional similar documentation, I've concluded that maybe this isn't the way to go. My test environment, unfortunately, doesn't get enough data to be able to test the performance enhancements I might get, so I'll have to do it in the live system; I'll try this out in that environment in a more controlled fashion at a later date.

Thanks for the help!
