I want to forward only the first portion of each log event to the indexer. The rest of the event cannot be sent, for possible PCI reasons. I have installed a full Splunk instance to act as a heavy forwarder and tried adding an EXTRACT statement to props.conf to pull out the part I want forwarded. The relevant portion is below:
[myindex]
EXTRACT-myindexEXT = (?<A>\d+\:\d+\.\d+\:\d+\.\d+) (?<B>\d+) (?<C>\w+)
But the full logged message is getting sent to the indexer. Is there a way to do this?
You have misunderstood what EXTRACT does. It creates fields at search time; it has nothing to do with masking data or limiting how much of each message is indexed.
According to this document, if you have a heavy forwarder you can use SEDCMD to mask out data, since the parsing phase takes place on the heavy forwarder.
SEDCMD-<name> = <sed script>
* Only used at index time.
* Commonly used to anonymize incoming data at index time, such as credit card or social
security numbers. For more information, search the online documentation for "anonymize
data."
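As a rough, untested sketch of what that could look like for the pattern in the question: it assumes [myindex] is really the sourcetype name (props.conf stanzas match on sourcetype, source:: or host::, not on the index), that each event is a single line, and the class name keep_prefix is made up for illustration.

```
# props.conf on the heavy forwarder
[myindex]
# Keep only the three leading fields and drop everything after them.
SEDCMD-keep_prefix = s/^(\d+:\d+\.\d+:\d+\.\d+ \d+ \w+).*$/\1/
```

Events that do not match the pattern pass through unchanged, so check a sample of indexed events to make sure nothing sensitive slips past.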
TRUNCATE = n (also in props.conf) may be an option.
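A minimal sketch, assuming the part you want to keep always fits within a fixed number of bytes. The value 40 is only a placeholder, not something derived from your data, and [myindex] is again assumed to be the sourcetype name.

```
# props.conf on the heavy forwarder
[myindex]
# Cut each line off after 40 bytes.
TRUNCATE = 40
```

TRUNCATE counts bytes and cuts blindly, so if the leading fields vary in length, SEDCMD is likely the safer choice.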
For more info on anonymization, please see:
http://docs.splunk.com/Documentation/Splunk/5.0.2/Data/Anonymizedatausingconfigurationfiles
Hope this helps,
K