Getting Data In

Ignoring massive amounts of data at index time

msarro
Builder

Hey everyone. We are working on taking in large amounts of CSV data. Each line of the CSV is a single event, and each line consists of about 270 fields. Currently only about 40 of those fields are useful. Right now our props.conf and transforms.conf are set to index each field with a specific field name.

How would we best proceed to strip out the fields that we don't need so they don't get indexed? It would be a substantial cost saving for us. I'd prefer not to write the world's nastiest regex, but if I have to, I will.


dwaddle
SplunkTrust

If you are looking to strip "columns" out of the CSV data at index time, about the only way you'd be able to do it is with a SEDCMD in props.conf. I'd wager this would be a nontrivial regex to write.
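To give a feel for what that regex looks like, here is a minimal Python sketch that generates it rather than writing it by hand. It assumes plain CSV with no quoted fields containing commas or newlines, and the function name `build_column_filter` is hypothetical. The idea: match every column with `[^,]*`, capture only the ones you keep, anchor both ends, and rebuild the line from the captured groups.

```python
import re


def build_column_filter(total_fields: int, keep: list[int]) -> tuple[str, str]:
    """Return (pattern, replacement) that reduce a CSV line to the `keep` columns.

    Hypothetical helper. Assumes plain CSV: no quoted fields containing
    commas or newlines. Kept columns come out in their original order.
    """
    keep_set = {i for i in keep if 0 <= i < total_fields}
    parts = []
    for i in range(total_fields):
        # Capture a column we want; match and discard everything else.
        # [^,]* cannot cross a comma, which keeps backtracking bounded.
        parts.append(r"([^,]*)" if i in keep_set else r"[^,]*")
    # Anchor both ends so a line with the wrong field count does not match
    # (re.sub then leaves it unchanged).
    pattern = "^" + ",".join(parts) + "$"
    # Python's \g<n> group syntax; sed-style scripts write \1, \2, ...
    replacement = ",".join(rf"\g<{n}>" for n in range(1, len(keep_set) + 1))
    return pattern, replacement


pattern, repl = build_column_filter(5, [0, 3])
print(re.sub(pattern, repl, "a,b,c,d,e"))  # prints a,d
```

In props.conf the result would go into a `SEDCMD-<class> = s/<pattern>/<replacement>/` line, with `\1`-style references instead of `\g<1>`. Keeping 40 columns needs 40 back-references, and I have not verified that Splunk's SEDCMD accepts more than nine, so test that before relying on this approach.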

Maybe you could use a scripted input to read the CSV file and feed it through (say) Python's csv module to emit only the fields of interest? I think the biggest issue here would be keeping track of how much of the CSV file you have already read, transformed, and sent to Splunk. This could be easy, or it could require you to re-implement much of the tailing processor's functionality around file rotations and such.
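A minimal sketch of that scripted input, assuming UTF-8 input with no quoted fields spanning lines. `emit_selected`, `run`, and the state file that stores a byte offset are all hypothetical names for illustration, not Splunk APIs. It shows the bookkeeping part: resuming from a saved offset, holding back a half-written last line, and a crude rotation check.

```python
import csv
import io
import os
import sys


def emit_selected(path, keep, offset=0, out=None):
    """Write only the `keep` columns of complete lines after byte `offset`.

    Returns the byte offset to resume from on the next run. Assumes UTF-8
    input and no quoted fields that span lines.
    """
    if out is None:
        out = sys.stdout
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()  # reads the whole unread tail into memory; fine for a sketch
    # Stop at the last newline so a half-written final line waits for the next run.
    end = data.rfind(b"\n") + 1
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(data[:end].decode("utf-8"))):
        if row:  # skip blank lines
            writer.writerow([row[i] for i in keep if i < len(row)])
    return offset + end


def run(path, keep, state_path):
    """Hypothetical checkpoint wrapper: resume from the offset saved in state_path."""
    try:
        with open(state_path) as s:
            offset = int(s.read().strip() or 0)
    except FileNotFoundError:
        offset = 0
    # Crude rotation check: a file smaller than our offset is probably new or truncated.
    # It misses a replacement file that has already grown past the old offset.
    if os.path.getsize(path) < offset:
        offset = 0
    offset = emit_selected(path, keep, offset)
    with open(state_path, "w") as s:
        s.write(str(offset))
```

Splunk would run something like `run("/path/to/data.csv", [0, 4, 17], "/path/to/state.txt")` on each interval and index whatever it prints. The rotation check is exactly the weak spot: once files are renamed or moved around, a single path plus offset is no longer enough to know what has been sent.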

0 Karma

msarro
Builder

Hm, so it does look like it will be a massive regex. What sort of overhead would a regex of that size incur? Sadly, one issue we'd run into with the scripted input is that our source file moves multiple times.
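One rough way to get a feel for the overhead is to time a regex of the same shape on a synthetic 270-field line. This is only a proxy: it measures Python's `re`, not the regex engine Splunk uses at index time, and the line contents and the choice of 39 kept columns are made up for illustration.

```python
import re
import timeit


def make_line(n_fields):
    # Synthetic CSV line; real data will differ in field lengths.
    return ",".join(f"value{i}" for i in range(n_fields))


def make_pattern(n_fields, keep):
    # Same shape as a column-stripping regex: capture kept columns, skip the rest.
    parts = ["([^,]*)" if i in keep else "[^,]*" for i in range(n_fields)]
    return re.compile("^" + ",".join(parts) + "$")


if __name__ == "__main__":
    keep = set(range(0, 270, 7))  # hypothetical: 39 columns of interest
    pattern = make_pattern(270, keep)
    good, bad = make_line(270), make_line(269)
    runs = 10_000
    secs = timeit.timeit(lambda: pattern.match(good), number=runs)
    print(f"{secs / runs * 1e6:.1f} microseconds per matching line (Python re)")
    # A line with one field missing should fail quickly rather than backtrack badly.
    secs = timeit.timeit(lambda: pattern.match(bad), number=runs)
    print(f"{secs / runs * 1e6:.1f} microseconds per non-matching line")
```

Because `[^,]*` can never consume a comma, the engine has little room to backtrack, so cost should grow roughly with line length rather than blowing up. Multiply the per-line figure by your event rate for a ballpark, then confirm on a real indexer.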
