Getting Data In

Ignoring massive amounts of data at index time

msarro
Builder

Hey everyone. We are working on taking in large amounts of CSV data. Each line of the CSV is a single event, and each line comprises about 270 fields. Currently only about 40 of those fields are useful. Right now our props.conf and transforms.conf are set up to index each field with a specific field name.

How would we best proceed to strip out the fields we don't need so they don't get indexed? It would be a substantial cost savings for us. I'd prefer not to write the world's nastiest regex, but if I have to, I will.

dwaddle
SplunkTrust

If you are looking to strip "columns" out of the CSV data at index time, about the only way you'd be able to do it is with a SEDCMD. I'd wager this would be a nontrivial regex to write.
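Purely as an illustration (the stanza name and column layout below are assumptions, not your actual config), a SEDCMD that keeps only the first 40 comma-separated fields and drops the rest could look roughly like this in props.conf, relying on SEDCMD's PCRE-style regex support:

    [your_csv_sourcetype]
    # Capture the first 40 comma-separated fields and discard everything after them.
    # Only works if the useful columns are contiguous at the front of each line,
    # and does not handle quoted fields that themselves contain commas.
    SEDCMD-dropcols = s/^((?:[^,]*,){39}[^,]*),.*$/\1/

If the 40 useful columns are scattered across the 270, the expression gets much uglier, which is why the scripted-input route below may be more practical.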

Maybe you could use a scripted input to read the CSV file and feed it through (say) Python's csv module to emit only the fields of interest? I think the biggest issue would be keeping track of how much of the CSV file you have previously read, transformed, and sent to Splunk. This could be easy, or it could require you to re-implement much of the tailing processor's functionality around file rotations and such.
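Here is a minimal sketch of that idea, assuming Python 3 and made-up values for the file path and the column indexes to keep; it deliberately ignores the read-position/rotation tracking problem described above, which is the genuinely hard part:

    #!/usr/bin/env python
    # Scripted-input sketch: read the CSV and emit only the columns of interest
    # to stdout, which Splunk indexes as this input's output.
    # WARNING: this re-reads the whole file on every run; a real version would
    # have to remember how far previous runs got, and cope with file rotation.
    import csv
    import sys

    CSV_PATH = "/var/data/feed.csv"   # hypothetical path to the incoming CSV
    KEEP = [0, 3, 7, 12]              # hypothetical indexes of the useful columns

    with open(CSV_PATH, newline="") as f:
        writer = csv.writer(sys.stdout)
        for row in csv.reader(f):
            # Skip indexes beyond short/malformed rows rather than crashing.
            writer.writerow([row[i] for i in KEEP if i < len(row)])

Using csv.writer for the output keeps the quoting correct if any of the retained fields contain commas, which a plain string join would not.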

msarro
Builder

Hm, so it does look like it will be a massive regex. What sort of overhead would a regex that size incur? Sadly, one issue we'd run into with this is that our source file moves multiple times.
