Is it possible to remove data from an event before indexing?

rune_hellem
Contributor

I've been asked to index a new sourcetype, which is a set of XML files. The files contain a tag

<attachments>...</attachments>

which I want to skip, since indexing the attachment as raw data adds no value; it just makes it harder to see the forest for the trees.

Could this be done?

Update
I realize that the most obvious answer is "Preprocess the files, remove the tag, then index the file", but I'm still hoping that Splunk can be told to do this for me.
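
For reference, the preprocessing route could look roughly like the Python sketch below. The function names (strip_file, preprocess), the directory layout, and the assumptions that the tag carries no attributes and the files are UTF-8 are all illustrative and not taken from the actual data.

```python
import re
from pathlib import Path

# Assumption: the element is written exactly as <attachments>...</attachments>,
# with no attributes. DOTALL lets the match span multiple lines.
ATTACHMENTS = re.compile(r"<attachments>.*?</attachments>", re.DOTALL)


def strip_file(src: Path, dst: Path) -> int:
    """Copy src to dst without <attachments> elements; return how many were removed."""
    text = src.read_text(encoding="utf-8")
    cleaned, removed = ATTACHMENTS.subn("", text)
    dst.write_text(cleaned, encoding="utf-8")
    return removed


def preprocess(in_dir: Path, out_dir: Path) -> None:
    """Write a cleaned copy of every *.xml file in in_dir to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.xml")):
        removed = strip_file(src, out_dir / src.name)
        print(f"{src.name}: removed {removed} attachments element(s)")
```

Calling preprocess(Path("incoming"), Path("cleaned")) (both directory names are made up) would leave the originals untouched and give Splunk the cleaned copies to monitor.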


rsennett_splunk
Splunk Employee

In props.conf you can use the SEDCMD setting:

http://docs.splunk.com/Documentation/Splunk/6.2.4/Data/Anonymizedatausingconfigurationfiles#Anonymiz...

This doc covers anonymizing data with a sed script: in its example, it matches a pattern and replaces it.
You'll do the same, but replace the match with nothing. You can try the effect using the data onboarding wizard (Add Data).

But it would be something like this:

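props.conf
# [your_sourcetype] is a placeholder; use the stanza name of the new sourcetype
[your_sourcetype]
SEDCMD-dumpAttach = s/<attachments>[^<]+<\/attachments>//g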
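
To get a feel for what a substitution like this does before touching props.conf, here is a rough sketch in Python. It is only an approximation: Python's re module stands in for Splunk's regex engine, the sample events are invented, and the NESTED pattern is an assumed variant for attachments that contain child tags or line breaks, not part of the setting above.

```python
import re

# Same idea as the SEDCMD setting: the element body must not contain '<'.
SIMPLE = re.compile(r"<attachments>[^<]+</attachments>")
# Assumed variant: non-greedy match that also spans nested tags and newlines.
NESTED = re.compile(r"<attachments>.*?</attachments>", re.DOTALL)


def strip_attachments(event: str, pattern: re.Pattern = NESTED) -> str:
    """Return the event with every match of the pattern replaced by nothing."""
    return pattern.sub("", event)


# Invented sample events, for illustration only.
flat = "<doc><id>42</id><attachments>QUJDREVG</attachments><status>ok</status></doc>"
nested = "<doc><attachments><file>QUJD</file>\n<file>REVG</file></attachments><id>7</id></doc>"

print(strip_attachments(flat, SIMPLE))    # attachment body is plain text: removed
print(strip_attachments(nested, SIMPLE))  # body contains child tags: left unchanged
print(strip_attachments(nested, NESTED))  # non-greedy variant removes it
```

If the attachments in your files do contain child elements, the [^<]+ form will leave them in place, so it is worth checking a few real events in the Add Data preview.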

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

rune_hellem
Contributor

That did the trick!!


rsennett_splunk
Splunk Employee

Great! 🙂
