I've been asked to index a new sourcetype consisting of a set of XML files. The files contain a tag
<attachments>...</attachments>
which I want to skip, since indexing the attachment as raw data has no value at all. It just makes it harder to see the forest for the trees.
Can this be done?
Update
I realized that the most obvious answer is "preprocess the files, remove the tag, then index the file", but I'm still hoping that Splunk can be told to do this for me.
In props.conf you can use the SEDCMD setting.
This doc talks about anonymizing data using a sed script. In its example it matches a pattern and replaces it.
You'll do the same, but replace the match with nothing. You can try the effect using the data onboarding wizard (Add Data).
But it would be something like this:
props.conf
# replace the stanza name with your own sourcetype
[your_sourcetype]
SEDCMD-dumpAttach = s/<attachments>[^<]+<\/attachments>//g
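If you want to sanity-check the idea outside Splunk first, here is a minimal Python sketch of the same substitution: match an attachments element and replace it with nothing. The sample event and the function names are made up for illustration, and Python's regex engine is not identical to the one Splunk uses, so treat it as a rough check rather than proof of what Splunk will index.

```python
import re

# Pattern equivalent to the sed expression: an <attachments> element whose
# body contains no '<' character (i.e. no nested tags).
ATTACHMENTS = re.compile(r"<attachments>[^<]+</attachments>")

# Assumption: if the element can contain child tags or be empty, a lazy
# match across newlines is one possible alternative.
ATTACHMENTS_NESTED = re.compile(r"<attachments>.*?</attachments>", re.DOTALL)


def strip_attachments(event: str) -> str:
    """Remove every flat <attachments>...</attachments> element."""
    return ATTACHMENTS.sub("", event)


def strip_attachments_nested(event: str) -> str:
    """Same idea, but also removes elements that contain child tags."""
    return ATTACHMENTS_NESTED.sub("", event)


# Hypothetical sample event; real files will look different.
sample = (
    "<doc><title>Report</title>"
    "<attachments>QUJDREVG</attachments>"
    "<id>7</id></doc>"
)

if __name__ == "__main__":
    print(strip_attachments(sample))
```

Note that `[^<]+` stops at the first `<`, so it only matches when the attachment body is plain text such as base64. If your attachments contain child elements, the pattern would need to change along the lines of the second function.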
That did the trick!!
Great! 🙂