Getting Data In

How to index less data?

Builder

I would like to index less data into Splunk by modifying several XML sources so that only certain fields are included, formatted as key-value pairs. I believe I can do this by creating a scripted input. I've looked at the documentation here, but I'm still unsure whether this is what I need and how to implement it.

Also, when using a scripted input, how do you prevent duplicate data from being indexed? Does Splunk have an internal mechanism for this, or do I need to include this logic in my script?

Can somebody help point me in the right direction?

1 Solution

SplunkTrust

Hi sc0tt

Scripted inputs are one approach; another is to use props and transforms to send unwanted data to the null queue. Have a look at the docs about filter and route for more details on this topic.

hope this helps ...

cheers, MuS

Builder

In the end I used the filter and route method that you referenced and used a sed script. This works perfectly. Thanks again.


Builder

After searching around Splunk Answers some more, I came across several posts about indexing XML files and field extraction. I believe this is what I need. I'm going to give those suggestions a shot and see if that works.


Builder

Please correct me if I'm wrong, but field extraction will just create fields at index time based on the raw data; it will not change the amount of data that is being indexed, correct? My goal is to transform the raw XML data into a new, slimmed-down, Splunk-friendly format. For this, I believe that creating a scripted input may be the best solution.
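
For anyone weighing the scripted-input route, here is a rough sketch of what such a script could look like. It is only an illustration under assumptions: the record tag `Record`, the field names in `KEEP_FIELDS`, and the checkpoint file are all hypothetical and not taken from the actual XML sources. As far as I know, a scripted input simply has its stdout indexed as emitted, so the duplicate check is done in the script itself by remembering a hash of every line already printed.

```python
import hashlib
import sys
import xml.etree.ElementTree as ET

# Hypothetical field names to keep; every other element in a record is dropped.
KEEP_FIELDS = ("Host", "Status", "Duration")


def record_to_kv(record, keep=KEEP_FIELDS):
    """Render one XML record element as a single line of key="value" pairs."""
    pairs = []
    for name in keep:
        child = record.find(name)
        if child is not None and child.text is not None:
            value = child.text.strip().replace('"', "'")
            pairs.append('{}="{}"'.format(name, value))
    return " ".join(pairs)


def new_events(xml_text, seen, record_tag="Record"):
    """Yield key-value lines for records whose hash is not already in `seen`."""
    root = ET.fromstring(xml_text)
    for record in root.iter(record_tag):
        line = record_to_kv(record)
        if not line:
            continue
        digest = hashlib.sha1(line.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # already emitted, either on an earlier run or in this one
        seen.add(digest)
        yield line


def main(xml_path, checkpoint_path):
    # Load the digests written by previous runs (one per line).
    try:
        with open(checkpoint_path) as fh:
            seen = set(fh.read().split())
    except FileNotFoundError:
        seen = set()
    before = set(seen)
    with open(xml_path, encoding="utf-8") as fh:
        for line in new_events(fh.read(), seen):
            print(line)  # the script's stdout is what gets indexed
    # Append only the digests that are new in this run.
    with open(checkpoint_path, "a") as fh:
        for digest in seen - before:
            fh.write(digest + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

One caveat with this design: because the hash covers only the slimmed-down line, two genuinely separate events with identical kept fields would be treated as duplicates, so a timestamp or record ID should be among the kept fields if that matters.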


SplunkTrust

This should be possible, but it is field extraction, which is covered in the docs here: http://docs.splunk.com/Documentation/Splunk/6.0/Data/Aboutindexedfieldextraction

Builder

Thanks. Would I be able to change the data format to create new fields with the filter and routes option? For example, could I use a regular expression to filter an XML file for something like <Field>Value</Field> and create field = value? This way I get rid of a lot of extra data that I don't need and only keep a simple key-value pair?
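
To make the idea concrete, here is a small sketch of the rewrite being described, written in Python purely to show the regular expression at work. It is not Splunk configuration; inside Splunk the equivalent substitution would be expressed in props and transforms. The function name `xml_to_kv` and the optional `keep` filter are made up for this example.

```python
import re

# Matches a leaf element such as <Field>Value</Field>: a tag name, text with
# no nested tags, and a closing tag with the same name.
FIELD_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def xml_to_kv(raw, keep=None):
    """Reduce raw XML text to key="value" pairs built from its leaf elements.

    `keep` is an optional collection of element names; when given, all other
    elements are discarded along with the surrounding markup.
    """
    pairs = []
    for name, value in FIELD_RE.findall(raw):
        if keep is not None and name not in keep:
            continue
        pairs.append('{}="{}"'.format(name, value.strip()))
    return " ".join(pairs)


if __name__ == "__main__":
    sample = "<Event><Field>Value</Field><Other>1</Other></Event>"
    print(xml_to_kv(sample, keep={"Field"}))
```

A regex like this only handles simple leaf elements without attributes or nested tags; anything more complex is better served by a real XML parser.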
