Solved: Add a custom field at index time based on sourcetype or index name

DEAD_BEEF · ‎03-13-2018

Hi everyone. I've been going back and forth through the docs and other answers posted here, but I haven't found anything definitive that answers my question.

I want to create a new field at index time called retention that specifies the retention time based on either index name or sourcetype. That way, when users are browsing web logs, they know that those logs are retained for 5 years (retention = 5), while DNS logs are only kept for 1 year (retention = 1).

Everything I have read about creating custom fields at index time goes back to using a regex to extract an existing value from the _raw log, but this information is nowhere in the logs. If it is not possible at index time, what alternative would you suggest? Tags? Search-time extraction is also an option if index time is not possible, but since this isn't extracting a value from _raw, I'm having trouble figuring out how to do it.

knielsen · ‎03-13-2018

You could use meta information in the forwarding config, but I think it's easier to just use a calculated field (if you don't have too many different sourcetypes). So just add a new calculated field named retention per sourcetype and enter the required retention time as an integer in the "Eval expression" field.

Hth,
Kai.
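
For illustration, here is roughly what the two options in this reply could look like in config files. This is only a sketch: `web` is the sourcetype name used later in this thread, while the `dns` sourcetype name and the monitored path are made-up placeholders, so adjust them to your environment.

```ini
# props.conf (search time, on the search head)
# One calculated field per sourcetype; the eval expression is just a constant.
[web]
EVAL-retention = 5

# "dns" is an assumed sourcetype name for the DNS logs
[dns]
EVAL-retention = 1

# inputs.conf (index time, on the forwarder)
# Alternative: attach the value as metadata on the input itself.
# The monitored path below is hypothetical.
[monitor:///var/log/httpd/access_log]
sourcetype = web
_meta = retention::5
```

The calculated-field version changes nothing in the indexed data, so the value can be corrected later without re-indexing. The `_meta` version writes the field into the index, so it would normally be paired with a fields.conf entry marking retention as indexed.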

DEAD_BEEF · ‎03-13-2018

I tried writing a props/transforms pair using the metadata but couldn't figure it out, as I thought I would be overwriting the sourcetype name (I'm not super familiar with this). I have to send some of the indexed data to other Splunk instances, so I thought doing this at index time would be helpful for them as well.

I like the search-time option with the calculated field (I didn't realize I could use an EVAL like this to create a field; now I feel silly). If I were to try to do this at index time, can you tell me if I'm on the right track? I am not sure where to specify the sourcetype name. Thank you for your input.

transforms.conf

[retention]
SOURCE_KEY = _MetaData:Sourcetype
FORMAT = retention::"5"
WRITE_META = true

props.conf

[web]
TRANSFORMS-addretention5y = retention

fields.conf

[retention]
INDEXED=true

tiagofbmm · ‎03-13-2018

The search time option is available with something in props.conf like

EVAL-retention=if(index="temporary",1,5)

I wonder why you would want to do that at index time if you are not planning to do heavy calculations on it. Indexed fields allow for tstats and improve performance a lot, but if it is just for the users to see the field, I'd go for the search-time option. What do you think?

DEAD_BEEF · ‎03-13-2018

I was thinking index time because some logs are sent to other instances of Splunk (as cooked data, once indexed) that are not within my control, and I thought that information would be valuable to those teams. However, it's not mandatory, just a "would be nice if possible" kind of thing. I like this idea as well. I didn't realize that you could use EVAL that way; I should have read over props more carefully. I was too focused on building out a skeleton props/transforms for index time.

So is it not possible to do it at index time? I want to present all possible options to the decision makers but will go over the pros/cons of each. Obviously the hit to performance is my biggest unknown.

tiagofbmm · ‎03-13-2018

Indexed fields will always need more space because of the need to create btrees to make them searchable. That one is a fact.

About performance, indexed fields are really good, but you have to choose them wisely, preferring fields that don't have high cardinality (number of different values).

If you don't need to control the subsequent places the data travels through, search time will solve it. If a source of events is stamped with a sourcetype, then when you search for those events from a search head, Splunk sends the EVAL I mentioned to the indexers as part of a bundle, so the indexer knows what to compute.

Finally, yes, it is indeed possible to do that at index time.

If you look here http://docs.splunk.com/Documentation/Splunk/7.0.2/Data/Configureindex-timefieldextraction

You'll see that you can indeed use a REGEX on the _raw data, but you can also put a fixed value in the FORMAT

REGEX = <regular_expression>
FORMAT = retention::5

This would add retention=5 to every event of the sourcetype the transform is applied to.

I honestly think it is overkill for your use case, but of course it is your call.
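
Putting this together with the skeleton posted earlier in the thread, a complete index-time setup might look like the sketch below. The stanza name `add_retention_5y` is made up for illustration, and `REGEX = .` is an assumption chosen only so that the transform matches every event; the value comes from FORMAT, not from anything captured in _raw.

```ini
# transforms.conf
# "add_retention_5y" is an illustrative stanza name.
[add_retention_5y]
# Match any event with at least one character; no capture groups are needed
REGEX = .
# Write the constant key::value pair
FORMAT = retention::5
# Store the pair as an indexed field
WRITE_META = true

# props.conf
# The sourcetype is selected here, by the stanza name
[web]
TRANSFORMS-addretention5y = add_retention_5y

# fields.conf
# Tell search time that retention is an indexed field
[retention]
INDEXED = true
```

The props.conf and transforms.conf parts take effect where the data is parsed (an indexer or heavy forwarder) and only apply to events indexed after the change. A second transform with `FORMAT = retention::1` would be needed for the 1-year sourcetypes.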

DEAD_BEEF · ‎03-13-2018

Ahh ok, I kept seeing REGEX and was thinking, "I'm not matching and extracting anything, I just want to add a value instead". I didn't realize you can just specify a fixed value.

I should have been clearer: after the data is cooked, it gets sent off to other Splunk instances that are outside my SH cluster, so the search-time extractions that I make won't run against the data for the other teams that receive it. Hence why I wanted index time. But after reading your comment more, I am leaning towards search time and letting those other teams do as they will. I will present both options, as I don't make these kinds of decisions 😄

Thank you for the insight.

tiagofbmm · ‎03-13-2018

It was a good discussion, but only now did I fully understand your index-time need!
