I have a requirement to mask the value of a field after 30 days.
The events are JSON events. The users need to be able to see/search all the fields except one for up to a year. That one field must be hidden from view after 30 days.
My plan was to define a calculated field that, when _time is more than 30 days ago, overwrites the value of the field with one I supply. The calculation would be performed for every search. What I failed to consider was two things:
First, the field to be overwritten is a JSON field named foo{}.id. If I use
|eval foo{}.id = if ((_time < (now() - (86400*30))), "TOO OLD", foo{}.id)
I get an error that the eval is malformed. If I add quotes around the field names, like this:
|eval "foo{}.id" = if ((_time < (now() - (86400*30))), "TOO OLD", "foo{}.id")
I get a new field called foo.id which equals TOO OLD, but I still have the original foo{}.id with the original value.
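(For what it's worth, the quoting problem by itself has a known fix: in eval, a field name containing special characters takes double quotes on the left side of the assignment, but single quotes when referenced as a value on the right; double quotes on the right are treated as a string literal. A sketch:

| eval "foo{}.id" = if ((_time < (now() - (86400*30))), "TOO OLD", 'foo{}.id')

This addresses the malformed-eval error, though not the _raw problem below.)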
Second, even if I can get the calculated field to behave properly, the original value is still in the _raw field, which is easily visible in the events view or by adding _raw to a table.
So, is it possible to overwrite a single field at search time such that every search will return the overwritten value?
Also, can I somehow remove the _raw field for every search, and if so, are there any weird consequences from doing that?
I would do this: at index time, modify the event to create a hash of the time-sensitive field and replace the field value in the raw event with the hash. At the same time, add the value, the hash, and a date to a KV store so that the data exists in two separate places. Then, every day, purge the KV store of any data that is older than 30 days. When you search, use a lookup on the hash in the event to pull in the field value from the KV store; after 30 days, the lookup will fail.
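A minimal sketch of the search-time side, assuming a KV-store-backed lookup named masked_values with fields hash, value, and created_at (all hypothetical names):

index=myindex
| lookup masked_values hash AS "foo{}.id" OUTPUT value AS id_clear
| eval id_clear = coalesce(id_clear, "TOO OLD")

The daily purge could be a scheduled search along the lines of | inputlookup masked_values | where created_at >= relative_time(now(), "-30d@d") | outputlookup masked_values , which rewrites the collection keeping only entries newer than 30 days.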
This sounds like a great approach. So I'd need a script to pre-process the data files before they are given to the Splunk Universal Forwarder, right?
You've got it.
You will need to re-index the event after modifying it and then delete the original event. You can use collect to do this.
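Under those assumptions, a one-time backfill might look something like this (index names, sourcetype, and the JSON pattern are all hypothetical; test on a small time range first):

index=myindex sourcetype=my_json
| eval hash = sha256('foo{}.id')
| eval _raw = replace(_raw, "\"id\":\s*\"[^\"]+\"", "\"id\": \"" . hash . "\"")
| collect index=masked_index

Here eval sha256() computes the replacement value and replace() rewrites it into _raw before collect re-indexes the event; you would also need to write the hash/value pairs out to the KV store in the same pass.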
I saw a reference to this solution in another answer, but didn't understand it. I thought summary indexes were mainly used to collect the output of stats commands so you can keep counts longer than the actual data. How does a summary index work when you just want to re-index an entire event that is already indexed? Does it just send the _raw field value through the index/parsing pipeline again? If so, do I just need to use |rex to mask the field in the raw JSON?
Are the same props and transforms applied to the summary indexed data that is applied to the original data? I want to make sure that I can just add the summary index to all of my searches and have them still work.
Any details you can give me would be greatly appreciated. I'd really like to more fully understand how this works.
Thanks...
Although collect is intended to write to a summary index, in actuality it can write to any index. Play around with it and you will see what it does:
|noop|stats count AS TestOfCollect | collect index=myIndex
Then check it out:
index=myIndex | where isnotnull(TestOfCollect)
Then throw it away and refine:
index=myIndex | where isnotnull(TestOfCollect) | delete
Be aware that using collect to a non-summary index will incur a double license hit.