Sanitize already indexed data

mslvrstn
Communicator

There is some data that we want to sanitize in Splunk. I've already got a SEDCMD to do it for newly indexed data, but is there some way to modify the events that have already been indexed? At worst I will delete the events, but ideally I would like to just XXX out a specific field.

davidpaper
Contributor

Hi,

Data in Splunk is indeed immutable. That doesn't mean it can't, with a little work, be cleaned up and made available for search without the PII in it.

0) You already nailed part of the solution: a SEDCMD keeps the problem from getting worse as new data is indexed.
1) Create a search that finds all the events containing the PII that needs to be cleansed. Run it from the ./splunk CLI and dump the results to a file.
2) Use your favorite text-mangling tool (sed, awk, perl, LISP 🙂) to sanitize that file on disk.
3) Run the original search again, this time with '... | delete' at the end, to mark the existing events so they are no longer included in search results.
4) Use ./splunk add oneshot to re-index the sanitized file.
5) Enjoy a frosty beverage of your choice for a job well done.

The original search and the | delete pass may take quite a while, depending on how many events need to be found and extracted. The oneshot will slurp the file back in as fast as the forwarder/indexer can absorb it. Note that oneshot WILL count against the license, so plan accordingly.
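If you would rather script step 2 than reach for sed, here is a minimal Python sketch. It assumes the export from step 1 is a plain text file with one event per line, and that the field looks like ssn=123-45-6789. The pattern, the field name, and the function names are all made up for illustration, so swap in whatever expression your SEDCMD already uses.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches a field written as ssn=123-45-6789.
# Replace it with the expression your SEDCMD already applies to new data.
PII_PATTERN = re.compile(r"(ssn=)\d{3}-\d{2}-\d{4}")


def sanitize_line(line: str) -> str:
    """Mask the field value but keep the field name and the rest of the event."""
    return PII_PATTERN.sub(r"\1XXX-XX-XXXX", line)


def sanitize_file(src: Path, dst: Path) -> int:
    """Copy src to dst with the PII masked; return how many lines were changed."""
    changed = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            clean = sanitize_line(line)
            if clean != line:
                changed += 1
            fout.write(clean)
    return changed
```

Calling sanitize_file(Path("export.txt"), Path("export_clean.txt")) (both file names are placeholders) writes the cleaned copy that step 4 would re-index. Comparing the returned count with the number of events the search found is a cheap sanity check before running | delete.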


Jason
Motivator

As far as I know, once data is indexed, it is immutable. You can restrict access to the data via a role's search strings, and you can use | rex mode=sed ... to hide data at search time. Perhaps you could combine the two to enforce a sed for a particular role?
