Sanitize already indexed data

mslvrstn
Communicator

There is some data that we want to sanitize in Splunk. I already have a SEDCMD that handles newly indexed data, but is there some way to modify events that have already been indexed? At worst I will delete the events, but ideally I would like to just XXX out a specific field.

davidpaper
Contributor

Hi,

Data in Splunk is indeed immutable. That doesn't mean that, with a little work, the data can't be cleaned up and made available for search without the PII in it.

0) You already nailed part of the solution: SEDCMD to keep the problem from getting worse for new data indexed.
1) Create a search that finds all the events containing the PII that needs to be cleansed. Run it from the ./splunk CLI and dump the results to a file.
2) Use your favorite text mangling tools (sed, awk, perl, LISP 🙂 ) to sanitize the data on disk.
3) Run the original search again, this time with '... | delete' at the end, to mark the existing events so they are no longer returned in search results. (This hides them from search; it does not remove them from disk.)
4) Use ./splunk add oneshot to re-index the sanitized data file.

5) Enjoy a frosty beverage of your choice for a job well done.

The original search and the | delete may take quite a while, depending on how many events need to be found and extracted. The oneshot will slurp the file back in as fast as the forwarder/indexer can absorb it. Note that oneshot WILL count against the license, so plan accordingly.
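
For step 2, if you'd rather not hand-roll a sed one-liner, here is a rough Python sketch of the same idea. It assumes the exported file has one raw event per line and that the sensitive value looks like `ssn=123-45-6789`. Both the field name and the pattern are made up for illustration, so swap in whatever your existing SEDCMD matches.

```python
import re

# Hypothetical pattern: a field written as ssn=123-45-6789 in the raw event.
# Replace it with the expression your SEDCMD already uses.
PII_PATTERN = re.compile(r"(ssn=)\d{3}-\d{2}-\d{4}")


def sanitize_line(line: str) -> str:
    """Keep the field name and replace its value with X characters."""
    return PII_PATTERN.sub(r"\1XXX-XX-XXXX", line)


def sanitize_file(src_path: str, dst_path: str) -> int:
    """Write a masked copy of src_path to dst_path; return the number of lines changed."""
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            clean = sanitize_line(line)
            if clean != line:
                changed += 1
            dst.write(clean)
    return changed
```

Calling `sanitize_file` on the dump from step 1 gives you the cleaned file to hand to ./splunk add oneshot in step 4. The returned count is a cheap sanity check: it should roughly match the number of events your search found.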

Jason
Motivator

As far as I know, once it is indexed, it is immutable. You can restrict access to the data via a role's search strings, and you can use | rex mode=sed ... to hide data at search time. Perhaps combine both to enforce a sed for a particular role?
