There is some data that we want to sanitize in Splunk. I've already got a SEDCMD to do it for newly indexed data, but is there some way to modify the events that have already been indexed in Splunk? At worst, I will delete the events, but ideally I would like to just XXX out a specific field.
Hi,
Data in Splunk is indeed immutable. That doesn't mean that, with a little work, the data can't be cleaned up and made available for search without the PII in it.
0) You already nailed part of the solution: SEDCMD to keep the problem from getting worse for new data indexed.
1) Create a search that finds all the events containing the PII that needs to be cleansed. Run it from the ./splunk CLI and dump the results to a file.
2) Use your favorite text mangling tools (sed, awk, perl, LISP 🙂 ) to sanitize the data on disk.
3) Run the original search again, this time with '... | delete' at the end, to mark the existing events so they are no longer returned in search results. Note that delete only hides the events from search; it does not remove them from disk.
4) Use ./splunk add oneshot to re-index the sanitized data file.
5) Enjoy a frosty beverage of your choice for a job well done.
The original search & |delete may take quite a while, depending on how many events need to be found & extracted. The oneshot will slurp it back in as fast as the forwarder/index can absorb it. Note that oneshot WILL count against the license, so plan accordingly.
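If you'd rather script step 2 than hand-roll a sed one-liner, here is a minimal Python sketch. It assumes the exported file has one event per line and that the sensitive value looks like ssn= followed by nine digits; that field name, the pattern, and the function names are made up for illustration, so swap in whatever your existing SEDCMD matches.

```python
import re

# Hypothetical pattern: a field named "ssn" followed by exactly nine digits.
# Replace it with the same expression your SEDCMD uses.
PII_PATTERN = re.compile(r"(ssn=)\d{9}")


def sanitize_line(line: str) -> str:
    """Mask the sensitive value and leave the rest of the event untouched."""
    return PII_PATTERN.sub(lambda m: m.group(1) + "X" * 9, line)


def sanitize_file(src_path: str, dst_path: str) -> int:
    """Write a masked copy of src_path to dst_path; return the number of lines changed."""
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            clean = sanitize_line(line)
            if clean != line:
                changed += 1
            dst.write(clean)
    return changed
```

Writing to a separate output file keeps the raw export around until you have confirmed the re-indexed events look right. The returned count can be compared against the number of events the original search found before you run | delete.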
As far as I know, once it is indexed, it is immutable. You can restrict access to the data via a role's search strings, and you can use | rex mode=sed ... to hide data at search time. Perhaps combine both to enforce a sed for a particular role?