What are my options to indicate that Splunk has modified an event in a particular way (for auditing purposes)?

woodcock
Esteemed Legend

We all know about this stuff:
https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Anonymizedata

Let's say I am cleaning up PII but I need to leave behind something to indicate WHY I told Splunk to do it.
Ideally, I'd like a metadata field (not inside the raw event itself) named RawModReason or something like that.
Then, if I obscured an SSN (SEDCMD-obscureSSN), I would assign a value of obscuredSSN.
Or if I clipped out a Credit Card Number, I might assign a value of removedCCN.
The idea is to be able to answer the question "Has this event been modified?" (and later, "For what reason?" or "In what way?"). I'd also like it to be hidden enough that an auditor will be unlikely to discover that the event has been modified if I use |outputcsv or another export of the data (for audit, lawsuit, whatever).

The other dicey part is creating a multi-valued meta field. Is that even possible?

I cannot think of any way to do this flexibly but I am sure that it must be possible. Surely I am not the first person to find myself in this position!

DalJeanis
Legend

Okay, a two-phase approach should work. The first transform detects that we are going to mask the data, so it creates the meta field and sets the flag. The second transform masks the data.

[somename1]
REGEX = .+ssn=\d{5}\d{4}.*
SOURCE_KEY = _raw
FORMAT = _mymvcodes::obscuredSSN
WRITE_META = true

[somename2]
REGEX = (.+ssn=)\d{5}(\d{4}.*)
SOURCE_KEY = _raw
FORMAT = $1xxxxx$2
DEST_KEY = _raw
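
To sanity-check the two regexes before putting them into Splunk, here is a minimal Python sketch that mimics the detect-then-mask order. It is only a simulation of the regex logic, not Splunk itself: the function name process_event, the sample events, and the dict used to stand in for the meta field are all made up for illustration.

```python
import re

# Patterns copied from the [somename1] and [somename2] stanzas.
DETECT_SSN = re.compile(r".+ssn=\d{5}\d{4}.*")
MASK_SSN = re.compile(r"(.+ssn=)\d{5}(\d{4}.*)")


def process_event(raw):
    """Hypothetical helper: return (new_raw, meta) for one single-line event."""
    meta = {}
    # Phase 1: set the flag while the unmasked digits are still in the event.
    if DETECT_SSN.search(raw):
        meta.setdefault("_mymvcodes", []).append("obscuredSSN")
    # Phase 2: rewrite the raw text, analogous to FORMAT = $1xxxxx$2.
    new_raw = MASK_SSN.sub(r"\g<1>xxxxx\g<2>", raw, count=1)
    return new_raw, meta


masked, meta = process_event("user=bob ssn=123456789 action=login")
print(masked)  # user=bob ssn=xxxxx6789 action=login
print(meta)    # {'_mymvcodes': ['obscuredSSN']}
```

One thing the simulation makes visible: because both patterns start with .+, an event that begins directly with ssn= (nothing in front of it) is neither flagged nor masked. The order also matters, since the detect pattern needs all nine digits and would no longer match once the mask has run.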

The strategy below will not work. The SEDCMDs all happen in order at one time, so the flags would stay in the data.

Very interesting question. The rule runs on every event, but which ones actually obscured anything, and how many items were obscured, would not be immediately apparent.

Here's a strategy. There may be something more direct, but let's pretend for a moment that Splunk only has access to what's in Dal's head.

Our anonymization takes place in three phases.

Phase 1 - anonymize each field, and additionally place a marked code in its place.
Phase 2 - extract ALL the marked codes to an mv metadata field.
Phase 3 - delete all the marked codes.

So, for phase 1, instead of this

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

...you might do this (just add extra codes marked by !!##= something ##!!...)

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=!!##=obscuredSSN##!!xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=!!##=obscuredCCN##!!xxxx-xxxx-xxxx-\2/g

For phase 2, you might have this...

[some name]
REGEX = !!##=(\w+)##!!
SOURCE_KEY = _raw
FORMAT = Masked Type $1
DEST_KEY = _mymvcodes

For phase 3, you have this

SEDCMD-killem = s/!!##=\w+##!!//g

That should work, assuming the extract can be made to occur between the two SEDCMDs.
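
For what it's worth, the intended data flow of the three phases can be sketched in plain Python. This only shows what the regexes would do if the phases could be run in this order. It says nothing about whether Splunk can actually be made to run the extract between the two SEDCMDs. The function name anonymize and the sample event are hypothetical.

```python
import re

# Phase 1 rules: mask the value and leave a marker code in front of it.
PHASE1_RULES = [
    (re.compile(r"ssn=\d{5}(\d{4})"),
     r"ssn=!!##=obscuredSSN##!!xxxxx\1"),
    (re.compile(r"cc=(\d{4}-){3}(\d{4})"),
     r"cc=!!##=obscuredCCN##!!xxxx-xxxx-xxxx-\2"),
]
# Marker pattern shared by phase 2 (extract) and phase 3 (delete).
MARKER = re.compile(r"!!##=(\w+)##!!")


def anonymize(raw):
    """Hypothetical helper: return (clean_raw, codes) for one event."""
    # Phase 1: apply every masking rule to the whole event.
    for pattern, replacement in PHASE1_RULES:
        raw = pattern.sub(replacement, raw)
    # Phase 2: collect ALL marker codes into a multi-valued list.
    codes = MARKER.findall(raw)
    # Phase 3: strip the markers so only the masked text is left.
    clean_raw = MARKER.sub("", raw)
    return clean_raw, codes


clean, codes = anonymize("ssn=123456789 cc=1111-2222-3333-4444")
print(clean)  # ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-4444
print(codes)  # ['obscuredSSN', 'obscuredCCN']
```

Note that the marker pattern only accepts word characters between the delimiters, so a code with any other character in it (a trailing dot, say) would be neither extracted nor deleted and would stay in the event.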
