What are my options to indicate that Splunk has modified an event in a particular way (for auditing purposes)?

woodcock · 09-04-2017

We all know about this stuff:
https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Anonymizedata

Let's say I am cleaning up PII but I need to leave behind something to indicate WHY I told Splunk to do it.
Ideally, I'd like a metadata field (not inside the raw event itself) named RawModReason or something like that.
Then, if I obscured an SSN (SEDCMD-obscureSSN), I would assign a value of obscuredSSN.
Or if I clipped out a Credit Card Number, I might assign a value of removedCCN.
The idea is to be able to answer the question "Has this event been modified?" (and later, "For what reason?" or "In what way?"), but I'd like it to be hidden enough that an auditor will be unlikely to discover that the event has been modified if I use |outputcsv or some other export of the data (for an audit, a lawsuit, whatever).

The other dicey part is creating a multi-valued meta field. Is that even possible?

I cannot think of any way to do this flexibly but I am sure that it must be possible. Surely I am not the first person to find myself in this position!

DalJeanis · 09-04-2017

Okay, a two-phase approach should work. The first transform detects that the event is about to be masked, so it creates the meta field and sets the flag. The second transform masks the data.

[somename1]
REGEX = .+ssn=\d{5}\d{4}.*
SOURCE_KEY = _raw
FORMAT = _mymvcodes::obscuredSSN
WRITE_META = true

[somename2]
REGEX = (.+ssn=)\d{5}(\d{4}.*)
SOURCE_KEY = _raw
FORMAT = $1xxxxx$2
DEST_KEY = _raw
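
To sanity-check the two regexes outside Splunk, here is a minimal Python sketch that mimics the two phases. It is only an approximation: Python's re module stands in for Splunk's regex engine, the function name apply_two_phase and the sample event are made up for illustration, and a plain dict stands in for the index-time meta field.

```python
import re

# Same patterns as [somename1] and [somename2] above.
DETECT_SSN = re.compile(r".+ssn=\d{5}\d{4}.*")
MASK_SSN = re.compile(r"(.+ssn=)\d{5}(\d{4}.*)")


def apply_two_phase(raw):
    """Return (masked_raw, meta) for one event.

    meta is a hypothetical stand-in for Splunk's indexed fields.
    """
    meta = {}
    # Phase 1: flag the event while the SSN is still visible in the raw text.
    if DETECT_SSN.search(raw):
        meta["_mymvcodes"] = "obscuredSSN"
    # Phase 2: rewrite the raw text, keeping only the last four digits.
    masked = MASK_SSN.sub(r"\1xxxxx\2", raw, count=1)
    return masked, meta


# Made-up sample event, not taken from real data.
event = "user=bob ssn=123456789 action=login"
masked, meta = apply_two_phase(event)
print(masked)
print(meta)
```

The detection has to run before the mask, because once the digits are replaced the first pattern no longer matches. Also note that the leading .+ requires at least one character before ssn=, so an event that begins with ssn= would match neither pattern.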

The strategy below will not work. All of the SEDCMDs run in order in a single pass, so the flags would stay in the data.

Very interesting question. The rule runs on every event, but which events actually had anything obscured, and how many items were obscured, would not be immediately apparent.

Here's a strategy. There may be something more direct, but let's pretend for a moment that Splunk only has access to what's in Dal's head.

Our anonymization takes place in three phases.

Phase 1 - anonymize each field, and additionally insert a marker code next to the masked value.
Phase 2 - extract ALL the marker codes to an mv metadata field.
Phase 3 - delete all the marker codes.

So, for phase 1, instead of this

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

...you might do this (just add extra codes marked by !!##= something ##!!...)

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=!!##=obscuredSSN##!!xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=!!##=obscuredCCN##!!xxxx-xxxx-xxxx-\2/g

For phase 2, you might have this...

[some name]
REGEX = !!##=(\w+)##!!
SOURCE_KEY = _raw
FORMAT = _mymvcodes::$1
WRITE_META = true
REPEAT_MATCH = true

For Phase 3, you have this

SEDCMD-killem = s/!!##=\w+##!!//g

That should work, assuming the extract can be made to occur between the two SEDCMDs.
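
The three phases can be mimicked in Python to see what each step does to an event and why the ordering matters. This is a rough sketch under assumptions, not a model of Splunk's pipeline: Python's re module stands in for SEDCMD and the transform, the function names and the sample event are invented for illustration, and a list stands in for the multivalue meta field.

```python
import re

# Marker format from the phases above: !!##=<code>##!!
MARKER = re.compile(r"!!##=(\w+)##!!")


def phase1_mask_and_mark(raw):
    """Mask SSNs and card numbers, leaving a marker code next to each mask."""
    raw = re.sub(r"ssn=\d{5}(\d{4})",
                 r"ssn=!!##=obscuredSSN##!!xxxxx\1", raw)
    raw = re.sub(r"cc=(\d{4}-){3}(\d{4})",
                 r"cc=!!##=obscuredCCN##!!xxxx-xxxx-xxxx-\2", raw)
    return raw


def phase2_collect_codes(raw):
    """Collect every marker code into a list (stand-in for an mv field)."""
    return MARKER.findall(raw)


def phase3_strip_markers(raw):
    """Remove the marker codes so only the masked values remain."""
    return MARKER.sub("", raw)


# Made-up sample event, not taken from real data.
event = "ssn=123456789 cc=1111-2222-3333-4444"
marked = phase1_mask_and_mark(event)
codes = phase2_collect_codes(marked)
clean = phase3_strip_markers(marked)
print(codes)
print(clean)
```

If phase 3 runs before phase 2, phase2_collect_codes returns an empty list because the markers are already gone. That is the ordering problem in miniature: the extraction has to see the event after the first SEDCMD and before the last one.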
