Getting Data In

Unable to get SEDCMD to mask SSNs (on Indexer)

srseceng
Explorer

Hello, I am testing using SEDCMD on a single Splunk server architecture.

Below is the current configuration which is put into /opt/splunk/etc/system/local/ - I am uploading a CSV file which contains (fake) individual data including two formats of SSN (xxx-xx-xxxx & xxxxxxxxx). The masking is not working when I upload the CSV file. Can someone help point me in the right direction?

props.conf

### CUSTOM ###
[csv]
SEDCMD-redact_ssn = s/\b\d{3}-\d{2}-\d{4}\b/XXXXXXXXX/g

 

Included below is FAKE individual data pulled from the CSV file for testing:


514302782,f,1986/05/27,Nicholson,Russell,Jacki,3097 Better Street,Kansas City,MO,66215,913-227-6106,jrussell@domain.com,a,345389698201044,232,2010/01/01
505-88-5714,f,1963/09/23,Mcclain,Venson,Lillian,539 Kyle Street,Wood River,NE,68883,308-583-8759,lvenson@domain.com,d,30204861594838,471,2011/12/01
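
Independently of where the SEDCMD runs, the pattern above only describes the hyphenated form (xxx-xx-xxxx), so the nine-digit form (xxxxxxxxx) would stay in clear text even when the command is applied. A minimal sketch to check this, assuming Python's re module as a stand-in for Splunk's PCRE-based regex engine; the redact helper is hypothetical and the two inputs are shortened copies of the sample rows:

```python
import re

# The pattern from the SEDCMD in the post, copied unchanged.
ORIGINAL_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(line: str) -> str:
    """Replace every match, mirroring the trailing /g flag of the sed expression."""
    return ORIGINAL_PATTERN.sub("XXXXXXXXX", line)


# Shortened versions of the two sample rows.
hyphenated_row = "505-88-5714,f,1963/09/23,Mcclain"
plain_row = "514302782,f,1986/05/27,Nicholson"

print(redact(hyphenated_row))  # XXXXXXXXX,f,1963/09/23,Mcclain
print(redact(plain_row))       # 514302782,f,1986/05/27,Nicholson (unchanged)
```

This only says something about the regex itself; whether the SEDCMD is applied at all for this sourcetype is a separate question.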


PickleRick
SplunkTrust

The default csv sourcetype has

INDEXED_EXTRACTIONS=csv

This setting changes how the data is processed. Even if the SEDCMD is applied (which I'm not sure of), the fields have already been extracted, and since SEDCMD only edits _raw, it does not change the already extracted fields.


srseceng
Explorer

Ah, OK! So I need to test this a different way and update the SEDCMD setting to reference the new sourcetype.

What's the next easiest method to test? Set up a UF with a file monitor?


PickleRick
SplunkTrust

No, you can just define another sourcetype and upload the file to your all-in-one instance. The trick will be to handle the CSV fields properly. If I remember correctly, with INDEXED_EXTRACTIONS=csv Splunk uses the first line of the input file (by default) to determine field names. Without it, you need to name the fields explicitly and use the proper FIELD_DELIMITER so that Splunk knows what the fields are (or write a very ugly regex-based extraction pattern).

isoutamo
SplunkTrust

There is a new MASA diagram on Slack that shows how those pipelines work and which conf files (and parameters) affect those events. https://splunk-usergroups.slack.com/archives/CD9CL5WJ3/p1710515462848799?thread_ts=1710514363.198159...


srseceng
Explorer

I just found this, from this admin guide:

To anonymize data with Splunk Enterprise, you must configure a Splunk Enterprise instance as a heavy forwarder and anonymize the incoming data with that instance before sending it to Splunk Enterprise.

Other documents previously said this can be performed on either the Indexer OR a Heavy Forwarder. I wonder if this is why it isn't working?


https://docs.splunk.com/Documentation/Splunk/9.2.0/Data/Anonymizedata


srseceng
Explorer

[Screenshot: srseceng_0-1710872698173.png]

richgalloway
SplunkTrust

It's possible Splunk doesn't like the \b metacharacter.  Try this alternative.

SEDCMD-redact_ssn = s/(\D)\d{3}-?\d{2}-?\d{4}(\D)/\1XXXXXXXXX\2/g

I also modified the regex to preserve the characters before and after the SSN and to make the hyphens optional.
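
One thing to check with this pattern: each (\D) has to consume an actual character, and in the sample rows the SSN is the first thing on the line, so there is nothing for the leading (\D) to match. The sketch below illustrates that, assuming Python's re module as a stand-in for Splunk's PCRE-based regex engine. The redact helper is hypothetical, and mid_row is a made-up row (not from the CSV) with the SSN in the second field:

```python
import re

# The alternative pattern from the reply above, copied unchanged.
PATTERN = re.compile(r"(\D)\d{3}-?\d{2}-?\d{4}(\D)")


def redact(line: str) -> str:
    """Keep the two captured neighbours and replace the digits between them."""
    return PATTERN.sub(r"\1XXXXXXXXX\2", line)


leading_row = "514302782,f,1986/05/27,Nicholson"  # SSN is the first field
mid_row = "rec1,514302782,f,1986/05/27"           # hypothetical: SSN is the second field

print(redact(leading_row))  # unchanged: no character precedes the SSN
print(redact(mid_row))      # rec1,XXXXXXXXX,f,1986/05/27
```

If Splunk behaves the same way, an SSN at the very start (or very end) of an event would not be masked by this expression.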

 


srseceng
Explorer

Thanks for the info!

I deleted the events, updated props.conf, restarted splunk, then uploaded the CSV again - but it is not working yet. 

[Screenshot: srseceng_0-1710864113460.png]

gcusello
SplunkTrust

Hi @srseceng ,

How are you receiving these logs: from a Universal Forwarder or from a Heavy Forwarder?

If from a Heavy Forwarder, the props.conf containing the SEDCMD must be located on the HF.

If you receive these logs from a Universal Forwarder and there isn't any intermediate Heavy Forwarder, the props.conf can be located on the Indexers.

In other words, parsing and typing are done on the first full Splunk instance that the data passes through.

Then, check if the regex and the sourcetype are correct.

Ciao.

Giuseppe

srseceng
Explorer

Because this is a test environment, the logs are being added through the UI's "Add Data" > "Upload" feature. I have a CSV file that contains the logs. 

Is this a valid test method?


isoutamo
SplunkTrust
This is a valid method to do it.
Did you select the correct sourcetype (csv) when you uploaded it?

srseceng
Explorer

Yes, it auto-selects "CSV" during import, but I have also manually selected CSV to see if there was a bug there.


gcusello
SplunkTrust

Hi @srseceng ,

OK, so you used Add Data on the Indexer itself, I suppose.

In this case the issue is probably in the regex: what happens when you run the sed regex in the Splunk UI?

Are you sure about the sourcetype?

Did you restart Splunk after the props.conf update?

Sorry for the stupid questions, but "Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth" (Sir Arthur Conan Doyle)!

Ciao.

Giuseppe

srseceng
Explorer

If I run this:

index=main | rex field=_raw mode=sed "s/(\D)\d{3}-?\d{2}-?\d{4}(\D)/\1XXXXXXXXX\2/g"

I get all of the results back, but the SSNs are still in clear text (not redacted).
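
A possible variant swaps the two (\D) groups for lookarounds, which assert "no digit here" without consuming a character and so also succeed at the start or end of an event. This is a sketch only: Python's re module stands in for Splunk's PCRE-based regex engine, the pattern is a suggestion rather than something confirmed in this thread, and it has not been tested in a SEDCMD or rex command. The redact helper is hypothetical; the two rows are the sample rows from the original post.

```python
import re

# Lookbehind/lookahead instead of (\D): the SSN must not touch another digit,
# but it may sit at the very start or end of the event.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)")


def redact(line: str) -> str:
    """Replace every standalone 9-digit or 3-2-4 hyphenated number."""
    return SSN_PATTERN.sub("XXXXXXXXX", line)


rows = [
    "514302782,f,1986/05/27,Nicholson,Russell,Jacki,3097 Better Street,"
    "Kansas City,MO,66215,913-227-6106,jrussell@domain.com,a,345389698201044,232,2010/01/01",
    "505-88-5714,f,1963/09/23,Mcclain,Venson,Lillian,539 Kyle Street,"
    "Wood River,NE,68883,308-583-8759,lvenson@domain.com,d,30204861594838,471,2011/12/01",
]

redacted = [redact(row) for row in rows]
for row in redacted:
    print(row)
```

On these two rows only the leading SSN is replaced; the phone numbers and the longer account numbers are left alone because they do not fit the 3-2-4 digit shape or are followed by more digits. Note that any other standalone nine-digit value would be masked as well. If the pattern behaves the same in Splunk, the corresponding sed expression would be s/(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)/XXXXXXXXX/g.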
