
SEDCMD regular expression question

Explorer

Okay you regexperts, I need some help. I have a .csv file for which I need to mask the credit card numbers. Here is what it looks like (with all fake data and cc number)

user,first_name,last_name,email,cc_type,cc_no
bfiltness0,Bria,Filtness,bfiltness0@sayntec.com,jcb,3543149367325423

I've been trying to build my own regex expression, but with no luck. I would just like to replace the credit card number with xxxx. Any help would be greatly appreciated!


Re: SEDCMD regular expression question

Contributor

Try this SEDCMD in props.conf under your sourcetype stanza; you could also scope it by host or source. It takes the 16-digit number and replaces it with xxx.

SEDCMD-cc_replacement = s/\,(\d{16})/xxx/g

https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata


Re: SEDCMD regular expression question

Explorer

Thank you! That seemed to partially work. It's masking it in some places.


Re: SEDCMD regular expression question

Esteemed Legend

Yours drops the last comma.
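To see why the comma goes missing, here is a minimal sketch that mimics the substitution with Python's `re.sub` instead of Splunk itself. Python's regex engine is only a stand-in for SEDCMD here, and the `kept` variant is a hypothetical fix for illustration, not something posted in this thread.

```python
import re

# Sample row from the question (fake data).
line = "bfiltness0,Bria,Filtness,bfiltness0@sayntec.com,jcb,3543149367325423"

# Pattern from the first answer: the comma is part of the match,
# so it is replaced together with the 16 digits.
dropped = re.sub(r"\,(\d{16})", "xxx", line)

# Hypothetical variant: write the comma back in the replacement text.
kept = re.sub(r"\,\d{16}", ",xxxx", line)

print(dropped)  # the cc_type and the mask run together
print(kept)     # the field separator survives
```

Because the match includes the separator, anything the replacement does not put back is lost, which is what turns `jcb,3543...` into `jcbxxx`.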


Re: SEDCMD regular expression question

Esteemed Legend

Like this:

[csv]
SEDCMD-YourSourcetypeHere_obscure_CCs = s/\d+$/xxxx/g
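A quick way to sanity-check the end-of-line pattern outside Splunk is to run it through Python's `re.sub`. This is only a sketch: Python stands in for SEDCMD, and the `rows` list just reuses the two sample lines from the question. It also shows that in Python the replacement side is literal text, so a quantifier such as `x{4}` is not expanded there; as far as I know sed-style replacements behave the same way, but treat that as an assumption to verify on your own instance.

```python
import re

# Header and sample row from the question (fake data).
rows = [
    "user,first_name,last_name,email,cc_type,cc_no",
    "bfiltness0,Bria,Filtness,bfiltness0@sayntec.com,jcb,3543149367325423",
]

# \d+$ matches only a run of digits at the very end of the line,
# so the header row and the other fields are left alone.
masked = [re.sub(r"\d+$", "xxxx", row) for row in rows]

# The replacement is plain text: "x{4}" is written out as-is,
# it is not turned into four x characters.
literal = re.sub(r"\d+$", "x{4}", rows[1])

for row in masked:
    print(row)
print(literal)
```

Anchoring on `$` works for this file because cc_no is the last column; if the column order ever changes, the pattern would need to change with it.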


Re: SEDCMD regular expression question

New Member

If your credit card numbers are not always 16 digits long, you can try replacing:
SEDCMD-cc_replacement = s/\,(\d{16})/xxxx/g
with:
SEDCMD-cc_replacement = s/\,(\d+)/\,xxxx/g

This follows what oscar84x said.
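One thing to watch with the looser `\d+` pattern: it masks any field that starts with digits, not just the card number. A rough sketch in Python, where the row below is a made-up example (it is not from the original file) and the anchored variant is a hypothetical alternative:

```python
import re

# Hypothetical row where the email field happens to start with a digit.
line = "jdoe1,Jane,Doe,2cool@example.com,visa,4111111111111111"

# Comma followed by one or more digits, anywhere in the line.
# In Python the replacement is written as ",xxxx" (no backslash needed).
loose = re.sub(r"\,(\d+)", ",xxxx", line)

# Hypothetical tighter variant: only digits that run to the end of the line.
anchored = re.sub(r"\,\d+$", ",xxxx", line)

print(loose)     # the leading digit of the email is masked too
print(anchored)  # only the last field is masked
```

If the other columns can never begin with a digit, the loose pattern is fine; otherwise anchoring to the end of the line (or requiring a minimum digit count) is the safer choice.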


Re: SEDCMD regular expression question

Explorer

Neither of these masks the data, though. I must be doing something wrong. This is my props.conf:

[csv]
SEDCMD-mask = s/\d+$/x{4}/g

Re: SEDCMD regular expression question

Esteemed Legend

If you are sure that your settings are correct, it must be something else. If you are doing a sourcetype override/overwrite, you must use the ORIGINAL value, NOT the new value. You must deploy your settings to the first full instance(s) of Splunk that handle the events (usually the HF tier if you use one, otherwise your Indexer tier), UNLESS you are using HEC's JSON endpoint (it gets pre-cooked) or `INDEXED_EXTRACTIONS` (configs go on the UF in that case), and then restart all Splunk instances there. When (re)evaluating, you must send in new events (old events will stay broken), then test using `index_earliest=-5m` to be absolutely certain that you are only examining the newly indexed events.


Re: SEDCMD regular expression question

Explorer

It seems to be masking it when I look at the raw data, but I can still, for example, run `| table cc_no` and display all the CC numbers.


Re: SEDCMD regular expression question

Explorer

Hi @woodcock,

I have verified that the data coming in hits an HF first, which then forwards to a search head. When the data gets to the search head, I can see that the cc number is replaced in the raw event (when I "show source" it does not show the cc number). However, cc_no still shows up as a field with populated values. In the images below, I've replaced the cc number with the string "secret" using your recommended sed. The first image is the raw data.

[Image 1: raw data]

[Image 2]
