How to mask a field value from raw events that shows in multiple patterns

johnward4
Communicator

I'm trying to mask a policy number that appears in my raw logs in several different patterns. For context, I'm using this field extraction:

EXTRACT-policyNumber = policy.*(-|=)\s(?P<policyNumber>\w+)

This extracts policyNumber as the word that follows any string in my logs containing the word policy, then any characters, then either an = sign or a - sign, then a single whitespace character.
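
To see what this pattern captures, it can be replayed outside Splunk. A minimal sketch using Python's re module as a stand-in for the search-time EXTRACT, assuming the named group is policyNumber; the sample events and the extract_policy_number helper are invented for illustration:

```python
import re

# Python and PCRE both accept the (?P<name>...) named-group syntax.
EXTRACT_PATTERN = re.compile(r"policy.*(-|=)\s(?P<policyNumber>\w+)")


def extract_policy_number(event: str):
    """Return the captured policyNumber, or None when the pattern does not match."""
    match = EXTRACT_PATTERN.search(event)
    return match.group("policyNumber") if match else None


# Hypothetical events, not taken from real logs.
for event in (
    "policy id = AB12345",
    "policy lookup - CD67890 done",
    '"policyNumber":"2L77755540"',
):
    print(repr(event), "->", extract_policy_number(event))
```

Because .* is greedy, the match runs to the last - or = that is followed by whitespace and a word. Key/value text such as "policyNumber":"..." contains neither separator, so it is not captured at all.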

I'm trying to add a line to my props.conf that masks any of these values with X. Help appreciated. Here's what I've tried so far:

SEDCMD-policyNumber_mask = s/policy.*(-|=)\s(\w+)/policy.*(-|=)\s\"XXXXXXX/g
1 Solution

woodcock
Esteemed Legend

Keep in mind that this overwrites the raw data, so the policy number is lost for good and your EXTRACT will return XXXXXXX:

SEDCMD-policyNumber_mask = s/((?:policy|bankAccountNumber)"?[^-=:]+[-=:]\s*"?)\w+/\1XXXXXXX/g
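
A masking expression can be checked against sample events before it is allowed to rewrite indexed data. The sketch below uses Python's re.sub as a stand-in for SEDCMD (an assumption: Splunk uses PCRE, which agrees with Python for this pattern). It wraps the two field names in a non-capturing group so that the separator and value parts apply to both names. The mask helper and the first sample event are hypothetical:

```python
import re

# (?:policy|bankAccountNumber) groups the alternation; everything up to the
# value is captured as group 1 and written back unchanged by \1.
MASK_PATTERN = re.compile(r'((?:policy|bankAccountNumber)"?[^-=:]+[-=:]\s*"?)\w+')


def mask(event: str) -> str:
    """Replace the value that follows the captured prefix with a fixed run of X."""
    return MASK_PATTERN.sub(r"\1XXXXXXX", event)


for event in (
    "policy id = AB12345",                 # hypothetical event
    '"policyNumber":"2L77755540"',         # JSON-style key from the thread
    '"bankAccountNumber":"11133344444"',   # JSON-style key from the thread
):
    print(mask(event))
```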

johnward4
Communicator

@woodcock

I have another format in my log that I'm trying to mask. I've tried several combinations but haven't had any luck with it.

"bankAcctType":"Saving","bankRoutingNumber":"55522244","bankAccountNumber":"11133344444","accountHolderName":"John","AccountLastName"Doe"","signature":null,"additionalComments":""}}

I've tried applying this props.conf stanza to mask the data. Help appreciated:

[test:log]
CHARSET = UTF-8
LINE_BREAKER = ([\r\n]+)\d{2,4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\,\d{1,3}\s\w+\s+[
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
category = Splunk App Add-on Builder
pulldown_type = 1
KV_MODE = none
NO_BINARY_CHECK = true
disabled = false
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
SHOULD_LINEMERGE = false
SEDCMD-anon = s/(bankAcctType\":\")(\w+/)/XXXXXXXXX\2/g s/(bankRoutingNumber\":\")(\d+)/XXXXXXXXX\2/g s/(bankAccountNumber\":\")(\d+)/XXXXXXXXXXX\2/g
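
In a sed substitution the replacement keeps only what it references, so a masking rule normally captures the key as group 1, leaves the value uncaptured, and starts the replacement with \1. A sketch of that shape in Python (re.sub standing in for SEDCMD; the mask_bank_fields helper is hypothetical, and the sample is trimmed from the event above):

```python
import re

# Each rule captures the key and opening quote as group 1 and matches the
# value outside the group, so only the value is replaced.
RULES = [
    (re.compile(r'(bankAcctType":")\w+'), r"\1XXXXXXXXX"),
    (re.compile(r'(bankRoutingNumber":")\d+'), r"\1XXXXXXXXX"),
    (re.compile(r'(bankAccountNumber":")\d+'), r"\1XXXXXXXXXXX"),
]


def mask_bank_fields(event: str) -> str:
    """Apply the rules in order, the way space-separated sed expressions run."""
    for pattern, replacement in RULES:
        event = pattern.sub(replacement, event)
    return event


sample = '"bankAcctType":"Saving","bankRoutingNumber":"55522244","bankAccountNumber":"11133344444"'
print(mask_bank_fields(sample))
```

In SEDCMD form each rule would be an s/.../\1XXXXXXXXX/g expression. Since / is the delimiter there, an unescaped / inside the pattern would end it early.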

woodcock
Esteemed Legend

Of course it doesn't work; there is no policy string! Try my updated answer.

johnward4
Communicator

So there are several different formats in which the policy number shows up throughout this log. I have a transforms.conf stanza that filters for this particular format and sends it to a different sourcetype:

transforms.conf

[test_sourcetype_CSA]
REGEX = \sservice\.CSAServiceImpl\s\(CSAServiceImpl\.
FORMAT = sourcetype::test:CSA
DEST_KEY = MetaData:Sourcetype

Sample log :
2019-12-03 15:17:32,57 DEBUG [ajp-/0.0.0.0:8209-16] service.CSAServiceImpl (CSAServiceImpl.java:89) []
- CSA Request object in debug: { "policyNumber":"2L77755540","bankAcctType":"Checking","bankRoutingNumber":"222111333","bankAccountNumber":"22222444888"}}
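
The REGEX can be checked against the sample event in the same way. A sketch in Python, assuming test:log (the stanza name used earlier in the thread) is the incoming sourcetype; routed_sourcetype is a hypothetical helper that only mimics the match-then-rewrite decision, not Splunk's parsing pipeline:

```python
import re

# Same expression as the REGEX setting in the transforms.conf stanza.
ROUTING_REGEX = re.compile(r"\sservice\.CSAServiceImpl\s\(CSAServiceImpl\.")


def routed_sourcetype(event: str, default: str = "test:log") -> str:
    """Return test:CSA when the routing regex matches, else the default sourcetype."""
    return "test:CSA" if ROUTING_REGEX.search(event) else default


csa_event = (
    "2019-12-03 15:17:32,57 DEBUG [ajp-/0.0.0.0:8209-16] "
    "service.CSAServiceImpl (CSAServiceImpl.java:89) []"
)
saml_event = (
    "2019-11-25 07:51:39,659 INFO  [ajp-/0.0.0.0:8209-17] "
    "security.SAMLAuthSuccessHandler (SAMLAuthSuccessHandler.java:104)  []"
)
print(routed_sourcetype(csa_event))
print(routed_sourcetype(saml_event))
```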

Then I tried to apply this under that sourcetype's stanza:

[test:CSA]
SEDCMD-anon = s/\"policyNumber\":\"(\w+)/policyNumber"XXXXXXXXXXXXXXX/g s/bankRoutingNumber\":\"(\d+)/bankRoutingNumber\":\"XXXXXXXXX/g s/bankAccountNumber\":\"(\d+)/bankAccountNumber\":\"XXXXXXXXXXX/g

woodcock
Esteemed Legend

Again, try my updated answer; it should accommodate all variations.

johnward4
Communicator

Thank you. That example didn't seem to work for me, but this did:

SEDCMD-anon = s/(bankRoutingNumber|bankAccountNumber)\"(:)+\"(\w+)/\1":"XXXXXXXXXXX/g
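
Replayed against the CSA sample event from earlier in the thread, this expression behaves as follows. A sketch with Python's re.sub standing in for SEDCMD; mask_bank_numbers is a hypothetical helper:

```python
import re

# Group 1 is the key name; the replacement writes it back followed by ":"
# and the mask. The quote before the key and after the value are outside
# the match, so they stay in place.
BANK_PATTERN = re.compile(r'(bankRoutingNumber|bankAccountNumber)"(:)+"(\w+)')


def mask_bank_numbers(event: str) -> str:
    return BANK_PATTERN.sub(r'\1":"XXXXXXXXXXX', event)


sample = (
    '{ "policyNumber":"2L77755540","bankAcctType":"Checking",'
    '"bankRoutingNumber":"222111333","bankAccountNumber":"22222444888"}}'
)
print(mask_bank_numbers(sample))
```

The single alternation covers both bank fields, and policyNumber is left untouched because it is not named in the pattern.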

johnward4
Communicator

Thanks @woodcock! It's masking every policyNumber except in logs with this format:

2019-11-25 07:51:39,659 INFO  [ajp-/0.0.0.0:8209-17] security.SAMLAuthSuccessHandler (SAMLAuthSuccessHandler.java:104)  []
                                        - policyNumbers (filtered) for the user are ----------------------- 3T00005555

woodcock
Esteemed Legend

So it works, right?

johnward4
Communicator

It works for most events except the log from my last update. Any thoughts on why it didn't apply to that one?

johnward4
Communicator

@woodcock I added a 2nd capture group and now it's masking all my policyNumbers. Thank you!

Here's what I used, in case it helps someone else:

SEDCMD-policyNumber_mask = s/(policy[^-=]+[-=]\s+)\w+/\1XXXXXXXXXXXXXXX/g s/(policyNumbers[^-]+-+\s+)\w+/\1XXXXXXXXXXXXXXX/g
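
For the SAMLAuthSuccessHandler event quoted earlier, the value follows a run of dashes and a space, so the captured prefix has to consume the whole run before \w+ can reach the policy number. A sketch under that assumption, with Python's re.sub standing in for SEDCMD; mask_dashed is a hypothetical helper and the dash count in the sample is approximate:

```python
import re

MASK = "X" * 15

# [^-]+ stops at the first dash, -+ consumes the whole dash run, and \s+ the
# space after it; all of that is group 1 and is written back unchanged.
DASHED_PATTERN = re.compile(r"(policyNumbers[^-]+-+\s+)\w+")


def mask_dashed(event: str) -> str:
    return DASHED_PATTERN.sub(lambda m: m.group(1) + MASK, event)


prefix = "- policyNumbers (filtered) for the user are " + "-" * 23 + " "
print(mask_dashed(prefix + "3T00005555"))
```

Only one capture group exists in this pattern, so the replacement refers to group 1. A reference to a second group would have nothing to restore.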
