Splunk Enterprise

How to anonymize specific variable values in a JSON event

mah
Contributor

Hi,

I want to mask only specific values.

This is an example of a JSON event as returned in Splunk:

{"MemorySize": 256, "region": "ca-central-1", "TracingConfig": \{"Mode": "PassThrough"\}, "RevisionId": "777", "Handler": "handleRequest", "Timeout": 600, "LastModified": "2020-05-27T14:05:43.839+0000", "Environment": \{"Variables": \{"ENVIRONMENT": "dev", "USER": "username",  "USERPASSD": "password", \}\}, "Role": "arn:aws:iam::666:role/X", "VpcConfig": \{"SubnetIds": ["subnet-000", "subnet-111"], "VpcId": "vpc-333", "SecurityGroupIds": ["sg-444"]\}, "CodeSize": 5555, "Description": "Lambda", "Runtime": "java11", "Version": "$LATEST"\}}

 

The problem is that sensitive data appears in clear text, specifically in Environment > Variables.

The variables in this section are not the same in each event, so we cannot write a regex based on specific key names because they always change.

How can I mask all values in Environment > Variables WITHOUT masking the keys?

Example of the result I want:

{"MemorySize": 256, "region": "ca-central-1", "TracingConfig": \{"Mode": "PassThrough"\}, "RevisionId": "777", "Handler": "handleRequest", "Timeout": 600, "LastModified": "2020-05-27T14:05:43.839+0000", "Environment": \{"Variables": \{"ENVIRONMENT": XXXXXX, "USER": XXXXXX,  "USERPASSD": XXXXXX, \}\}, "Role": "arn:aws:iam::666:role/X", "VpcConfig": \{"SubnetIds": ["subnet-000", "subnet-111"], "VpcId": "vpc-333", "SecurityGroupIds": ["sg-444"]\}, "CodeSize": 5555, "Description": "Lambda", "Runtime": "java11", "Version": "$LATEST"\}}
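To make the goal concrete, here is a minimal Python sketch of the transformation being asked for. It runs outside Splunk and is not an index-time configuration; it only illustrates "mask every value under Environment > Variables, keep the keys". It assumes the event is valid JSON (the sample above, as pasted, contains stray backslashes and a trailing comma, so a cleaned-up, shortened event is used), and the names mask_variables and MASK are made up for illustration.

```python
import json

MASK = "XXXXXX"  # placeholder text; an arbitrary choice for this sketch


def mask_variables(raw_event: str) -> str:
    """Return the event with every value under Environment.Variables masked."""
    event = json.loads(raw_event)
    variables = event.get("Environment", {}).get("Variables", {})
    for key in variables:
        variables[key] = MASK  # keys are kept, only the values change
    return json.dumps(event)


# Cleaned-up, shortened version of the sample event (assumption: valid JSON).
sample = (
    '{"Timeout": 600, "Environment": {"Variables": {"ENVIRONMENT": "dev", '
    '"USER": "username", "USERPASSD": "password"}}, "Runtime": "java11"}'
)
masked = json.loads(mask_variables(sample))
print(masked["Environment"]["Variables"])
```

Note that json.dumps writes the mask as a quoted string ("XXXXXX"), whereas the desired result above shows it unquoted.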

 

I tried a props.conf like this:

[sourcetype]

INDEXED_EXTRACTION = json

KV_MODE = none

EXTRACT-var = \{\"Variables\"\:\s*\\\{(?<Variables>[^\}]+)\\

TRANSFORMS-anony = anony_raw

 

and a transforms.conf:

[anony_raw]

REGEX = s/(\s*\"\s*[^\"]*\"[^\"]*\"([^\"]*)\s*\"\s*\,*)+

FORMAT = $1XXXXXX

DEST_KEY =_meta

SOURCE_KEY =_meta

 

But it doesn't work at all...

Can you help me?

to4kawa
SplunkTrust
0 Karma

mah
Contributor

Yes, I've seen that answer and tried the solution, but it doesn't work at all.

props.conf :

[description]

INDEXED_EXTRACTION = json

KV_MODE = none

TRANSFORMS-anony = anony, anony_raw

TRUNCATE = 0

SHOULD_LINEMERGE = false

 

transforms.conf

[anony]

INGEST_EVAL = Variables=md5(Variables)

WRITE_META = true

 

[anony_raw]

REGEX = (?m)(\s*\"\s*[^\"]*\"[^\"]*\"([^\"]*)\s*\"\s*\,*)+

FORMAT = $1XXXXXXX"

DEST_KEY = _raw

 

But that's still not working...

Did I make a mistake somewhere?

0 Karma

to4kawa
SplunkTrust

 

| makeresults
| eval _raw="{\"MemorySize\":256,\"region\":\"ca-central-1\",\"TracingConfig\":{\"Mode\":\"PassThrough\"},\"RevisionId\":\"777\",\"Handler\":\"handleRequest\",\"Timeout\":600,\"LastModified\":\"2020-05-27T14:05:43.839+0000\",\"Environment\":{\"Variables\":{\"ENVIRONMENT\":\"XXXXXX\",\"USER\":\"XXXXXX\",\"USERPASSD\":\"XXXXXX\"}},\"Role\":\"arn:aws:iam::666:role/X\",\"VpcConfig\":{\"SubnetIds\":[\"subnet-000\",\"subnet-111\"],\"VpcId\":\"vpc-333\",\"SecurityGroupIds\":[\"sg-444\"]},\"CodeSize\":5555,\"Description\":\"Lambda\",\"Runtime\":\"java11\",\"Version\":\"$LATEST\"}"
| spath

 

 

 

[anony]

INGEST_EVAL = Environment.Variables.ENVIRONMENT:=md5(Environment.Variables.ENVIRONMENT),Environment.Variables.USER:=md5(Environment.Variables.USER),Environment.Variables.USERPASSD:=md5(Environment.Variables.USERPASSD)
WRITE_META = true

[anony_raw]
REGEX = (?m)(.*Environment\":{)(.*?})(.*)
FORMAT = $1$3
DEST_KEY = _raw
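What the [anony_raw] stanza does can be checked outside Splunk. The Python sketch below applies the same three-group pattern and rebuilds the event from groups 1 and 3 only, which is what FORMAT = $1$3 expresses. The braces are escaped here for readability, the function name drop_variables is made up, and the two sample events are shortened, hypothetical versions of the ones in this thread.

```python
import re

# Same pattern as the [anony_raw] stanza, written as a Python raw string.
PATTERN = re.compile(r'(?m)(.*Environment":\{)(.*?\})(.*)')


def drop_variables(raw: str) -> str:
    # FORMAT = $1$3 keeps groups 1 and 3 and leaves out group 2.
    return PATTERN.sub(r'\1\3', raw)


compact = (
    '{"Timeout":600,"Environment":{"Variables":{"ENVIRONMENT":"dev",'
    '"USER":"username"}},"Role":"arn:aws:iam::666:role/X"}'
)
spaced = '{"Timeout": 600, "Environment": {"Variables": {"USER": "username"}}}'

result = drop_variables(compact)
print(result)
print(drop_variables(spaced) == spaced)
```

In this sketch the whole Variables object is removed rather than masked, and the pattern only matches when there is no space between "Environment": and the opening brace, as in the compact makeresults sample. The second event, which has a space there, comes back unchanged.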

 

0 Karma

mah
Contributor

OK, thanks, but I still have the same problem I started with: the key/value pairs in Variables {} are not the same in each event. My original question is: how can I apply your solution to key/value pairs that change all the time and that I cannot know in advance?

I cannot use your solution as it is because it is written for specific keys.

Example of a new event:

{"Description":  None, "LastModified": "2019-12-05T10:58:05.308+0000", "TracingConfig": {"Mode": "PassThrough"}, "Version": "$LATEST", "CodeSize": 1909, "Handler": "handler", "RevisionId": "111", "MemorySize": 128, "Timeout": 180, "Environment": {"Variables": {"MAILING_LIST": "xxx@xxx.com", "PARAM_NAME": "toto", "NAME": "titi", "FLAG_NAME": "OK_flag", "ENVIRONMENT": "test", "SECRET_NAME": "123_cred", "REGION": "eu-east-1", "MAILING_LIST_PARAM_NAME": "/walnut/mailing_list"}}, "region": "eu-east-1", "Runtime": "python3.6"}

0 Karma

rnowitzki
Builder

Hi @mah,

I created a RegEx that might work.  I could not find a way to make it super-dynamic, but if you know the max number of key:value pairs in the data, it should work:

 

(?<="Variables"\:\s\\\{)(?>\"\w+\"\:\s\"(\S+)\"\,\s)?(?>\"\w+\"\:\s\"(\S+)\"\,\s)?(?>\"\w+\"\:\s\"(\S+)\"\,\s)?(?>\"\w+\"\:\s\"(\S+)\"\,\s)?

 

This example works if you have 4 or fewer key/value pairs. The values will be assigned to group1, group2, etc.

If you expect more than 4, you have to append more of these:

(?>\"\w+\"\:\s\"(\S+)\"\,\s)?


Note: This works with the format where the values are in quotes. In your initial post, you had an example where the values were not in quotes.


Reg101 link
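The repeated optional group can be tried out in Python as a rough check. This is a sketch under a few assumptions: non-capturing groups (?:...) stand in for the atomic groups (?>...), which older Python versions do not support; the sample text keeps the literal backslash before each brace, because the lookbehind in the regex above expects it; and build_pattern and mask_values are hypothetical helper names.

```python
import re

# Lookbehind from the post: it expects a literal backslash before the brace,
# i.e. the text "Variables": \{ exactly as it appears in the first sample.
LOOKBEHIND = r'(?<="Variables":\s\\\{)'
# One optional key/value pair; (?:...) is used here instead of (?>...).
PAIR = r'(?:"\w+":\s"(\S+)",\s)?'


def build_pattern(max_pairs: int) -> re.Pattern:
    return re.compile(LOOKBEHIND + PAIR * max_pairs)


def mask_values(text: str, max_pairs: int = 4, mask: str = "XXXXXX") -> str:
    match = build_pattern(max_pairs).search(text)
    if match is None:
        return text
    # Replace captured values from right to left so earlier spans stay valid.
    for index in range(max_pairs, 0, -1):
        if match.group(index) is not None:
            start, end = match.span(index)
            text = text[:start] + mask + text[end:]
    return text


sample = (
    r'"Environment": \{"Variables": \{"ENVIRONMENT": "dev", '
    r'"USER": "username", \}\}, "Runtime": "java11"'
)
groups = build_pattern(4).search(sample).groups()
masked = mask_values(sample)
print(groups)
print(masked)
```

Unused groups come back as None. Each pair has to end with a comma and one whitespace character, so the last pair is only captured when a trailing comma follows it, as in the first sample.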

 

--
Karma and/or Solution tagging appreciated.
0 Karma

mah
Contributor

Hi @rnowitzki , thank you for your reply. 

I tried to apply your regex, but the problem is that some keys are written with several words, like:

"USER PASSWORD": "12345abcd"

And your regex does not work anymore... 

0 Karma

rnowitzki
Builder

Hi @mah,

OK, then we have to add an optional space and an optional second word to the regex.

(?<="Variables"\:\s\\\{)(?>\"\w+?\s?\w+\"\:\s\"(\S+)\"\,\s)?(?>\"\w+?\s?\w+\"\:\s\"(\S+)\"\,\s)?(?>\"\w+?\s?\w+\"\:\s\"(\S+)\"\,\s)?(?>\"\w+?\s?\w+\"\:\s\"(\S+)\"\,\s)?


The ? after the first \w+ makes that quantifier lazy, and the ? after \s makes the space optional, so a key can be a single word or two words separated by one space.


The example shown above works with a maximum of 4 key/value pairs. If you expect more, append more of these:

(?>\"\w+?\s?\w+\"\:\s\"(\S+)\"\,\s)?

 
Hope this works better.

--
Karma and/or Solution tagging appreciated.
0 Karma

mah
Contributor

Hi, @rnowitzki 

It works, but with this regex:

(?>\"\S+?\s?\S+\"\:\s\"(\S+)\"\,\s)?
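The difference between the two key patterns can be checked in isolation. This is a small Python sketch; apart from USER PASSWORD, the key names are made up for illustration, and fullmatch is used so that only the key part is tested.

```python
import re

WORD_KEY = re.compile(r'\w+?\s?\w+')      # key part from the earlier reply
NONSPACE_KEY = re.compile(r'\S+?\s?\S+')  # key part from this post

# Hypothetical key names, except "USER PASSWORD", which is from the thread.
keys = ["USER", "USER PASSWORD", "USER-NAME", "A", "THREE WORD KEY"]

word_hits = [k for k in keys if WORD_KEY.fullmatch(k)]
nonspace_hits = [k for k in keys if NONSPACE_KEY.fullmatch(k)]
print(word_hits)
print(nonspace_hits)
```

Both variants need at least two characters and accept at most one space, so a one-character key or a three-word key matches neither. The \S variant additionally accepts keys containing characters outside \w, such as a hyphen.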

The second point concerns the first part of the regex:

(?<="Variables"\:\s\\\{)

It does not close the brace, and I want to apply this ONLY to the key/value pairs in the Variables section.

What I mean is that if, for example, I repeat the key/value regex 10 times while the Variables section contains only 2 key/value pairs, it will mask data OUTSIDE the Variables section.

I found a regex that extracts this section, but I don't know how to combine it with your solution above:

 

\{\"Variables\"\:\s*\\\{(?<Variables>[^\}]+)\\

 

Do you have an idea?

0 Karma

rnowitzki
Builder

I can't reproduce the issue on regex101 (it does not select anything outside of the "Variables" section).

But I updated the regex by appending the two closing braces at the end.

https://regex101.com/r/eXmyO3/6

 

(?<="Variables"\:\s\\\{)(?>\"\S+?\s?\S+\"\:\s\"(\S+)\"\,\s)?(?>\"\S+?\s?\S+\"\:\s\"(\S+)\"\,\s)?(?>\"\S+?\s?\S+\"\:\s\"(\S+)\"\,\s)?(?>\"\S+?\s?\S+\"\:\s\"(\S+)\"\,\s)\\\}\\\}

 

Getting closer, I hope 🙂

--
Karma and/or Solution tagging appreciated.
0 Karma

mah
Contributor

That's great! My case is solved!

With the last regex, I tried adding more key/value groups than the Variables section contains, and it doesn't match anything outside the section.

https://regex101.com/r/eXmyO3/7

I think I can use the SEDCMD setting in props.conf and not need a transforms.conf in addition.

Nice job. Thanks a lot!

0 Karma
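For comparison, the fixed number of repeated groups can be avoided outside Splunk with a two-step substitution: first isolate the Variables object, then mask every quoted value inside it. The following is a Python sketch only, not a SEDCMD or transforms.conf setting, and the thread does not establish whether it translates into a single SEDCMD expression. It assumes unescaped braces (as in the second sample event), no nested braces inside Variables, and quoted values without escaped quotes; the function name is hypothetical.

```python
import re

# Matches the Variables object, assuming it contains no nested braces.
VARIABLES_BLOCK = re.compile(r'("Variables":\s*\{)([^{}]*)(\})')
# Matches one "key": "value" pair; the key is kept, the value is replaced.
PAIR = re.compile(r'("[^"]+":\s*)"[^"]*"')


def mask_variables_block(raw: str, mask: str = "XXXXXX") -> str:
    def mask_block(block):
        body = PAIR.sub(lambda pair: pair.group(1) + '"' + mask + '"', block.group(2))
        return block.group(1) + body + block.group(3)

    return VARIABLES_BLOCK.sub(mask_block, raw)


event = (
    '{"Handler": "handler", "Environment": {"Variables": {"MAILING_LIST": '
    '"xxx@xxx.com", "PARAM_NAME": "toto", "USER PASSWORD": "12345abcd"}}, '
    '"region": "eu-east-1"}'
)
masked_event = mask_variables_block(event)
print(masked_event)
```

Because the inner substitution only ever sees the text between the braces of Variables, it cannot touch values elsewhere in the event, whatever the number of pairs.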

rnowitzki
Builder

@mah you're welcome. I learned something myself 🙂

You might want to set my last reply as the solution, for future Splunkers having similar issues.

Cheers
Ralph

--
Karma and/or Solution tagging appreciated.


0 Karma

to4kawa
SplunkTrust

 

[anony_raw1]
REGEX = (?m)(.*ENVIRONMENT\":\")([^\"]+)(\".*)
FORMAT = $1$3
DEST_KEY = _raw

[anony_raw2]
REGEX = (?m)(.*USER\":\")([^\"]+)(\".*)
FORMAT = $1$3
DEST_KEY = _raw

[anony_raw3]
REGEX = (?m)(.*USERPASSD\":\")([^\"]+)(\".*)
FORMAT = $1$3
DEST_KEY = _raw

 

I guess you would need three anonymization stanzas, one per key.

0 Karma
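The per-key stanzas above can be mimicked in Python to see their effect and their limit. This is a sketch with a hypothetical helper name and a shortened, compact sample event; each loop iteration plays the role of one [anony_rawN] stanza.

```python
import re


def blank_known_keys(raw: str, keys: list) -> str:
    for key in keys:
        # One pattern per key, mirroring one [anony_rawN] stanza each.
        pattern = re.compile(r'(.*' + re.escape(key) + r'":")([^"]+)(".*)')
        raw = pattern.sub(r'\1\3', raw)  # FORMAT = $1$3 drops the value
    return raw


event = (
    '{"Environment":{"Variables":{"ENVIRONMENT":"dev","USER":"username",'
    '"MAILING_LIST":"xxx@xxx.com"}}}'
)
blanked = blank_known_keys(event, ["ENVIRONMENT", "USER", "USERPASSD"])
print(blanked)
```

The values of the listed keys are emptied, but a key that was not listed (MAILING_LIST here) keeps its value, which is the limitation raised earlier in the thread for events whose keys are not known in advance.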