Anonymize data from JSON File

Path Finder

I have a JSON event with an id that I want to anonymize. However, I still need to run stats, counts, grouping, and other analytics on this id later. In short, I want to hide the id from users while keeping it usable internally by Splunk. Is this possible?

My event looks something like this:

{"duration":0.33,"a":"login","i":"50050","d":"2055502349","c":"LIVE","@timestamp":"2020-05-22T01:59:59.601Z"}

I want to anonymize the "d" field.

Ultra Champion

UPDATED:

props.conf

[anony_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TRANSFORMS-anony = anony, anony_raw
TRUNCATE = 0
TIME_PREFIX = timestamp\":\"
SHOULD_LINEMERGE = false

transforms.conf

[anony]
INGEST_EVAL = d:=md5(d)
WRITE_META = true

[anony_raw]
REGEX = (?m)(.*\"d\":\s*\"\d{4})\d+\"(.*)
FORMAT = $1XXXXXX"$2
DEST_KEY = _raw

https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
https://docs.splunk.com/Documentation/Splunk/latest/Data/IngestEval

This configuration works on my Splunk instance (version 8).
My earlier version had a few mistakes, which I have now fixed.

How about this?
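
For anyone who wants to see what the two transforms are meant to do to the sample event, here is a rough Python sketch that runs outside Splunk. It only approximates the behavior: `mask_raw` mirrors the `anony_raw` REGEX and FORMAT, and `hash_id` stands in for `md5(d)`. Both function names are made up for illustration, and Python's `re` module is not the regex engine Splunk uses.

```python
import hashlib
import re

# Sample event from the question.
raw = ('{"duration":0.33,"a":"login","i":"50050","d":"2055502349",'
       '"c":"LIVE","@timestamp":"2020-05-22T01:59:59.601Z"}')

# Python rendering of the anony_raw REGEX (quotes need no escaping here).
ANONY_RAW = re.compile(r'(?m)(.*"d":\s*"\d{4})\d+"(.*)')


def mask_raw(event: str) -> str:
    """Keep the first four digits of "d" and replace the rest with XXXXXX."""
    return ANONY_RAW.sub(r'\1XXXXXX"\2', event)


def hash_id(value: str) -> str:
    """Hex MD5 digest of the id, standing in for md5(d) on the indexed field."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


masked = mask_raw(raw)       # what users would see in the raw event
hashed_d = hash_id("2055502349")  # what the indexed field d would hold

print(masked)
print(hashed_d)
```

The idea is that the raw text shows only the masked id, while the indexed field carries a hash that is the same for every event with the same original id.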

Path Finder

Thank you for your answer.

d is single valued.

However, I cannot use this solution, because commands like "|stats count by d" would no longer work on the original value once the indexed value of d is changed. I want d to be anonymized for all users, but Splunk should still be able to use it internally.

Ultra Champion
[anony]
INGEST_EVAL = d=md5(d)
WRITE_META = true

[anony_raw]
REGEX = (\"d\":\s*\")(\d{4})\d+\"
FORMAT = $1$2XXXXXX"
DEST_KEY = _raw

Use a hash.
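
To illustrate why a hash still allows something like "|stats count by d", here is a small Python sketch, not Splunk code. It assumes MD5 as in the config above; the helper name `hash_id` and every id in the list except the first are invented for the example.

```python
import hashlib
from collections import Counter


def hash_id(value: str) -> str:
    """Deterministic hex MD5 digest: the same id always gives the same hash."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical stream of "d" values; only the first comes from the thread.
ids = ["2055502349", "2055509999", "2055502349", "2055502349"]

plain_counts = Counter(ids)                        # grouping on the real id
hashed_counts = Counter(hash_id(i) for i in ids)   # grouping on the hash

# The groups have the same sizes; only the labels differ.
same_grouping = sorted(plain_counts.values()) == sorted(hashed_counts.values())
print(same_grouping)
```

Because the hash is deterministic, events with the same original id land in the same group. One thing to keep in mind is that an unsalted hash of a short numeric id can be reversed by trying every possible value.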

Path Finder

This is exactly what I want. I changed anony_raw to include the data before and after the id. However, the hash is not applied: the transform only masks d with XXXXXX instead of calculating the hash.

props.conf

INDEXED_EXTRACTION = json
KV_MODE = none
TRANSFORMS-anony = anony, anony_raw

transforms.conf

[anony]
INGEST_EVAL = d=md5(d)
WRITE_META = true

[anony_raw]
REGEX = (?m)^(.*)(\"d\":\s*\")(\d{4})\d+\"(.*)
FORMAT = $1$2$3XXXXXX"$4
DEST_KEY = _raw

Ultra Champion

INGEST_EVAL = d=substr(d,5,10).substr(d,1,6).(d%2).(d%3)
How's this?

Path Finder

This does not work. anony_raw overrides anony, so the end result is d: 2055XXXXXX. I want to use md5 so that I can still correlate the data.

In props.conf, even if I change the order of the two transforms, the end result stays the same. If I remove anony_raw, the original value is left unchanged.

Ultra Champion

I have updated my answer. Please confirm.

Path Finder

Hi,

I have the same issue, but a little more complex.

This is an example of a JSON event as returned in Splunk:

{ [-]
   CodeSha256: 2+1ndsvhz23R2VD42
   CodeSize: 1909
   Description: None
   Environment: { [-]
     Variables: { [-]
       CLUSTER_NAME: Cluster
       ENVIRONMENT: dev
       USER_NAME: tata
       PASSWD: toto!
     }
   }
   LastModified: 2019-12-05T10:58:05.308+0000
   MemorySize: 128
   RevisionId: f0d723sdf6-c000edfzf
   Runtime: python3.6
   Timeout: 180
   TracingConfig: { [+]
   }
   Version: $LATEST
   region: eu-east-1
}

The problem is that sensitive data appears in clear text, specifically in Environment > Variables.

The variables in this section change all the time, so we cannot write a regex that targets specific key names.

How can I mask all the values in Environment > Variables WITHOUT masking the keys?

Example of the result I want:

{ [-]
   CodeSha256: 2+1ndsvhz23R2VD42
   CodeSize: 1909
   Description: None
   Environment: { [-]
     Variables: { [-]
       CLUSTER_NAME:
       ENVIRONMENT:
       USER_NAME:
       PASSWD:
     }
   }
   LastModified: 2019-12-05T10:58:05.308+0000
   MemorySize: 128
   RevisionId: f0d723sdf6-c000edfzf
   Runtime: python3.6
   Timeout: 180
   TracingConfig: { [+]
   }
   Version: $LATEST
   region: eu-east-1
}
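
To make the requested transformation concrete, here is a minimal Python sketch of the masking logic. It is not a Splunk configuration. It assumes the raw event is ordinary JSON with an `Environment.Variables` object, as the view above suggests, and that it could run in some hypothetical preprocessing step before the data reaches Splunk. The function name `mask_variables` is invented for the example.

```python
import json


def mask_variables(event: dict, mask: str = "") -> dict:
    """Return a copy of the event with every Environment.Variables value
    replaced by `mask`. Keys are kept, whatever their names are."""
    masked = json.loads(json.dumps(event))  # simple deep copy via JSON
    variables = masked.get("Environment", {}).get("Variables", {})
    for key in variables:
        variables[key] = mask
    return masked


# Event rebuilt from the values shown in the post (trimmed for brevity).
event = {
    "CodeSize": 1909,
    "Environment": {
        "Variables": {
            "CLUSTER_NAME": "Cluster",
            "ENVIRONMENT": "dev",
            "USER_NAME": "tata",
            "PASSWD": "toto!",
        }
    },
    "Runtime": "python3.6",
    "region": "eu-east-1",
}

masked_event = mask_variables(event)
print(json.dumps(masked_event, indent=2))
```

Since the loop never looks at key names, it keeps working when the set of variables changes.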

Builder

Hello @AnujaJ

Though I haven't tried this yet, I think it can be achieved by forwarding the anonymized event at index time to the intended customer index and forwarding a separate, non-anonymized event to an admin-only index.

The caveat is that this would double your license usage.

Please see link below if my answer is what you're aiming for:
https://answers.splunk.com/answers/690291/one-source-to-two-indexes.html

EDIT:

You can actually send one data source (anonymized and non-anonymized) to two indexes without doubling your license usage
(check woodcock's answer at the link below):
https://answers.splunk.com/answers/567223/how-to-send-same-data-source-to-two-or-multiple-in-1.html

Hope it helps!

Path Finder

Since the actual data is only available to the admin, does that mean only the admin will create the dashboards, while other users work from the customer index?
