I have CSV data that contains sensitive information, such as client IP addresses. Here is what one of the rows looks like:
David, London,...several more columns...,192.168.0.1
I want to mask the IP by replacing it with the string "XXXXXXX", so that the row above becomes:
David, London, ...several more columns..., XXXXXXX
Also, this operation needs to be performed at index time.
I have tried setting up transforms in props.conf and transforms.conf:
# props.conf
[source::data.csv]
TRANSFORMS-masking = pii-mask

# transforms.conf
[pii-mask]
REGEX = .*
FORMAT = ClientIP::XXXXXX
SOURCE_KEY = ClientIP
DEST_KEY = ClientIP
However, even after doing this, the IP still shows up in the indexed events. Can anybody tell me how to fix this?
It seems to me that the fields have not been extracted yet when the transforms run. If that is the case, how can I get the extraction done before the transformation?
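A minimal sketch of one commonly used alternative, assuming the IP is always the last comma-separated value on the line: instead of reading a ClientIP field that, as suspected above, may not exist yet at index time, the transform can match against _raw (the default SOURCE_KEY) and write the masked line back with DEST_KEY = _raw. The stanza name [source::.../data.csv] (Splunk's "..." path wildcard) and the regex are assumptions to adapt to the real source path and data.

```ini
# props.conf
# Assumption: the file's source path ends in data.csv.
[source::.../data.csv]
TRANSFORMS-masking = pii-mask

# transforms.conf
[pii-mask]
# No SOURCE_KEY, so the regex runs against _raw.
# Group 1 keeps everything up to and including the last comma (plus any
# spaces after it); the trailing IPv4 address is then dropped.
REGEX = ^(.*,\s*)\d{1,3}(?:\.\d{1,3}){3}\s*$
# Rebuild the event: kept prefix followed by the mask string.
FORMAT = $1XXXXXXX
# Overwrite the raw event text that gets indexed.
DEST_KEY = _raw
```

Because ".*," is greedy, group 1 extends to the last comma in the event, so commas earlier in the line do not affect which value is replaced.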
Edit:
One of the columns in the data is an address. This field can contain an arbitrary number of commas, for example: "#221, Baker Street, London, England". So I can't use a simple regular expression in sed. Instead, what I want to know is how to apply transforms to an extracted field rather than to the _raw field.
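Since the IP is the last column, the commas inside the address may not matter: a pattern anchored to the end of the event only ever touches the final value. A minimal sketch using SEDCMD in props.conf, assuming one CSV row per event and an unquoted IPv4 address; the class name mask_ip and the source stanza are hypothetical.

```ini
# props.conf
# Assumption: the file's source path ends in data.csv.
[source::.../data.csv]
# s/<regex>/<replacement>/ applied to the raw event at index time.
# The $ anchor restricts the match to an IPv4 address at the very end
# of the event, so commas in the address column are never inspected.
SEDCMD-mask_ip = s/\d{1,3}(\.\d{1,3}){3}\s*$/XXXXXXX/
```

For the sample row, this would turn the trailing 192.168.0.1 into XXXXXXX and leave the rest of the line, including the address, untouched.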