Hello,
I need to anonymize the second column in a tab-separated log file.
I am using the method described in "Anonymize Data".
transforms.conf:
[abcdef]
REGEX = ^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$
FORMAT = $1TestReplacementString$2
DEST_KEY = _raw
props.conf:
[someSourceType]
TRANSFORMS-xyz = abcdef
FIELD_DELIMITER = \t
FIELD_NAMES = "Field1", "Field2", "Field3", ...
The raw data is processed and indexed as expected, i.e. I see "TestReplacementString" in the _raw field when searching. However, the field "Field2" still has the original, unanonymized value. Is there a way to have that value anonymized as well?
Adding the following to transforms.conf (and referencing it in props.conf) does not make a difference either:
[fieldSpecific]
REGEX = (.*)something.*
FORMAT = $1TestReplaceField
DEST_KEY = Field2
[accepted_keys]
name = Field2
Thanks for your help in advance!
I found out that the SED commands are processed sequentially, in the order I write them in props.conf. That way I can provide a default replacement as the last SED command.
My SEDCMD-xyz line in props.conf has about 35,000 characters, since I'm checking about 450 conditions. It works fine; indexing takes significantly longer, but that is still acceptable. If someone has a better idea, please let me know.
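To illustrate why the ordering matters, here is a small Python model of substitutions applied one after another to each event, with a catch-all rule last. It is only a sketch, not Splunk's implementation and not the actual SEDCMD line: the patterns and replacement strings are made up, and the negative lookahead that stops the final rule from overwriting earlier replacements is an assumption about one way such a default could be written.

```python
import re

# Hypothetical ordered substitutions on the second tab-separated column.
# Each rule keeps the first column (\1) and rewrites the second one.
SUBSTITUTIONS = [
    (r"^([^\t]*\t)[^\t]*alice[^\t]*", r"\1UserA"),
    (r"^([^\t]*\t)[^\t]*bob[^\t]*", r"\1UserB"),
    # Default rule, applied last: replace any value that is not already
    # one of the replacement strings produced by the rules above.
    (r"^([^\t]*\t)(?!(?:UserA|UserB)(?:\t|$))[^\t]*", r"\1Unknown"),
]

def apply_in_order(line: str) -> str:
    """Apply every substitution to the line, in the order listed."""
    for pattern, replacement in SUBSTITUTIONS:
        line = re.sub(pattern, replacement, line)
    return line
```

Because every rule runs against every event, the cost grows with the number of rules, which is consistent with the slower indexing observed with about 450 conditions.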
I have the same problem.
I am indexing CSV files and every field maintains its data even when the _raw field is getting anonymized.
Is there any way to extract CSV fields automatically (without explicit extraction) and at the same time anonymize some of the fields?
I found out that FIELD_NAMES is applied at index time, so it is no longer surprising that the anonymization does not affect the extracted field.
I switched the field extraction to an EXTRACT- setting, which is applied at search time. This allows me to use a SEDCMD- setting to modify the data at index time.
The problem now is this: I can provide several SED commands at once, but apparently they are executed more or less in parallel. What I need, however, is a mapping, i.e. some kind of case statement: if the field_2 value matches regex_1, set field_2 to one value; if it matches regex_2, set field_2 to another value; and so on. In particular, I need a default value for field_2 if none of the regular expressions in the SED commands matches.
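The mapping described above amounts to a first-match-wins lookup with a fallback. A minimal Python sketch of that logic follows, purely to pin down the intended behaviour; it is not Splunk configuration, and the rule patterns, the replacement strings and the function name anonymize_field2 are all hypothetical.

```python
import re

# Hypothetical rules, checked in order: (pattern, replacement value).
RULES = [
    (re.compile(r"alice"), "UserA"),
    (re.compile(r"bob"), "UserB"),
]
DEFAULT = "Unknown"  # used when no rule matches

def anonymize_field2(line: str) -> str:
    """Replace the second tab-separated field using the first matching rule."""
    fields = line.split("\t")
    if len(fields) < 2:
        return line  # no second column; leave the line untouched
    for pattern, replacement in RULES:
        if pattern.search(fields[1]):
            fields[1] = replacement
            break
    else:
        fields[1] = DEFAULT  # the loop finished without a match
    return "\t".join(fields)
```

Unlike independent substitutions, this stops at the first rule that matches, so a later rule can never overwrite an earlier result.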
I read that FIELD_NAMES is used for search-time field extraction. I assume that search-time field extraction uses the _raw field as its source data. Since _raw has been successfully anonymized, I wonder why I still get the original, unanonymized value when searching.
Keywords or pointers for further investigation are also appreciated.