Why is my configuration to anonymize data not working for fields named by FIELD_NAMES in props.conf?

Explorer

Hello,
I need to anonymize the second column of a tab-separated log file.
I'm using the method described in "Anonymize Data".

transforms.conf:

[abcdef]
REGEX = ^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$
FORMAT = $1TestReplacementString$2
DEST_KEY = _raw

props.conf:

[someSourceType]
TRANSFORMS-xyz = abcdef
FIELD_DELIMITER = \t
FIELD_NAMES = "Field1", "Field2", "Field3", ...

The raw data is processed and indexed as expected: when I search, _raw contains "TestReplacementString". However, the field "Field2" still has the original, unanonymized value. Is there a way to make the anonymization apply to that field value as well?
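A minimal Python sketch of why this can happen. It assumes, as a later reply in this thread concludes, that the FIELD_NAMES-based fields are extracted at index time from the original line before the [abcdef] transform rewrites _raw. The three-column sample line and `index_event` are hypothetical; Python's `re` stands in for Splunk's regex engine, and `\g<1>` plays the role of `$1` in FORMAT.

```python
import re

# Same pattern and replacement as the [abcdef] stanza in transforms.conf.
ANON_REGEX = re.compile(r"^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$")
ANON_FORMAT = r"\g<1>TestReplacementString\g<2>"

FIELD_NAMES = ["Field1", "Field2", "Field3"]  # hypothetical three-column layout


def index_event(line):
    # Assumed ordering: fields are split from the ORIGINAL line first ...
    fields = dict(zip(FIELD_NAMES, line.split("\t")))
    # ... and only afterwards does the transform rewrite _raw.
    raw = ANON_REGEX.sub(ANON_FORMAT, line)
    return raw, fields


raw, fields = index_event("a\txx somePatternToReplace yy\tc")
print(repr(raw))         # 'a\tTestReplacementString\tc'
print(fields["Field2"])  # xx somePatternToReplace yy
```

Under that assumption, _raw and Field2 simply come from two different versions of the line.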

Adding the following to transforms.conf (and referencing it in props.conf) doesn't make a difference either:

[fieldSpecific]
REGEX = (.*)something.*
FORMAT = $1TestReplaceField
DEST_KEY = Field2

[accepted_keys]
name = Field2

Thanks for your help in advance!

Explorer

I found out that the SED expressions are applied sequentially, in the order I write them in props.conf. That way I can provide a default replacement as the last SED expression.
My SEDCMD-xyz line in props.conf is about 35,000 characters long because I'm checking about 450 conditions. It works; indexing takes significantly longer, but it's still acceptable. If someone has a better idea, please let me know.
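One way the "default as the last SED expression" trick can work, sketched in Python: the rules are applied one after another like a chain of s/// expressions, and the final default only fires when the column does not already hold a mapped value. The rule patterns, the MAPPED_ prefix and the negative lookahead are assumptions for illustration, not the actual SEDCMD from this post; whether Splunk's SEDCMD accepts the same lookahead syntax should be checked before reusing the idea.

```python
import re

# Hypothetical rules standing in for the ~450 conditions. Each rewrites
# column 2 when its pattern matches there. They are applied in order, so a
# later rule sees the output of earlier ones and must not match a mapped value.
RULES = [
    (r"^([^\t]*\t)[^\t]*alpha[^\t]*", r"\g<1>MAPPED_ALPHA"),
    (r"^([^\t]*\t)[^\t]*beta[^\t]*", r"\g<1>MAPPED_BETA"),
]
# Default, applied last: replace column 2 unless an earlier rule already mapped it.
DEFAULT = (r"^([^\t]*\t)(?!MAPPED_[A-Z]+(?:\t|$))[^\t]*", r"\g<1>MAPPED_DEFAULT")


def apply_sed_chain(line):
    for pattern, replacement in RULES + [DEFAULT]:
        line = re.sub(pattern, replacement, line, count=1)
    return line


for sample in ["a\txx alpha yy\tc", "a\tbeta\tc", "a\tgamma\tc"]:
    print(repr(apply_sed_chain(sample)))
# 'a\tMAPPED_ALPHA\tc'
# 'a\tMAPPED_BETA\tc'
# 'a\tMAPPED_DEFAULT\tc'
```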

Explorer

I have the same problem.
I'm indexing CSV files, and every field keeps its original value even though _raw is anonymized.

Is there any way to extract CSV fields automatically (without explicit extractions) and still anonymize some of them?

Explorer

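I found out that FIELD_NAMES is applied at index time, so it's no longer surprising that the anonymization doesn't affect those fields.
I switched to search-time field extraction with an EXTRACT- setting. Because that extraction runs on the stored _raw, I can now use a SEDCMD- setting to modify the data at index time and have the extracted fields reflect the change.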
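A minimal Python sketch of that split, assuming a hypothetical three-column event: a SEDCMD-style substitution rewrites _raw at index time, and an EXTRACT-style regex with named groups later runs on the stored _raw. `index_time` and `search_time` are illustrative names only. Python writes named groups as `(?P<name>...)`, where Splunk's EXTRACT examples use `(?<name>...)`.

```python
import re

# Index time: SEDCMD-style substitution on _raw before it is stored.
SED_PATTERN = re.compile(r"^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*")
# Search time: EXTRACT-style regex with named groups, run on the stored _raw.
EXTRACT_REGEX = re.compile(r"^(?P<Field1>[^\t]*)\t(?P<Field2>[^\t]*)\t(?P<Field3>[^\t]*)")


def index_time(line):
    return SED_PATTERN.sub(r"\g<1>TestReplacementString", line, count=1)


def search_time(raw):
    match = EXTRACT_REGEX.match(raw)
    return match.groupdict() if match else {}


stored_raw = index_time("a\txx somePatternToReplace yy\tc")
print(search_time(stored_raw)["Field2"])  # TestReplacementString
```

Because extraction happens after the rewrite, the field can only ever see the anonymized text.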
The problem now is that I can provide several SED commands at once, but they apparently run somehow in parallel. What I need is a mapping, i.e. a kind of case statement: if the Field2 value matches regex1, set Field2 to this value; if it matches regex2, set it to that value; and so on. In particular, I need a default value for Field2 if none of the regular expressions in my SED commands match.
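The case-statement semantics described above (first matching regex wins, otherwise a default) look like this as a plain Python sketch. `regex1`, `regex2`, `value1`, `value2` and `defaultValue` are placeholders, and the code models the desired logic rather than any Splunk setting. Unlike a chain of independent substitutions, no later rule can overwrite the result of an earlier match.

```python
import re

# Hypothetical mapping table: (pattern tested on the original Field2 value, new value).
MAPPING = [
    (re.compile(r"regex1"), "value1"),
    (re.compile(r"regex2"), "value2"),
]
DEFAULT_VALUE = "defaultValue"  # hypothetical fallback when nothing matches


def map_field2(value):
    """Return the value of the first matching rule, like a case statement."""
    for pattern, replacement in MAPPING:
        if pattern.search(value):
            return replacement  # later rules are never consulted
    return DEFAULT_VALUE


def remap_line(line):
    """Replace the second tab-separated column (Field2) with its mapped value."""
    columns = line.split("\t")
    if len(columns) > 1:
        columns[1] = map_field2(columns[1])
    return "\t".join(columns)


print(map_field2("has regex1 and regex2"))      # value1
print(repr(remap_line("a\tno match here\tc")))  # 'a\tdefaultValue\tc'
```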

Explorer

I read that FIELD_NAMES is used for search-time field extraction. I assume that search-time extraction uses the _raw field as the source data. Since _raw has been successfully anonymized, I wonder why I still get the original, unanonymized value when I search.

Explorer

Keywords pointing in a direction for further investigation would also be appreciated.