We have an index in which most of the fields have a single quote at the beginning and end of their values. We would like to strip them, first at search time and hopefully later at index time.
How can we do that?
Like this for search-time:
... | rex field=YourFieldNameHere mode=sed "s/'|'$//g"
Like this for index-time:
In props.conf
[YourSourcetypeHere]
TRANSFORMS-strip-bounding-quotes = sbq_field_foo, ..., sbq_field_bar
In transforms.conf (multiple similar stanzas):
[sbq_field_foo]
SOURCE_KEY = field_foo
REGEX = '(?<field_foo>[^\']*)\'
FORMAT = field_foo::$1
WRITE_META = true
This seems to remove all quotes, not just the bounding ones:
... | rex field=YourFieldNameHere mode=sed "s/'|'$//g"
Seems `^` symbol needs to be added?
"s/^'|'$//g"
Thank you @woodcock.
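The difference between the two patterns above can be checked outside Splunk. Here is a minimal Python sketch that applies both with re.sub, assuming Python's alternation and anchors behave like the sed expression for this simple case. The helper names and the sample value are made up for illustration.

```python
import re


def strip_without_anchor(value: str) -> str:
    # Mirrors s/'|'$//g: the first alternative has no anchor,
    # so it matches every single quote in the value.
    return re.sub(r"'|'$", "", value)


def strip_bounding(value: str) -> str:
    # Mirrors s/^'|'$//g: only a quote at the very start or
    # the very end of the value is removed.
    return re.sub(r"^'|'$", "", value)


sample = "'it's here'"
print(strip_without_anchor(sample))  # the inner apostrophe is removed too
print(strip_bounding(sample))        # the inner apostrophe survives
```
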
Hi danielbb,
you could create a calculated field with a regex like this
| rex field=your_field "\'(?<your_new_field>[^\']*)\'"
Bye.
Giuseppe
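A rough Python equivalent of that extraction, as a sketch only: Python spells named groups (?P<name>...) where rex uses (?<name>...), and the function name is hypothetical. Note that [^']* stops at the first inner quote, so a value containing an apostrophe is cut short.

```python
import re

# Same idea as the rex above: capture whatever sits between two single quotes.
PATTERN = re.compile(r"'(?P<your_new_field>[^']*)'")


def extract_unquoted(value: str):
    # Returns the captured text, or None when the value is not quoted.
    match = PATTERN.search(value)
    return match.group("your_new_field") if match else None


print(extract_unquoted("'abc'"))
print(extract_unquoted("'it's'"))  # capture ends at the inner apostrophe
```
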
And index time @gcusello ?
An eval command that trims that from the field will do that:
| eval fieldname=trim(fieldname, "'")
Sorry, you meant at index time. This is easy to do with a calculated field, but that doesn't answer your question, so I'm converting this to a comment instead of an answer.
If it's multiple fields you could also create a macro that cleans them all with foreach
| foreach *
[eval "<<FIELD>>"=trim('<<FIELD>>', "'")]
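For anyone who wants to reason about what the foreach/trim combination does, here is a small Python sketch of the same idea applied to an event held as a dict. It assumes str.strip("'") is a close analogue of trim with a single-quote argument; the function name and sample event are hypothetical.

```python
def strip_quotes_from_fields(event: dict) -> dict:
    # Strip leading and trailing single quotes from every string field.
    # Quotes inside the value are left alone; non-string values pass through.
    return {
        name: value.strip("'") if isinstance(value, str) else value
        for name, value in event.items()
    }


event = {"user": "'alice'", "note": "'it's fine'", "count": 3}
print(strip_quotes_from_fields(event))
```

Unlike the anchored sed pattern, strip removes every leading and trailing quote, so a value such as ''x'' comes out as x.
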
Really nice - how do we do it at index time?
Sorry I missed this. I am not sure how to accomplish this at index time unfortunately. You could hypothetically strip them at index time from the raw event using this process: https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata#Anonymize_data_with_a_sed_scr...
I strongly caution you, though, that you MUST write a sed regex that removes only the single quotes you don't want and leaves the ones you do. If you nuke all of them, you may have some unintended consequences. Do you have some anonymized _raw examples of the data that is coming in with the single-quote-wrapped values? I could take a hack at a sed statement that is hopefully very targeted.
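To illustrate what "targeted" could mean, here is a Python sketch under a loud assumption: the raw events are space-separated key='value' pairs. The real layout is not shown in this thread, so the pattern, the function name and the sample event are all hypothetical.

```python
import re

# ASSUMPTION: raw events look like key='value' pairs. Only quotes that
# directly wrap a value after "key=" are removed; other quotes are kept.
QUOTED_VALUE = re.compile(r"(\w+)='([^']*)'")


def unquote_values(raw: str) -> str:
    # Rewrite key='value' as key=value and leave everything else untouched.
    return QUOTED_VALUE.sub(r"\1=\2", raw)


raw = "name='alice' msg='it works' note=don't"
print(unquote_values(raw))
```

A value that itself contains an apostrophe (for example msg='it's ok') would still be mangled by this pattern, which is why real sample events are needed before settling on a regex.
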