I am trying to remove specific strings and their values from Splunk events at index time as they are not needed in the event that is being indexed.
E.g.: 08-09-2016 12:59:25 {"menu":{"id":"file","value":"File","popup":{"menuitem":[{"value":"New","onclick":"CreateNewDoc()"},{"value":"Open","onclick":"OpenDoc()"},{"value":"Close","onclick":"CloseDoc()"}]}}}
For example, from this event I would like to remove the "onclick" key and value.
I have created an entry in the props.conf for a transform to be performed for the sourcetype, and in the transforms.conf, I have configured the following:
[remove_onclick]
REGEX = ^(.*)\,\"onclick\":\"[^\"]+\"(.*)$
FORMAT = $1$2
DEST_KEY = _raw
The aim is to get everything before the "onclick" string, then get everything after it, and format the event to concatenate these together.
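For reference, the props.conf entry that ties the transform to the sourcetype would look roughly like the sketch below. The stanza name my_json_sourcetype is a placeholder, not the real sourcetype name, and the class name after TRANSFORMS- is arbitrary:

```ini
# props.conf (sketch; the stanza name is a hypothetical sourcetype)
[my_json_sourcetype]
# Run the [remove_onclick] stanza from transforms.conf at index time
TRANSFORMS-remove_onclick = remove_onclick
```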
When the event is indexed, the strings are removed correctly. However, when the event is large (over 4096 characters), Splunk truncates it to 4096 characters when applying the regex, so the resulting event is cut off at the end and the rest of the event data is lost.
I have tried indexing the event without any transformation being performed and the event is indexed entirely without any string truncation.
Is there any configuration value that needs to be set to avoid this, or is there another approach I can take to remove specific strings at index time from an event?
Thanks!
Have you looked at SEDCMD? Something like this should work (please verify the regex):
SEDCMD-remove_class = s/(\"onclick[^\}]+)//g
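As a sketch of where this goes: SEDCMD is set in props.conf under the sourcetype stanza. The pattern above stops at the closing brace and so leaves the comma in front of the key behind; the variant below removes that comma as well so the JSON stays valid. It assumes "onclick" is never the first key in an object and that its value contains no escaped quotes. The stanza name is a placeholder.

```ini
# props.conf (sketch; the stanza name is a hypothetical sourcetype)
[my_json_sourcetype]
# Remove every ,"onclick":"..." pair, including the comma in front of it
SEDCMD-remove_onclick = s/,"onclick":"[^"]*"//g
```

Applied to the sample event, {"value":"New","onclick":"CreateNewDoc()"} would become {"value":"New"}.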
Thanks sundareshr, that solution will work perfectly as well!
No answer has been selected. If you are not using transforms, this is the best way to handle this case.
It's possible that Splunk is applying the TRUNCATE parameter from props.conf:
#******************************************************************************
# Line breaking
#******************************************************************************
# Use the following attributes to define the length of a line.
TRUNCATE = <non-negative integer>
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
a sign of garbage data).
* Defaults to 10000 bytes.
You might need to set this value to a higher number for this particular sourcetype. Also make sure that you have the same settings on the indexer and on the heavy forwarder (HF), if you have an HF in between.
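A minimal sketch of that override, assuming a sourcetype called my_json_sourcetype (a placeholder) and an illustrative limit of 50000 bytes:

```ini
# props.conf (sketch; stanza name and value are illustrative)
[my_json_sourcetype]
# Raise the maximum line length from the default of 10000 bytes
TRUNCATE = 50000
```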
Thanks for the reply, I hadn't changed the TRUNCATE value for this sourcetype, so it still had the default value of 10000 bytes.
After further investigation, I have found the solution.
In transforms.conf, the LOOKAHEAD setting defaults to 4096 characters, so the regex only saw the first 4096 characters of each event. Increasing LOOKAHEAD for this transform made the regex work on the whole event.
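The resulting stanza would look something like the sketch below. The value 32768 is only illustrative and should be sized to the longest event you expect, and the capture groups are written as (.*) on the assumption that this is what the original regex intended:

```ini
# transforms.conf (sketch)
[remove_onclick]
REGEX = ^(.*)\,\"onclick\":\"[^\"]+\"(.*)$
FORMAT = $1$2
DEST_KEY = _raw
# Default is 4096 characters; 32768 is an illustrative value
LOOKAHEAD = 32768
```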
Can someone help with this question? https://community.splunk.com/t5/Getting-Data-In/Exclude-or-Remove-few-fields-while-on-boarding-data/...