Remove specific string from event at index time

benUnicoSplunk
New Member

I am trying to remove specific strings and their values from Splunk events at index time as they are not needed in the event that is being indexed.
Example event: 08-09-2016 12:59:25 {"menu":{"id":"file","value":"File","popup":{"menuitem":[{"value":"New","onclick":"CreateNewDoc()"},{"value":"Open","onclick":"OpenDoc()"},{"value":"Close","onclick":"CloseDoc()"}]}}}

From this event, I would like to remove the "onclick" key and its value.
I have added an entry in props.conf that applies a transform to the sourcetype, and I have configured the following in transforms.conf:
[remove_onclick]
REGEX = ^(.*)\,\"onclick\":\"[^\"]+\"(.*)$
FORMAT = $1$2
DEST_KEY = _raw

The aim is to get everything before the "onclick" string, then get everything after it, and format the event to concatenate these together.
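
For context, the props.conf entry that attaches such a transform to a sourcetype might look like the sketch below. The sourcetype name my_json_events is a placeholder and does not come from the post; TRANSFORMS-<class> is the props.conf setting that names a transforms.conf stanza to run at index time.

```ini
# props.conf (the stanza name is a hypothetical sourcetype)
[my_json_events]
# Run the [remove_onclick] stanza from transforms.conf at index time
TRANSFORMS-remove_onclick = remove_onclick
```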

When the event is indexed, the strings are removed correctly. However, when the event is longer than 4096 characters, Splunk appears to truncate it to 4096 characters while applying the regex, so the resulting event is cut off at the end and the remaining data is lost.
When I index the same event without any transform, it is indexed in full, with no truncation.

Is there any configuration value that needs to be set to avoid this, or is there another approach I can take to remove specific strings at index time from an event?

Thanks!


sundareshr
Legend

Have you looked at SEDCMD? Something like this should work (please verify the regex):

SEDCMD-remove_class = s/,\"onclick\":\"[^\"]*\"//g

http://docs.splunk.com/Documentation/Splunk/6.4.3/Data/Anonymizedata#Anonymize_data_through_a_sed_sc...


benUnicoSplunk
New Member

Thanks sundareshr, that solution will work perfectly as well!


akocak
Contributor

No answer has been selected. If you are not using transforms, this is the best way to handle this case.


benUnicoSplunk
New Member

In transforms.conf, the LOOKAHEAD setting defaults to 4096, so I had to increase it for the regex to apply to the whole event.


renjith_nair
SplunkTrust

It's possible that Splunk is applying the TRUNCATE setting from props.conf:

#******************************************************************************
# Line breaking
#******************************************************************************

# Use the following attributes to define the length of a line.

TRUNCATE = <non-negative integer>
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
  otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
  a sign of garbage data).
* Defaults to 10000 bytes.

You might need to set this to a higher value for this particular sourcetype. Also make sure the same setting is present on both the indexer and the heavy forwarder (HF), if there is an HF in between.
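
As a rough sketch of that suggestion, a per-sourcetype override could look like the following. The stanza name my_json_events and the value 50000 are illustrative assumptions, not values from this thread.

```ini
# props.conf (the stanza name is a hypothetical sourcetype)
[my_json_events]
# Raise the maximum line length above the 10000-byte default;
# 50000 is only an example, size it to your longest expected event
TRUNCATE = 50000
```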

Happy Splunking!

benUnicoSplunk
New Member

Thanks for the reply. I hadn't changed the TRUNCATE value for this sourcetype, so it still had the default of 10000 bytes.

After further investigation, I found the solution.
In transforms.conf, the LOOKAHEAD setting defaults to 4096, so I had to increase it for the regex to apply to the whole event.
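
A minimal sketch of that change, assuming the transform stanza shown in the question. The value 32768 is illustrative only; pick a number at least as large as the longest event you expect.

```ini
# transforms.conf
[remove_onclick]
# ... existing REGEX, FORMAT and DEST_KEY settings stay as they are ...
# LOOKAHEAD controls how many characters into the event the regex searches.
# The default is 4096; 32768 here is an assumed example value.
LOOKAHEAD = 32768
```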
