Splunk Search

Modify Field with Regex at Index Time

dbuehler
Loves-to-Learn Everything

Hey guys,

I have IIS logs that are logging multiple IPs to the X-Forwarded-For field as below:

114.119.136.78,+162.158.119.25

I would like to apply a regex to the X-Forwarded-For field at index time to ensure the field only contains the first IP, like:

114.119.136.78

In other words, anything after the first comma should be cut out of the field.

 

So far I have tried to achieve this with the following props/transforms:

#props
[iis]
TRANSFORMS-rm-extra-ips = rm_extra_ips

#transforms
[rm_extra_ips]
SOURCE_KEY = field:X_Forwarded_For
REGEX = ^(.+?),

How do I do this?

Thanks!
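
As a side check on the regex alone, the sketch below runs the pattern from the transforms stanza in Python, used here only as a stand-in regex engine and not as Splunk itself. The helper name first_forwarded_ip is hypothetical. It shows what the capture group picks up; it does not show Splunk rewriting the field. As far as the transforms.conf spec goes, an index-time transform also needs DEST_KEY or WRITE_META to write its result anywhere, and the stanza as posted sets neither.

```python
import re

# Regex copied from the [rm_extra_ips] stanza in the post.
FIRST_VALUE = re.compile(r"^(.+?),")


def first_forwarded_ip(value: str) -> str:
    """Return the text before the first comma, or the value unchanged if there is no comma.

    Hypothetical helper for illustration; Splunk does not expose a function like this.
    """
    match = FIRST_VALUE.match(value)
    return match.group(1) if match else value


print(first_forwarded_ip("114.119.136.78,+162.158.119.25"))  # 114.119.136.78
print(first_forwarded_ip("114.119.136.78"))                  # 114.119.136.78
```

So the capture group itself looks fine; the open question is how to get Splunk to write that capture back into the field.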


bowesmana
SplunkTrust

dbuehler
Loves-to-Learn Everything

Thanks. I was looking into SEDCMD originally; I've used it for other purposes before.

Can SEDCMD operate on just one field? Or does it have to operate on the entire event (_raw)?

 


isoutamo
SplunkTrust

Hi

Based on the props.conf spec, no: it must operate on _raw.

SEDCMD-<class> = <sed script>
* Only used at index time.
* Commonly used to anonymize incoming data at index time, such as credit
  card or social security numbers. For more information, search the online
  documentation for "anonymize data."
* Used to specify a sed script which Splunk software applies to the _raw
  field.
* A sed script is a space-separated list of sed commands. Currently the
  following subset of sed commands is supported:
    * replace (s) and character substitution (y).
* Syntax:
    * replace - s/regex/replacement/flags
      * regex is a perl regular expression (optionally containing capturing
        groups).
      * replacement is a string to replace the regex match. Use \n for back
        references, where "n" is a single digit.
      * flags can be either: g to replace all matches, or a number to
        replace a specified match.
    * substitute - y/string1/string2/
      * substitutes the string1[i] with string2[i]
* No default.

r. Ismo 


dbuehler
Loves-to-Learn Everything

For some reason my props.conf config isn't being applied to my data.

I've found that this is the regex I need:

s/,+[.0-9\:a-z]*//g

And this regex works perfectly when run manually, i.e. using a sed command against a text file with a sample event:

cat sample.txt | sed 's/,+[.0-9\:a-z]*//g'

My props.conf (placed on my two indexers in /opt/splunk/etc/apps/my-iis-app/local) is configured to apply to the 'iis' sourcetype, which is correct, and looks like:

[iis]

SEDCMD-remove-extra-ips = s/,+[.0-9\:a-z]*//g

After restarting Splunk, the events are still coming in unmodified. It appears the regex isn't being applied at all: even if I change my config to a very simple test regex, that doesn't work either, e.g.:

[iis]
SEDCMD-test = s/10/test/g

Any ideas?
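
One detail worth checking in the pattern itself, independent of why even the test SEDCMD has no visible effect: the props.conf spec quoted earlier says SEDCMD takes a Perl regular expression, whereas plain sed defaults to basic regular expressions, where an unescaped + is a literal plus sign. In Perl-style syntax, ,+ means "one or more commas". The sketch below uses Python's re module as a stand-in for a Perl-style engine (an assumption; it is not Splunk). The escaped variant and the User-Agent fragment are illustrative and do not come from the thread.

```python
import re

sample = "114.119.136.78,+162.158.119.25"

# Pattern as posted: in Perl-style syntax ",+" is "one or more commas",
# so the match stops right before the literal "+" in the data.
as_posted = re.sub(r",+[.0-9\:a-z]*", "", sample)

# Assumed variant: escape the plus so it matches the literal "+" character.
escaped_plus = re.sub(r",\+[.0-9:a-z]*", "", sample)

# Hypothetical User-Agent fragment: the same ",+" sequence can occur elsewhere
# in an event, and a substitution over the whole event would hit it too.
user_agent = "Mozilla/5.0+(KHTML,+like+Gecko)"
clipped_agent = re.sub(r",\+[.0-9:a-z]*", "", user_agent)

print(as_posted)      # 114.119.136.78+162.158.119.25
print(escaped_plus)   # 114.119.136.78
print(clipped_agent)  # Mozilla/5.0+(KHTML+Gecko)
```

If Splunk's engine behaves the same way, the posted pattern would strip only the comma once the SEDCMD does fire. Because SEDCMD works on all of _raw, a corrected pattern may also need to be anchored more tightly so it does not clip other fields.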


isoutamo
SplunkTrust
Should you have \. instead of . in your sed expression? The first matches a literal dot; the second matches any character.

dbuehler
Loves-to-Learn Everything

That regex works just the same in manual tests, but does not work when applied as a SEDCMD in props.conf.

It seems clear my props SEDCMD is not being applied to my data at all.

If I run:

/opt/splunk/bin/splunk btool --debug props list |grep SEDCMD

I see my setting showing up in the output. Is there any other way to troubleshoot if my SEDCMD is being applied?
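
On the \. question: inside a bracket expression such as [.0-9:a-z], a dot is already literal in Perl-style regex, so escaping it should not change what the class matches. That fits the observation that both forms behave the same in manual tests. A minimal sketch, again using Python's re as a stand-in engine and not Splunk; class_accepts is a hypothetical helper.

```python
import re


def class_accepts(char_class: str, chars: str) -> dict:
    """Map each character to whether the given bracket expression matches it."""
    pattern = re.compile(char_class)
    return {ch: bool(pattern.fullmatch(ch)) for ch in chars}


probe = ".7x+,"
unescaped = class_accepts(r"[.0-9:a-z]", probe)
escaped = class_accepts(r"[\.0-9:a-z]", probe)

# Both lines print the same mapping: '.', '7' and 'x' match, '+' and ',' do not.
print(unescaped)
print(escaped)
```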
