Modify Field with Regex at Index Time

dbuehler
Loves-to-Learn Everything

Hey guys,

I have IIS logs that record multiple IPs in the X-Forwarded-For field, as below:

114.119.136.78,+162.158.119.25

I would like to apply a regex to the X-Forwarded-For field at index time so that the field contains only the first IP, like:

114.119.136.78

In other words, everything from the first comma onward should be cut out of the field.

So far I have tried to achieve this with the following props/transforms:

#props
[iis]
TRANSFORMS-rm-extra-ips = rm_extra_ips

#transforms
[rm_extra_ips]
SOURCE_KEY = field:X_Forwarded_For
REGEX = ^(.+?),
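For reference, here is what the capture group in REGEX = ^(.+?), pulls out of the sample value. This is a minimal sketch that uses Python's re module as a stand-in for Splunk's Perl-style regex engine (they agree on this simple pattern); it only shows the match itself, not how Splunk would write the result back into the field.

```python
import re

# Sample X-Forwarded-For value from the post above
xff = "114.119.136.78,+162.158.119.25"

# Same pattern as REGEX in [rm_extra_ips]: lazily capture everything
# before the first comma
match = re.search(r"^(.+?),", xff)

# Assumption of this sketch: if there is no comma, keep the value as-is
first_ip = match.group(1) if match else xff
print(first_ip)  # 114.119.136.78
```

Note that the pattern requires a comma, so a value holding a single IP does not match at all; the fallback to the original value is this sketch's choice, not something the transform does on its own.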

How do I do this?

Thanks!

bowesmana
SplunkTrust

dbuehler
Loves-to-Learn Everything

Thanks. I was looking into SEDCMD originally; I've used it for other purposes before.

Can SEDCMD operate on just one field? Or does it have to operate on the entire event (_raw)?

 

isoutamo
SplunkTrust

Hi

Based on the props.conf spec, no: it must operate on _raw.

SEDCMD-<class> = <sed script>
* Only used at index time.
* Commonly used to anonymize incoming data at index time, such as credit
  card or social security numbers. For more information, search the online
  documentation for "anonymize data."
* Used to specify a sed script which Splunk software applies to the _raw
  field.
* A sed script is a space-separated list of sed commands. Currently the
  following subset of sed commands is supported:
    * replace (s) and character substitution (y).
* Syntax:
    * replace - s/regex/replacement/flags
      * regex is a perl regular expression (optionally containing capturing
        groups).
      * replacement is a string to replace the regex match. Use \n for back
        references, where "n" is a single digit.
      * flags can be either: g to replace all matches, or a number to
        replace a specified match.
    * substitute - y/string1/string2/
      * substitutes the string1[i] with string2[i]
* No default.

r. Ismo 
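The s command described in the spec can be approximated as below to see what a given SEDCMD would do to a whole event. This is a sketch under assumptions: sed_replace is a hypothetical helper, Python's re stands in for Perl regex, and treating "no flag" as "first match only" (as standard sed does) is assumed rather than taken from the spec. The event text is made up; the 10/test pattern mirrors the test SEDCMD that appears later in the thread.

```python
import re


def sed_replace(raw: str, regex: str, replacement: str, flag: str = "") -> str:
    """Hypothetical helper approximating SEDCMD's s/regex/replacement/flags.

    It works on the whole event string (_raw), never on a single field.
    Python's re module stands in for Perl-style regex.
    """
    pattern = re.compile(regex)
    if flag == "g":
        # g: replace every non-overlapping match
        return pattern.sub(replacement, raw)
    # A number replaces only that match (1-based). No flag is assumed to
    # mean "first match only", as in standard sed.
    nth = int(flag) if flag else 1
    matches = list(pattern.finditer(raw))
    if len(matches) < nth:
        return raw  # not enough matches: event left unchanged
    m = matches[nth - 1]
    # expand() resolves \1-style back references in the replacement
    return raw[:m.start()] + m.expand(replacement) + raw[m.end():]


event = "a=10 b=10 c=10"  # made-up event text
print(sed_replace(event, "10", "test", "g"))  # a=test b=test c=test
print(sed_replace(event, "10", "test", "2"))  # a=10 b=test c=10

# Keep only the text before the first comma, using a back reference
print(sed_replace("114.119.136.78,+162.158.119.25", r"^([^,]*),.*", r"\1"))
# 114.119.136.78
```

The last call also shows why the field-level question matters: run against a full event line instead of a bare field value, the same anchored pattern would drop everything after the first comma anywhere in the event.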

dbuehler
Loves-to-Learn Everything

For some reason my props.conf config isn't being applied to my data.

I've found that this is the regex I need:

s/,+[.0-9\:a-z]*//g

This regex works perfectly when run manually, i.e. using sed against a text file containing a sample event:

cat sample.txt | sed 's/,+[.0-9\:a-z]*//g'
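Separately from the question of why the setting isn't being applied at all, the same pattern can mean different things in the two tools. sed without -E uses basic regular expressions (BRE), where an unescaped + is an ordinary character, while the props.conf spec quoted earlier in this thread says SEDCMD's regex is a Perl regular expression, where + is a quantifier. A minimal sketch of both readings, using Python's re as an approximation of Perl regex:

```python
import re

sample = "114.119.136.78,+162.158.119.25"

# sed's default basic regex (BRE): an unescaped "+" is a literal plus sign,
# so ",+" means "a comma followed by a plus". Emulated here with [+].
bre_like = re.sub(r",[+][.0-9\:a-z]*", "", sample)

# Perl-style regex: ",+" means "one or more commas", and the class
# [.0-9\:a-z] cannot match the literal "+", so only the comma is removed.
pcre_like = re.sub(r",+[.0-9\:a-z]*", "", sample)

print(bre_like)   # 114.119.136.78
print(pcre_like)  # 114.119.136.78+162.158.119.25
```

A bracketed [+] is read as a literal plus under both syntaxes, so s/,[+][.0-9\:a-z]*//g may be a more portable way to express the same intent. That is untested against Splunk here, and it would not explain why even the simple s/10/test/g test isn't being applied.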

My props.conf (placed on my two indexers in /opt/splunk/etc/apps/my-iis-app/local) is configured to apply to the 'iis' sourcetype, which is correct, and looks like:

[iis]
SEDCMD-remove-extra-ips = s/,+[.0-9\:a-z]*//g

After restarting Splunk, the events are still coming in unmodified. The regex doesn't seem to be applied at all: even if I change my config to a very simple test regex, that doesn't work either, e.g.:

[iis]
SEDCMD-test = s/10/test/g

Any ideas?

isoutamo
SplunkTrust
Should you have \. instead of . in your sed expression? The first matches a literal dot; the second one matches any character.
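For what it's worth, inside a bracket expression a dot is already literal, so [.0-9] and [\.0-9] match the same characters; escaping only changes the meaning of a dot outside brackets. A quick check, using Python's re as a stand-in for Perl-style regex (the second input string is made up for illustration):

```python
import re

sample = "114.119.136.78,+162.158.119.25"

# Inside [...] a dot is literal whether or not it is escaped
unescaped = re.findall(r"[.0-9]+", sample)
escaped = re.findall(r"[\.0-9]+", sample)

# Outside brackets an unescaped dot matches any character, including "," and "+"
any_char = re.findall(r"[0-9].[0-9]", "1,2 1+2 1.2")
literal_dot = re.findall(r"[0-9]\.[0-9]", "1,2 1+2 1.2")

print(unescaped == escaped)  # True
print(any_char)              # ['1,2', '1+2', '1.2']
print(literal_dot)           # ['1.2']
```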

dbuehler
Loves-to-Learn Everything

That regex works just the same in manual tests, but does not work when applied as a SEDCMD in props.conf.

It seems clear my props SEDCMD is not being applied to my data at all.

If I run:

/opt/splunk/bin/splunk btool --debug props list | grep SEDCMD

I see my setting in the output. Is there any other way to troubleshoot whether my SEDCMD is being applied?
