Applying backreferences followed by numerics within rex

Explorer

I'm trying to do some data massaging on a field "volume" that has values like "91456789", "83234512", "30124231" to substitute them with values like (respectively) "90m", "80m", and "30m". In other words, bucketing these values into 10 million range buckets.

I'm applying the following regular expression in "sed" mode. The problem is that the backreference "\1" doesn't interpolate correctly because it's followed by a "0". If I remove the "0", it works fine (except that the values come out as "9m", "8m", and "3m").

rex field=volume mode=sed "s/^(\d)\d+$/\10m/"

In PHP, I would use something like ${1}0m to separate the backreference from the digit that follows it. This also raises the question: which regular expression engine does Splunk use?

The following substitution does work in the sense that the backreference is populated in the search results, but I cannot seem to format the resulting string with a "0" adjacent to the backreference.

s/^(\d)\d+$/\1 0m/
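
The ambiguity described above is not specific to Splunk. As a rough illustration only, using Python's `re` module rather than Splunk's engine, a replacement of `\10m` is read as a reference to group 10, while Python's explicit form `\g<1>` (its counterpart to PHP's `${1}`) ends the group reference before the literal "0". The helper names `bucket_label` and `ambiguous_fails` are hypothetical, and nothing here confirms which delimiter syntax, if any, Splunk's sed-mode replacement honors.

```python
import re

# Same pattern as in the question: capture the leading digit, discard the rest.
PATTERN = r"^(\d)\d+$"

def bucket_label(volume: str) -> str:
    # \g<1> closes the group reference explicitly, so the following "0" is literal.
    return re.sub(PATTERN, r"\g<1>0m", volume)

def ambiguous_fails(volume: str) -> bool:
    # r"\10m" is parsed as a reference to group 10, which this pattern does not have.
    try:
        re.sub(PATTERN, r"\10m", volume)
    except re.error:
        return True
    return False

labels = [bucket_label(v) for v in ["91456789", "83234512", "30124231"]]
```

With the sample values from the question, `labels` holds "90m", "80m", and "30m", and `ambiguous_fails` reports that the undelimited form is rejected in Python instead of being silently misread.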

Splunk Employee

PCRE is the Splunk regular expression engine. You can use \g{1} to backreference the first capture group.

Note that it is unnecessary and possibly undesirable to modify your values at index time (i.e., using SEDCMD) as they go into Splunk. In most cases it is better (and certainly more flexible) to leave the data alone and extract the field at search time with a REPORT or EXTRACT clause, and then, if desired, use the "bucket" search command to bucket the data.
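
Treating the field as a number, as suggested here, sidesteps the backreference problem entirely. A minimal sketch of the underlying arithmetic, written in Python rather than SPL; `bucket_10m` and `BUCKET_SIZE` are hypothetical names, and the sketch assumes the field holds a non-negative integer string.

```python
BUCKET_SIZE = 10_000_000  # ten million, the bucket width described in the question

def bucket_10m(volume: str) -> str:
    # Integer-divide to find the bucket index, then express the lower bound in millions.
    millions = (int(volume) // BUCKET_SIZE) * 10
    return f"{millions}m"

samples = ["91456789", "83234512", "30124231"]
labels = [bucket_10m(v) for v in samples]
```

Unlike the first-digit regular expression, this also handles values outside the eight-digit range: "123456789" lands in "120m" rather than "10m".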

Explorer

This doesn't seem to work. The result is a single column for all of the results, rather than split by ten-millions. The column created in Splunk is shown as "\g{1}0m".
