Splunk Transforms REGEX Wildcard Help

Venkat_16 · ‎09-14-2018

We are routing events to some_other_index at parse time, based on the source.

Some events go to "original_index", which is set in "inputs.conf", and the rest go to "some_other_index".

props.conf
    [source::some_part_of_source]
    TRANSFORMS-index_routing = route_to_some_other_index

transforms.conf
    [route_to_some_other_index]
    REGEX = .
    DEST_KEY = _MetaData:Index
    FORMAT = some_other_index

We receive a lot of events per second, and we are concerned that this transform is causing a delay in indexing (we are seeing indexing lag).

My question is about the following patterns:

a) REGEX = .
b) REGEX = (.)
c) REGEX = .*
d) REGEX = .*?
e) REGEX = ^.

Do all of the above REGEX values match the same events, or is one of them better than the others in a way that could speed up the transform and reduce the indexing lag?

desax · ‎09-14-2018

Any of these REGEX values will redirect all events from your "source" to some_other_index. If you want to redirect only part of the source, your REGEX needs a keyword that appears only in the events you want to send to the other index. The best REGEX to match everything in a single match is .* without any group.
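
As a rough sketch of the keyword approach described above: the keyword payment_service is purely hypothetical and stands in for whatever string appears only in the events that should move. The stanza and index names are the ones from the question.

transforms.conf
    [route_to_some_other_index]
    # Hypothetical keyword: only events containing this literal string are rerouted.
    REGEX = payment_service
    DEST_KEY = _MetaData:Index
    FORMAT = some_other_index

Events from the source that do not contain the keyword would keep the index set in inputs.conf.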

ddrillic · ‎09-14-2018

-- if this transforms is causing the delay in indexing..
I doubt the regex makes a difference. I would check the standard causes of indexing delay first.

somesoni2 · ‎09-14-2018

You can paste one of your sample logs into https://regex101.com/ and check which regex runs fastest, with the fewest steps. In addition to your 4 above, I would try REGEX = ^. as well.

Venkat_16 · ‎09-14-2018

@somesoni2 I am afraid ^. does not MATCH ALL in https://regex101.com

LearninStuff · ‎09-14-2018

Given the combined list:

1) REGEX = .
2) REGEX = (.)
3) REGEX = .*
4) REGEX = .*?
5) REGEX = ^.

I'd expect 1 and 5 to be very similar, and the best choices. 2 requires the regex engine to create a capture group, which you don't appear to need. With 3, the regex engine may, depending on how efficient it is, consider all the characters in the event. 4 should reduce to 1, but the regex engine will have to take that extra step.
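
To make the comparison concrete, here is an annotated sketch of the stanza from the question with option 1 in place. The comments describe standard PCRE behavior with default options (PCRE being the regex flavor Splunk uses), not measured Splunk timings, so treat them as reasoning rather than benchmark results.

transforms.conf
    [route_to_some_other_index]
    # 1) .    matches the first character that is not a newline, then stops.
    # 2) (.)  same match as 1, plus the cost of storing an unused capture group.
    # 3) .*   greedy: consumes every character up to the first newline (or the end of the event).
    # 4) .*?  lazy: can succeed immediately with an empty match, so it also matches empty input.
    # 5) ^.   like 1, but only tried at the start of the event; fails if the first character is a newline.
    REGEX = .
    DEST_KEY = _MetaData:Index
    FORMAT = some_other_index

For single-line events, all five succeed on any non-empty event, so the difference is in the work done per event, not in which events get routed.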

somesoni2 · ‎09-14-2018

What does your inputs.conf entry look like for this? The best scenario would be to split the input stanza for this source from the original one and assign the index in inputs.conf (on the forwarder), which avoids index-time routing to a different index altogether.
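
For file-based inputs, the split could look roughly like the following. The monitor paths are hypothetical placeholders; only the index names come from the question.

inputs.conf
    # Hypothetical paths; one stanza per destination index.
    [monitor:///var/log/app/original_part]
    index = original_index

    [monitor:///var/log/app/some_part_of_source]
    index = some_other_index

With the index assigned at input time, the props.conf and transforms.conf routing stanzas would no longer be needed for this source.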

Venkat_16 · ‎09-14-2018

The inputs come from a Google Pub/Sub queue, so I cannot assign both the original index and some_other_index in inputs.conf.

somesoni2 · ‎09-14-2018

Any specific reason to separate them out by indexes?
