Splunk Transforms REGEX Wildcard Help

Venkat_16
Contributor

We route events to some_other_index based on their source at parse time.

Some of the events go to "original_index", which is set in inputs.conf, and the rest go to "some_other_index".

props.conf
    [source::some_part_of_source]
    TRANSFORMS-index_routing = route_to_some_other_index

transforms.conf
    [route_to_some_other_index]
    REGEX = .
    DEST_KEY = _MetaData:Index
    FORMAT = some_other_index

We receive a large number of events per second, and we are concerned that this transform is delaying indexing (we are seeing indexing lag).

My question is about these options:

a) REGEX = .
b) REGEX = (.)
c) REGEX = .*
d) REGEX = .*?
e) REGEX = ^.

Do all of the above REGEX values mean the same thing, or is one of them better than the others in a way that could speed up the transform and reduce the indexing lag?

desax
Engager

If you use any of these REGEX values, all events from your source will be redirected to some_other_index. If you want to redirect only part of the source, your REGEX needs a keyword that appears only in the events you want to send to the other index. The best REGEX to match everything in a single match is .* without any capture group.
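As a rough illustration of the keyword approach, here is a minimal Python sketch of the routing decision. It is only a simulation: Splunk applies the REGEX with its own PCRE engine at index time, and the keyword payment_failed and the function target_index are hypothetical names chosen for this example.

```python
import re

# Hypothetical keyword: replace it with text that appears only in the
# events you want to reroute.
ROUTE_REGEX = re.compile(r"payment_failed")


def target_index(event: str, default_index: str = "original_index") -> str:
    """Mimic the transform: change the index only when the regex matches."""
    if ROUTE_REGEX.search(event):
        return "some_other_index"
    return default_index


if __name__ == "__main__":
    for sample in ("order=1 status=payment_failed", "order=2 status=ok"):
        print(sample, "->", target_index(sample))
```

Events without the keyword keep the index assigned in inputs.conf, which is what a transform with a more selective REGEX would do.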

ddrillic
Ultra Champion

-- if this transforms is causing the delay in indexing..
I doubt that the regex makes a difference here. I would check the standard causes of indexing delay first.

somesoni2
Revered Legend

You can paste one of your sample logs into https://regex101.com/ and test which regex runs fastest and in the fewest steps. In addition to your four above, I would try REGEX = ^. as well.

Venkat_16
Contributor

@somesoni2 I am afraid ^. does not MATCH ALL in https://regex101.com

LearninStuff
Observer

Given the combined list:

  1. REGEX = .
  2. REGEX = (.)
  3. REGEX = .*
  4. REGEX = .*?
  5. REGEX = ^.

I'd expect 1 and 5 to be very similar, and to be the best choices. 2 requires the regex engine to create a capture group, which you don't appear to need. With 3, depending on how efficient the regex engine is, it may scan every character in the event. 4 should reduce to 1, but the regex engine will have to take that extra step.
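To make the difference concrete, here is a small Python sketch that shows how much text each pattern consumes on its first match. It is an approximation only: Splunk uses PCRE rather than Python's re module, the sample event is made up, and matched_text is a hypothetical helper name. For these simple patterns the two engines should agree on what is matched, but this says nothing definitive about their relative speed inside Splunk.

```python
import re

# The five candidate patterns from the question.
PATTERNS = [".", "(.)", ".*", ".*?", "^."]


def matched_text(pattern: str, event: str):
    """Return the text consumed by the first match, or None if nothing matches."""
    m = re.search(pattern, event)
    return m.group(0) if m else None


# Hypothetical single-line sample event.
event = "2024-01-01 12:00:00 INFO sample event"

if __name__ == "__main__":
    for p in PATTERNS:
        print(f"{p!r:8} consumed {matched_text(p, event)!r}")
```

On a non-empty event all five patterns succeed, so all five would route the event. Patterns 1, 2 and 5 consume a single character, .* consumes the whole line, and the lazy .*? succeeds without consuming anything. On an empty string the patterns that require a character (1, 2 and 5) do not match at all, while .* and .*? still do.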

somesoni2
Revered Legend

What does your inputs.conf entry look like for this? The best scenario here would be to split the input stanza for this source from the original one and assign the index at the inputs.conf level (on the forwarder), which avoids index-time routing to a different index entirely.

Venkat_16
Contributor

The inputs come from a Google Pub/Sub queue, so I cannot assign both the original index and the other index in inputs.conf.

somesoni2
Revered Legend

Is there a specific reason to separate them into different indexes?
