We are routing events to some_index based on the source during parsing.
Events from some sources stay in "original_index", which is set in "inputs.conf", and events from the others go to "some_other_index".
props.conf
[source::some_part_of_source]
TRANSFORMS-index_routing = route_to_some_other_index
transforms.conf
[route_to_some_other_index]
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = some_other_index
We receive lots of events per second, and we are concerned that this transform is causing the delay in indexing (we are seeing indexing lag).
My question is about these variants:
a) REGEX = .
b) REGEX = (.)
c) REGEX = .*
d) REGEX = .*?
e) REGEX = ^.
Do all of the above REGEX values behave the same, or is one better than the others in a way that could speed up the transform and reduce the indexing lag?
If you use any one of these REGEX values, you will redirect all events from your "source" to some_other_index. If you want to redirect only part of the source, your REGEX needs to contain a keyword that appears only in the events you want to send to the other index. The best REGEX to match "all" with a single match is .* without any capture group.
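To illustrate the keyword idea above, here is a minimal sketch in Python (not Splunk itself) of the decision a keyword-based REGEX would make per event. The keyword payment_service and the function name pick_index are hypothetical; only the two index names come from the question.

```python
import re

# Hypothetical keyword: replace with text that appears only in the
# events you want to reroute. It is not taken from the original post.
KEYWORD_REGEX = re.compile(r"payment_service")

DEFAULT_INDEX = "original_index"    # index assigned in inputs.conf
OTHER_INDEX = "some_other_index"    # index written by the transform


def pick_index(raw_event: str) -> str:
    """Return the index an event would land in with a keyword-based REGEX."""
    # The transform only fires when the REGEX finds a match in the event.
    if KEYWORD_REGEX.search(raw_event):
        return OTHER_INDEX
    return DEFAULT_INDEX


print(pick_index("2020-01-01 12:00:00 payment_service timeout"))
print(pick_index("2020-01-01 12:00:01 login ok"))
```

With a match-everything REGEX such as . the first branch is taken for every non-empty event, which is why all events of the source stanza end up in some_other_index.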
-- if this transforms is causing the delay in indexing..
I doubt that the regex makes the difference. I would check the standard causes of indexing delay...
You can paste one of your sample logs into https://regex101.com/ and test which regex runs faster and with the minimum number of steps. In addition to your four above, I would try REGEX = ^. as well.
@somesoni2 I am afraid ^. does not MATCH ALL in https://regex101.com
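For routing, what matters is whether the REGEX finds any match in the event, not how much text it highlights. The sketch below checks that for the five patterns using Python's re module as a stand-in. Splunk uses a PCRE-based engine, so treat this as an approximation that should hold for patterns this simple. The sample events are made up.

```python
import re

# The five candidate patterns from the question, keyed by their labels.
PATTERNS = {"a": r".", "b": r"(.)", "c": r".*", "d": r".*?", "e": r"^."}


def matched_text(pattern, event):
    """Return the text matched by the first match, or None if there is none."""
    m = re.search(pattern, event)
    return None if m is None else m.group(0)


def summarize(event):
    """Map each pattern label to what it matches in the given event."""
    return {label: matched_text(p, event) for label, p in PATTERNS.items()}


# Made-up sample events for illustration only.
samples = {
    "single_line": "2020-01-01 12:00:00 user=alice action=login",
    "leading_newline": "\nsecond line only",
    "empty": "",
}

for name, event in samples.items():
    print(name, summarize(event))
```

On an ordinary single-line event all five patterns match, so all five would trigger the transform. They differ only on edge cases: ^. finds nothing when the event starts with a newline, and . / (.) / ^. find nothing in an empty string, while .* and .*? still succeed with an empty match.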
Given the combined list (a to e above):
I'd expect a) and e) to be very similar, and the best choices. b) requires the regex engine to create a capture group, which you don't appear to need. c) may, depending on the efficiency of the regex engine, consider all the characters in the event. d) should reduce to a), but the regex engine will have to take that extra step.
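One rough way to check these expectations is to time the five patterns against a representative event. The sketch below does that with Python's re and timeit modules. It is an assumption that the relative ordering carries over to Splunk's PCRE-based engine, and the sample event, the function name time_patterns, and the repetition count are all made up, so read the numbers as indicative at best.

```python
import re
import timeit

# The five candidate patterns from the question, keyed by their labels.
PATTERNS = {"a": r".", "b": r"(.)", "c": r".*", "d": r".*?", "e": r"^."}

# Made-up sample event; substitute a real event of typical length.
SAMPLE_EVENT = "2020-01-01 12:00:00 user=alice action=login " + "x" * 2000


def time_patterns(event, number=20000):
    """Return total seconds spent running each pattern `number` times."""
    timings = {}
    for label, pattern in PATTERNS.items():
        compiled = re.compile(pattern)  # compile once, outside the timed call
        timings[label] = timeit.timeit(
            lambda: compiled.search(event), number=number
        )
    return timings


for label, seconds in time_patterns(SAMPLE_EVENT).items():
    print(f"{label}) {PATTERNS[label]!r}: {seconds:.4f}s")
```

If the differences are a small fraction of the per-event processing time, the REGEX choice is unlikely to explain the indexing lag, which points back to the usual causes of delay.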
What does your inputs.conf entry look like for this? The best scenario here would be to split the input stanza for this source from the original and assign the index at the inputs.conf (forwarder) level, completely avoiding index-time routing to a different index.
The inputs come from a Google Pubsub queue, so I am not able to assign both the original index and the other index in inputs.conf.
Any specific reason to separate them out by indexes?