Splunk Enterprise

MATCH_LIMIT in transforms.conf

jlvix1
Communicator

I have a fairly hefty chunk of JSON from RabbitMQ REST.

In my props I have:

[json_no_timestamp]
TRUNCATE = 500000

In transforms, I have:

[CFBPFCCmessages]
REGEX = (?U)()"messages":(?P<CFBPFCCmessages>\d+),"messages
WRITE_META = true
FORMAT = CFBPFCCmessages::$2

[CFBPFfailed]
REGEX = (?U)()"messages":.+"messages":(?P<CFBPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPFfailed::$2

[CFBPFmobile]
REGEX = (?U)()"messages":.+"messages":.+"messages":(?P<CFBPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFBPFmobile::$2

[CFBPFonboard]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFonboard>\d+),"messages
WRITE_META = true
FORMAT = CFBPFonboard::$2

[CFBPFticketoffice]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFticketoffice>\d+),"messages
WRITE_META = true
FORMAT = CFBPFticketoffice::$2

[CFBPFtvm]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFtvm>\d+),"messages
WRITE_META = true
FORMAT = CFBPFtvm::$2

[CFBPFunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFunknown>\d+),"messages
WRITE_META = true
FORMAT = CFBPFunknown::$2

[CFBPFweb]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFweb>\d+),"messages
WRITE_META = true
FORMAT = CFBPFweb::$2

[CFBPMemail]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMemail>\d+),"messages
WRITE_META = true
FORMAT = CFBPMemail::$2

[CFBPMfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPMfailed::$2

[CFBPMsms]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMsms>\d+),"messages
WRITE_META = true
FORMAT = CFBPMsms::$2

[CFBPMunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMunknown>\d+),"messages
WRITE_META = true
FORMAT = CFBPMunknown::$2

[CFGPFCCmessages]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFCCmessages>\d+),"messages
WRITE_META = true
FORMAT = CFGPFCCmessages::$2

[CFGPFfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFGPFfailed::$2

[CFGPFmobile]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFGPFmobile::$2

[CFGPFonboard]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFonboard>\d+),"messages
WRITE_META = true
FORMAT = CFGPFonboard::$2

[CFGPFticketoffice]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFticketoffice>\d+),"messages
WRITE_META = true
FORMAT = CFGPFticketoffice::$2

[CFGPFtvm]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFtvm>\d+),"messages
WRITE_META = true
FORMAT = CFGPFtvm::$2

[CFGPFunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFunknown>\d+),"messages
WRITE_META = true
FORMAT = CFGPFunknown::$2

[CFGPFweb]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFweb>\d+),"messages
WRITE_META = true
FORMAT = CFGPFweb::$2

[CFGPMemail]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMemail>\d+),"messages
WRITE_META = true
FORMAT = CFGPMemail::$2

[CFGPMfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMfailed>\d+),"messages
WRITE_META = true
FORMAT = CFGPMfailed::$2

[CFGPMsms]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMsms>\d+),"messages
WRITE_META = true
FORMAT = CFGPMsms::$2

[CFGPMunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMunknown>\d+),"messages
WRITE_META = true
FORMAT = CFGPMunknown::$2

When indexing, I only get the first 3 fields; the fields beyond CFBPFmobile are not indexed.

I was considering MATCH_LIMIT, will this work?

1 Solution

jlvix1
Communicator

Hi all, solved this with a major deep dive.

The regex provided is a great improvement, thanks for that. It still only works with the ungreedy (?U) prefix, though.

The missing piece was LOOKAHEAD. It defaults to 4096 characters and tells the regex engine how far into the event to search, so anything beyond that point is never matched.

Now, each stanza looks like this:
[CFGPFweb]
REGEX = (?U)"messages":(.+"messages":){19}(?P<CFGPFweb>\d+),"messages
LOOKAHEAD = 65535
WRITE_META = true
FORMAT = CFGPFweb::$2

And it works. I had to set LOOKAHEAD to 64KB in every stanza. I'm not sure how much overhead that adds, but I'm only receiving one JSON message per minute.
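To illustrate why the later fields went missing, here is a rough Python sketch that emulates a LOOKAHEAD-style limit by searching only the first N characters of an event. It is an approximation, not how Splunk implements the setting: Python's re module has no (?U) flag, so the lazy quantifiers are written out explicitly, and the sample event, build_event and extract_nth_messages are all made up for illustration.

```python
import re


def build_event(counts, padding=400):
    """Build a fake single-line JSON-like event with one "messages" key per queue."""
    filler = "x" * padding  # stands in for the other per-queue statistics
    parts = [
        '{"name":"q%d","filler":"%s","messages":%d,"messages_details":{"rate":0.0}}'
        % (i, filler, n)
        for i, n in enumerate(counts)
    ]
    return "[" + ",".join(parts) + "]"


def extract_nth_messages(event, n, lookahead=4096):
    """Return the n-th (0-based) "messages" value, looking only at the
    first `lookahead` characters of the event. Returns None if not found."""
    # Explicit lazy quantifiers (.+? and \d+?) stand in for the (?U) prefix.
    pattern = re.compile(r'"messages":(?:.+?"messages":){%d}(\d+?),"messages' % n)
    match = pattern.search(event[:lookahead])
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    event = build_event(list(range(100, 124)))  # 24 queues, well over 4096 characters
    print(extract_nth_messages(event, 0))                    # early field, inside the window
    print(extract_nth_messages(event, 23))                   # late field, outside the default window
    print(extract_nth_messages(event, 23, lookahead=65535))  # late field, with a larger window
```

With the default window the late field cannot be found because the text it needs is simply never handed to the regex, which matches the symptom described in the question.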

nickhills
Ultra Champion

You might want to try this to make your regex a bit cleaner:

 [CFBPFCCmessages]
 REGEX = (?U)()"messages":(?P<CFBPFCCmessages>\d+),"messages
 WRITE_META = true
 FORMAT = CFBPFCCmessages::$2

 [CFBPFfailed]
 REGEX = (?U)"messages":(.+"messages":){1}(?P<CFBPFfailed>\d+),"messages
 WRITE_META = true
 FORMAT = CFBPFfailed::$2

 [CFBPFmobile]
 REGEX = (?U)"messages":(.+"messages":){2}(?P<CFBPFmobile>\d+),"messages
 WRITE_META = true
 FORMAT = CFBPFmobile::$2
 ...
 ...
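
Since the stanzas differ only in the field name and the repetition count, they could be generated rather than typed by hand. Below is a minimal Python sketch, assuming the field order shown in the question. The names render_stanza and render_all are hypothetical helpers, and the LOOKAHEAD line is only written when a value is passed in. The repeated group is kept as a capturing group so that the named group remains $2, even for a count of 0.

```python
# Field names in the order their "messages" values appear in the event,
# as listed in the question (two prefixes, twelve suffixes each).
PREFIXES = ("CFBP", "CFGP")
SUFFIXES = (
    "FCCmessages", "Ffailed", "Fmobile", "Fonboard", "Fticketoffice", "Ftvm",
    "Funknown", "Fweb", "Memail", "Mfailed", "Msms", "Munknown",
)
FIELDS = [prefix + suffix for prefix in PREFIXES for suffix in SUFFIXES]


def render_stanza(name, skip, lookahead=None):
    """Render one transforms.conf stanza that skips `skip` earlier occurrences."""
    lines = [
        "[%s]" % name,
        r'REGEX = (?U)"messages":(.+"messages":){%d}(?P<%s>\d+),"messages' % (skip, name),
    ]
    if lookahead is not None:
        lines.append("LOOKAHEAD = %d" % lookahead)
    lines += ["WRITE_META = true", "FORMAT = %s::$2" % name]
    return "\n".join(lines)


def render_all(lookahead=None):
    """Render every stanza, separated by blank lines."""
    return "\n\n".join(
        render_stanza(name, index, lookahead) for index, name in enumerate(FIELDS)
    )


if __name__ == "__main__":
    print(render_all(lookahead=65535))
```

The output is plain text meant to be reviewed before being pasted into transforms.conf.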

I'm not familiar with RabbitMQ, but it's possible that, because you are not explicitly anchoring the start of the string with ^, you are getting inconsistent matches.

What is in your event before the first "messages" entry?

jlvix1
Communicator

The first match failure is reportedly at bytes 4959-4960, which is for the CFBPFonboard field, and every field after that fails as well.

The performance stats for regex101 say that this is 39232 steps and takes ~73 ms.

Is this operation too expensive for the regex engine?

Thanks

jlvix1
Communicator

Hi, I have tried this and got exactly the same result. I believe this may have something to do with truncation of the event, or with some sort of limit on the regex input buffer. Although I have set TRUNCATE = 500000, perhaps this is not respected from a regex point of view?

jlvix1
Communicator

Before and up to the first occurrence:

[{"memory":21904,"reductions":413518,"reductions_details":{"rate":0.0},"messages":0,"messages_details":

jlvix1
Communicator

It all works on regex101.com using PCRE, but only when I specify the ungreedy option, hence the (?U).

I will try what you have done. However, when I used the {n} quantifier on regex101, it just went mad and started selecting 1, 2, 3 characters and then nothing, as if it was counting characters rather than occurrences.
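
One possible explanation for that behaviour (an assumption, since the exact pattern tried on regex101 isn't shown) is that {n} binds only to the token immediately before it unless the whole sequence is wrapped in a group. A small Python sketch of the difference, using a toy "ab" pattern rather than the real JSON:

```python
import re

# Without a group, {2} applies only to the token right before it (the "b").
ungrouped = re.compile(r"ab{2}")

# With a group, {2} repeats the whole "ab" sequence.
grouped = re.compile(r"(?:ab){2}")


def matches(pattern, text):
    """Return True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ("abb", "abab"):
        print(text, matches(ungrouped, text), matches(grouped, text))
```

This is why the repeated part in the suggested stanzas is written as (.+"messages":){n}, with parentheses around the whole sequence.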

I can't post the JSON here because it's too much. It is very uniform and strongly formatted, with no line breaks etc.
