Splunk Enterprise

MATCH_LIMIT in transforms.conf

jlvix1
Communicator

I have a fairly hefty chunk of JSON from RabbitMQ REST.

In my props I have:

[json_no_timestamp]
TRUNCATE = 500000

In transforms, I have:

[CFBPFCCmessages]
REGEX = (?U)()"messages":(?P<CFBPFCCmessages>\d+)
WRITE_META = true
FORMAT = CFBPFCCmessages::$2

[CFBPFfailed]
REGEX = (?U)()"messages":.+"messages":(?P<CFBPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPFfailed::$2

[CFBPFmobile]
REGEX = (?U)()"messages":.+"messages":.+"messages":(?P<CFBPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFBPFmobile::$2

[CFBPFonboard]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFonboard>\d+),"messages
WRITE_META = true
FORMAT = CFBPFonboard::$2

[CFBPFticketoffice]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFticketoffice>\d+),"messages
WRITE_META = true
FORMAT = CFBPFticketoffice::$2

[CFBPFtvm]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFtvm>\d+),"messages
WRITE_META = true
FORMAT = CFBPFtvm::$2

[CFBPFunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFunknown>\d+),"messages
WRITE_META = true
FORMAT = CFBPFunknown::$2

[CFBPFweb]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFweb>\d+),"messages
WRITE_META = true
FORMAT = CFBPFweb::$2

[CFBPMemail]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMemail>\d+),"messages
WRITE_META = true
FORMAT = CFBPMemail::$2

[CFBPMfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPMfailed::$2

[CFBPMsms]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMsms>\d+),"messages
WRITE_META = true
FORMAT = CFBPMsms::$2

[CFBPMunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMunknown>\d+),"messages
WRITE_META = true
FORMAT = CFBPMunknown::$2

[CFGPFCCmessages]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFCCmessages>\d+)
WRITE_META = true
FORMAT = CFGPFCCmessages::$2

[CFGPFfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFGPFfailed::$2

[CFGPFmobile]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFGPFmobile::$2

[CFGPFonboard]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFonboard>\d+),"messages
WRITE_META = true
FORMAT = CFGPFonboard::$2

[CFGPFticketoffice]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFticketoffice>\d+),"messages
WRITE_META = true
FORMAT = CFGPFticketoffice::$2

[CFGPFtvm]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFtvm>\d+),"messages
WRITE_META = true
FORMAT = CFGPFtvm::$2

[CFGPFunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFunknown>\d+),"messages
WRITE_META = true
FORMAT = CFGPFunknown::$2

[CFGPFweb]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFweb>\d+),"messages
WRITE_META = true
FORMAT = CFGPFweb::$2

[CFGPMemail]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMemail>\d+),"messages
WRITE_META = true
FORMAT = CFGPMemail::$2

[CFGPMfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMfailed>\d+),"messages
WRITE_META = true
FORMAT = CFGPMfailed::$2

[CFGPMsms]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMsms>\d+),"messages
WRITE_META = true
FORMAT = CFGPMsms::$2

[CFGPMunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMunknown>\d+),"messages
WRITE_META = true
FORMAT = CFGPMunknown::$2

When indexing, I only get the first three fields; the fields beyond CFBPFmobile are not indexed.

I was considering MATCH_LIMIT. Will this work?

jlvix1
Communicator

Hi all, solved this with a major deep dive.

The regex provided is a great improvement, thanks for that. It still only works with the ungreedy prefix, though.

The missing piece was LOOKAHEAD. It limits how many characters into the event the regex will search, and the default is 4096, so nothing beyond the first 4 KB was ever matched.

Now, each stanza looks like this:
[CFGPFweb]
REGEX = (?U)"messages":(.+"messages":){19}(?P<CFGPFweb>\d+),"messages
LOOKAHEAD = 65535
WRITE_META = true
FORMAT = CFGPFweb::$2

And it works. I had to set LOOKAHEAD to 64 KB on every definition. I'm not sure how much overhead that adds, but I'm only getting one JSON message per minute.
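To illustrate the effect described above, here is a minimal Python sketch. It assumes LOOKAHEAD behaves like handing the regex only the first N characters of the event. The event is synthetic (the `build_event` helper and its queue names are hypothetical, not the real RabbitMQ payload), and because Python's `re` has no `(?U)` flag, the lazy `.+?` stands in for the ungreedy `.+`.

```python
import re


def build_event(entries: int = 24, padding: int = 600) -> str:
    """Build a synthetic one-line event (hypothetical data, not the real payload)."""
    parts = []
    for i in range(entries):
        filler = "x" * padding  # pushes later counters past the 4096-character mark
        parts.append(
            '{"name":"q%d","messages":%d,"messages_details":"%s"}' % (i, 100 + i, filler)
        )
    return "[" + ",".join(parts) + "]"


def extract_counter(event: str, repeats: int, lookahead: int = 4096):
    """Search only the first `lookahead` characters, as a model of LOOKAHEAD."""
    window = event[:lookahead]
    # Skip `repeats` further "messages": keys, then capture the digits that follow.
    pattern = r'"messages":(.+?"messages":){%d}(?P<value>\d+),"messages' % repeats
    match = re.search(pattern, window)
    return match.group("value") if match else None


event = build_event()
print(extract_counter(event, 19, lookahead=4096))   # None: the 20th counter is past the window
print(extract_counter(event, 19, lookahead=65535))  # 119: the whole event is searched
```

Counters that sit inside the first 4096 characters are found either way, which matches the symptom of only the first few fields being indexed.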

nickhills
Ultra Champion

You might want to try this to make your regex a bit cleaner:

 [CFBPFCCmessages]
 REGEX = (?U)()"messages":(?P<CFBPFCCmessages>\d+)
 WRITE_META = true
 FORMAT = CFBPFCCmessages::$2

 [CFBPFfailed]
 REGEX = (?U)"messages":(.+"messages":){1}(?P<CFBPFfailed>\d+),"messages
 WRITE_META = true
 FORMAT = CFBPFfailed::$2

 [CFBPFmobile]
 REGEX = (?U)"messages":(.+"messages":){2}(?P<CFBPFmobile>\d+),"messages
 WRITE_META = true
 FORMAT = CFBPFmobile::$2
 ...
 ...
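
Since the stanzas differ only in the field name and the repeat count, they could be generated rather than typed. The following Python sketch is one way to do that. The field order is taken from the stanzas in the question. The `stanza` and `render` helpers are hypothetical names. The optional `lookahead` argument is only emitted when given. One assumption to check against the real data: every counter is followed by `,"messages`, so the suffix is applied uniformly, including to the two CCmessages fields where the question omits it (under `(?U)` a trailing `\d+` with nothing after it is lazy and would capture a single digit).

```python
from typing import List, Optional

# Field names in the order their "messages" counters appear in the event.
FIELDS = [
    "CFBPFCCmessages", "CFBPFfailed", "CFBPFmobile", "CFBPFonboard",
    "CFBPFticketoffice", "CFBPFtvm", "CFBPFunknown", "CFBPFweb",
    "CFBPMemail", "CFBPMfailed", "CFBPMsms", "CFBPMunknown",
    "CFGPFCCmessages", "CFGPFfailed", "CFGPFmobile", "CFGPFonboard",
    "CFGPFticketoffice", "CFGPFtvm", "CFGPFunknown", "CFGPFweb",
    "CFGPMemail", "CFGPMfailed", "CFGPMsms", "CFGPMunknown",
]


def stanza(name: str, repeats: int, lookahead: Optional[int] = None) -> str:
    """Render one transforms.conf stanza; the named group is always group 2."""
    if repeats == 0:
        # Empty group keeps the named group at position 2, as in the first stanza.
        regex = '(?U)()"messages":(?P<%s>\\d+),"messages' % name
    else:
        regex = '(?U)"messages":(.+"messages":){%d}(?P<%s>\\d+),"messages' % (repeats, name)
    lines = ["[%s]" % name, "REGEX = " + regex]
    if lookahead is not None:
        lines.append("LOOKAHEAD = %d" % lookahead)
    lines += ["WRITE_META = true", "FORMAT = %s::$2" % name]
    return "\n".join(lines)


def render(fields: List[str] = FIELDS, lookahead: Optional[int] = None) -> str:
    """Render all stanzas, separated by blank lines."""
    return "\n\n".join(stanza(n, i, lookahead) for i, n in enumerate(fields))


print(render(lookahead=65535))
```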

I'm not familiar with RabbitMQ, but it's possible that because you are not explicitly anchoring the start of the string with ^, you could be getting inconsistent matches.

What is in your event before the first "messages" entry?

If my comment helps, please give it a thumbs up!

jlvix1
Communicator

The first match failure is reportedly at bytes 4959-4960. This is for the CFBPFonboard field, and every field after that fails as well.

The performance stats for regex101 say that this is 39232 steps and takes ~73 ms.

Is this operation too expensive for the regex engine?

Thanks
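
A failure that starts at a fixed byte position points at an input-length limit rather than at regex cost. As a rough diagnostic, the Python sketch below lists where each `"messages":` key sits in an event and whether it falls inside a given search window. The 4096 default mirrors the LOOKAHEAD default mentioned in this thread. The `counter_offsets` helper and the sample event are hypothetical.

```python
import re


def counter_offsets(event: str, lookahead: int = 4096):
    """Return (occurrence, start offset, inside window) for every "messages": key."""
    rows = []
    for index, match in enumerate(re.finditer(r'"messages":', event), start=1):
        rows.append((index, match.start(), match.end() <= lookahead))
    return rows


# Synthetic event: four counters, each followed by 3000 characters of filler.
sample = ('"messages":1,"messages_details":' + "x" * 3000) * 4
for occurrence, offset, reachable in counter_offsets(sample):
    print(occurrence, offset, "ok" if reachable else "beyond window")
```

Running this against a real event would show which occurrence is the first to land past the window.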

jlvix1
Communicator

Hi, I have tried this and got exactly the same result. I believe this may have something to do with truncation of the event or some limit on the regex input buffer. Although I have set TRUNCATE = 500000, perhaps this is not respected from a regex point of view?

jlvix1
Communicator

Before and up to the first occurrence:

[{"memory":21904,"reductions":413518,"reductions_details":{"rate":0.0},"messages":0,"messages_details":

jlvix1
Communicator

It all works on regex101.com using PCRE, but only when I specify the ungreedy option, hence the (?U).
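
The reason the ungreedy option matters can be shown with a small Python sketch. Python's `re` has no `(?U)` flag, so the lazy `.+?` is used here as the equivalent of `.+` under `(?U)`. The event string is a made-up miniature of the real payload.

```python
import re

# Four counters in a row, each followed by a "messages_details" object.
event = ",".join(
    '"messages":%d,"messages_details":{"rate":0.0}' % n for n in (10, 20, 30, 40)
)

greedy = r'"messages":(.+"messages":){1}(?P<value>\d+),"messages'
lazy = r'"messages":(.+?"messages":){1}(?P<value>\d+),"messages'

# Greedy .+ runs to the end of the event and backtracks to the LAST counter.
print(re.search(greedy, event).group("value"))  # 40
# Lazy .+? stops at the NEXT counter, so {1} really means "second occurrence".
print(re.search(lazy, event).group("value"))    # 20
```

With the greedy form every stanza would end up capturing a counter near the end of the event instead of the nth one.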

I will try what you have done. However, when I used the {n} quantifier on regex101, it just went mad and started selecting 1, 2, 3 characters and then nothing, as if it were counting characters rather than occurrences.
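
One possible explanation, offered as an assumption since the exact pattern tried is not shown: `{n}` applies only to the token immediately before it, so without a group it repeats a single character. A short Python sketch of the difference:

```python
import re

text = "ababab abbb"

# Without a group, {3} repeats only the preceding character "b".
char_level = re.findall(r"ab{3}", text)
# With a group, {3} repeats the whole "ab" sequence.
group_level = re.findall(r"(?:ab){3}", text)

print(char_level)   # ['abbb']
print(group_level)  # ['ababab']

# The same applies to the key here: this pattern asks for two colons in a row.
two_colons = re.search(r'"messages":{2}', '"messages"::') is not None
print(two_colons)   # True
```

Wrapping the repeated part in parentheses, as in `(.+"messages":){2}`, makes the count apply to whole occurrences.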

I can't post the JSON here, it's too much. It is very uniform and strictly formatted, with no line breaks etc...
