
Help with regex

lemikg
Communicator

Hi,

I extracted a field with the Splunk Field Extractor, which seemed to work until I noticed it didn't capture all messages from ModSecurity (e.g. "CSRF Attack Detected - Missing CSRF Token").

Here are some log messages:

--f7d234hc-H--
Message: Warning. Match of "eq 1" against "&ARGS:CSRF_TOKEN" required. [file "/cut/modsecurity_crs_43_csrf_protection.conf"] [line "31"] [id "981143"] [msg "CSRF Attack Detected - Missing CSRF Token."]
Message: Failed to write to DBM file "/tmp/global": Invalid argument
Apache-Handler: perl-script
--f7d3t15d-Z--

This is what the app gave me:

(?s)--[0-9a-f]+-H--\n.*\[msg \"(?P<msg>[\w\s\/.]+)\"\]

Is there something wrong with it? Can it be done more efficiently?

Thanks in advance.

Cheers
Mike


dmr195
Communicator

I think it's because there's a hyphen missing inside the innermost square brackets. Try:

(?s)--[0-9a-f]+-H--\n.*\[msg \"(?P<msg>[\w\s\/.-]+)\"\]

instead. (In case it's hard to see, the difference is 8 characters from the end.)

Your previous regex was only looking for letters, numbers, underscores, whitespace, slashes and dots between the double quotes. Hence it didn't match because "CSRF Attack Detected - Missing CSRF Token" has a hyphen in the middle.
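To see the effect of the character class in isolation, here is a minimal Python sketch. It is only an illustration, not how Splunk runs the extraction: it uses Python's `re` module, tests just the `[msg "..."]` part of the regex, and the variable names are made up for the example.

```python
import re

# Message text taken from the log sample in the question.
line = '[id "981143"] [msg "CSRF Attack Detected - Missing CSRF Token."]'

# Only the msg part of each regex; the header part is left out on purpose.
without_hyphen = re.compile(r'\[msg "(?P<msg>[\w\s/.]+)"\]')
with_hyphen = re.compile(r'\[msg "(?P<msg>[\w\s/.-]+)"\]')

m_old = without_hyphen.search(line)
m_new = with_hyphen.search(line)

print(m_old)               # None: the class stops at the hyphen, so no match
print(m_new.group("msg"))  # CSRF Attack Detected - Missing CSRF Token.
```

Putting the hyphen last (or first) inside the square brackets keeps it a literal hyphen instead of a range operator.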


bjoernjensen
Contributor

Hi,

Here are a couple more things to consider:

(a) It seems that the header line does not always consist of a hex-coded ID between hyphens followed by "H". In your sample the IDs contain "h" and "t", and the closing marker ends in "Z".
(b) The message text is not captured if it contains a hyphen.

Something like this should work:
(?s)--[0-9a-z]+-[A-Z]--\n.*\[msg \"(?P<msg>[-\w\s\/.]+)\"\]
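As a rough check, the sketch below runs both header variants against the log sample exactly as posted in the question. It assumes Python's `re` module behaves like Splunk's regex engine for these constructs, and the names `sample`, `narrow` and `broad` are only for illustration.

```python
import re

# Log sample copied from the question, one physical line per log line.
sample = (
    '--f7d234hc-H--\n'
    'Message: Warning. Match of "eq 1" against "&ARGS:CSRF_TOKEN" required. '
    '[file "/cut/modsecurity_crs_43_csrf_protection.conf"] [line "31"] '
    '[id "981143"] [msg "CSRF Attack Detected - Missing CSRF Token."]\n'
    'Message: Failed to write to DBM file "/tmp/global": Invalid argument\n'
    'Apache-Handler: perl-script\n'
    '--f7d3t15d-Z--\n'
)

# Header restricted to hex digits and the letter H.
narrow = re.compile(r'(?s)--[0-9a-f]+-H--\n.*\[msg "(?P<msg>[\w\s/.-]+)"\]')
# Header allowing any lowercase letter or digit and any section letter.
broad = re.compile(r'(?s)--[0-9a-z]+-[A-Z]--\n.*\[msg "(?P<msg>[-\w\s/.]+)"\]')

narrow_match = narrow.search(sample)
broad_match = broad.search(sample)

print(narrow_match)              # None: "f7d234hc" contains "h", which [0-9a-f] rejects
print(broad_match.group("msg"))  # CSRF Attack Detected - Missing CSRF Token.
```

If the real IDs are strictly hexadecimal and the "h" in the sample only comes from editing the log before posting, the narrower header would match too; the broader class simply tolerates both cases.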

dmr195
Communicator

I feel a little guilty that my answer was accepted here, as I missed the first required change. The regex in this answer is the one to use.


lemikg
Communicator

Thanks to you, too. I tried that as well and it worked. Have a great one.
Cheers
Mike


lemikg
Communicator

It seems that did the trick. Thank you very much.
