Splunk Search

Need a REGEX that can extract bits from all events of a similar type

DanAlexander
Communicator

Hi All,

I need a regex that extracts the same key-value pairs from all proxy events. There are different types of events with similar KVs, and I am looking for a single rex that works for each individual event and extracts the following:

| rex mode=sed "s/(?<cip>c-ip=\S+)\s.*(?<csbytes>cs-bytes=\S+)\s.*(?<cscategories>cs-categories=\S+)\s.*(?<cshost>cs-host=\S+)\s.*(?<csip>cs-ip=\S+)\s.*(?<csmethod>cs-method=\S+)\s.*(?<csuriport>cs-uri-port=\S+)\s.*(?<csurischeme>cs-uri-scheme=\S+)\s.*(?<csusername>cs-username=\S+)\s.*(?<action>s-action=\S+)\s.*(?<sip>s-ip=\S+)\s.*(?<scbytes>sc-bytes=\S+)\s.*(?<status>sc-status=\S+)\s.*(?<timetaken>time-taken=\S+)\s.*(?<url>c-url=\S+)\s.*(?<csreferer>cs-Referer=\S+)\s.*(?<rip>r-ip=\S+)\s.*(?<ssourceport>s-source-port=\S+)\s.*/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12 \13 \14 \15 \16 \17 \18/g"

Events of Action=Allowed work to some extent, but as soon as any of the fields is missing, for example cs-Referer, the event does not get stripped as expected and the regex is ignored.

Any help is much appreciated.

Thank you!

0 Karma

ITWhisperer
SplunkTrust

Try this:

| rex mode=sed "s/(?<cip>c-ip=\S+).*(?<csbytes>cs-bytes=\S+).*(?<cscategories>cs-categories=\S+).*(?<cshost>cs-host=\S+).*(?<csip>cs-ip=\S+).*(?<csmethod>cs-method=\S+).*(?<csuriport>cs-uri-port=\S+).*(?<csurischeme>cs-uri-scheme=\S+).*(?<csusername>cs-username=\S+).*(?<action>s-action=\S+).*(?<sip>s-ip=\S+).*(?<scbytes>sc-bytes=\S+).*(?<status>sc-status=\S+).*(?<timetaken>time-taken=\S+).*(?<url>c-url=\S+).*(?<csreferer>cs-Referer=\S+|).*(?<rip>r-ip=\S+).*(?<ssourceport>s-source-port=\S+).*/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12 \13 \14 \15 \16 \17 \18/g"

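Because you are using \S+, you don't need the \s anchors in addition to the .* between the groups. Apart from removing the \s, you can add | inside the groups you want to be optional, so that they can also match an empty string. (If you kept the \s, an empty optional group would leave two \s in the pattern with nothing between them, so it would need two spaces where the event has only one.)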
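One thing worth checking with the empty alternative is what the greedy .* in front of it does when the pair is present. The sketch below is a cut-down test outside Splunk, not the full expression: it uses Python's re module as a stand-in for Splunk's PCRE (an assumption; Python spells named groups (?P<name>...)), keeps only three of the pairs, and runs against made-up events. The second pattern, an optional group with a lazy gap in front of it, is a swapped-in alternative for comparison and not part of the reply above.

```python
import re

# Cut-down version of the pattern above: empty alternative inside the group.
EMPTY_ALT = re.compile(
    r"(?P<url>c-url=\S+).*(?P<csreferer>cs-Referer=\S+|).*(?P<rip>r-ip=\S+)"
)

# Alternative for comparison (not from the thread): the whole
# "gap + cs-Referer" part is optional, and the gap is lazy (.*?).
LAZY_OPT = re.compile(
    r"(?P<url>c-url=\S+)(?:.*?(?P<csreferer>cs-Referer=\S+))?.*(?P<rip>r-ip=\S+)"
)

# Hypothetical events, invented for this test.
WITH_REFERER = "c-url=http://a.example/x cs-Referer=http://b.example/ r-ip=10.0.0.1"
WITHOUT_REFERER = "c-url=http://a.example/x r-ip=10.0.0.1"


def referer(pattern, event):
    """Return what the csreferer group captured, or 'NO MATCH'."""
    m = pattern.search(event)
    return m.group("csreferer") if m else "NO MATCH"


if __name__ == "__main__":
    for name, pattern in (("empty alternative", EMPTY_ALT), ("lazy optional", LAZY_OPT)):
        for event in (WITH_REFERER, WITHOUT_REFERER):
            print(f"{name}: {referer(pattern, event)!r}")
```

In this test both patterns match both events, so a missing cs-Referer no longer breaks the match. The empty alternative, however, captures an empty string even when cs-Referer is present, because the preceding greedy .* runs on to r-ip and the group then settles for the empty branch. The lazy optional form captures the pair when it exists and leaves the group unset when it does not.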

0 Karma

gcusello
SplunkTrust

Hi @DanAlexander,

Could you share a sample of your logs? It's difficult to check your regex without one.

Anyway, you don't need "mode=sed" for field extraction.

In addition, are you sure that the structure of your logs is always the same?

If anything differs, you lose the whole field extraction, so it could be better to use a separate rex command for each field. That way you can also see which regex isn't correct.
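The idea of one small regex per field can be tried outside Splunk first. This is a minimal sketch in Python's re module (a stand-in for Splunk's PCRE), with a made-up event and only a handful of the field names from the question; the helper name extract_fields is hypothetical. Each pattern is searched independently, so neither the order of the pairs nor a missing pair affects the others.

```python
import re

# A few of the keys from the question; extend the list as needed.
FIELDS = ["c-ip", "cs-host", "cs-ip", "s-ip", "cs-Referer"]

# One pattern per field. (?<!\S) pins the key to the start of a token,
# so "s-ip" cannot match inside "cs-ip".
PATTERNS = {
    name: re.compile(r"(?<!\S)" + re.escape(name) + r"=(\S+)")
    for name in FIELDS
}


def extract_fields(event):
    """Return {key: value} for every listed key that occurs in the event."""
    found = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(event)
        if m:
            found[name] = m.group(1)
    return found


# Hypothetical event: pairs in a different order, no cs-Referer.
EVENT = "cs-ip=192.0.2.7 c-ip=198.51.100.4 s-ip=203.0.113.9 cs-host=www.example.com"

if __name__ == "__main__":
    print(extract_fields(EVENT))
```

A pattern that finds nothing simply contributes no field, which also makes it easy to see which individual regex is the one that does not fit the data.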

Ciao.

Giuseppe

0 Karma

DanAlexander
Communicator

Thanks for the reply @gcusello,

The "Allow" proxy events have a similar structure. However, a small difference, such as a missing cs-Referer pair, renders the regex useless. I cannot imagine I have to build a separate regex for each event that differs slightly.

Looking for a unified regex that would work and strip pairs as defined in the regex for each event.

Regards,

Dan

0 Karma

gcusello
SplunkTrust

Hi @DanAlexander ,

Regexes depend on the logs you have, and they must fit all of them, or at least most of them.

If you share a sample of your logs, I can try to analyze your regex.

Ciao.

Giuseppe 

0 Karma

isoutamo
SplunkTrust

Hi

If you are just extracting key-value paired data, could you use the extract (kv) command for that?

If you need to do it with rex, then with your current approach all events must have the fields in the same order. You could add ? after a group to mark it as optional where needed, but the result will be quite hard to read. A better option may be to use several rex commands and extract only one value per rex.
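To illustrate what generic key-value extraction buys you, here is a small sketch in Python. It is not how Splunk's extract command is implemented; it only assumes that pairs are separated by whitespace and that keys consist of letters, digits, underscores and hyphens. The event and the helper name parse_pairs are made up for the example.

```python
import re

# A key at the start of a token, "=", then everything up to the next space.
KV = re.compile(r"(?<!\S)([\w-]+)=(\S+)")


def parse_pairs(event):
    """Return every key=value pair in the event as a dict, in any order."""
    return dict(KV.findall(event))


# Hypothetical event: a leading token without "=", and a URL that
# itself contains "=" in its query string.
EVENT = "2024-01-01 c-ip=198.51.100.4 cs-method=GET c-url=http://a.example/?q=1 sc-status=200"

if __name__ == "__main__":
    for key, value in parse_pairs(EVENT).items():
        print(key, "=>", value)
```

Because no pair depends on any other, a missing cs-Referer or a reordered event needs no special handling.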

r. Ismo

0 Karma

DanAlexander
Communicator

Hi, thanks for the reply @isoutamo 

Could you please give me some tips on how to implement what you suggested, based on the below?

SEDCMD-Cleaning_allowed_logs=s/(?<cip>c-ip=\S+)\s.*(?<csbytes>cs-bytes=\S+)\s.*(?<cscategories>cs-categories=\S+)\s.*(?<cshost>cs-host=\S+)\s.*(?<csip>cs-ip=\S+)\s.*(?<csmethod>cs-method=\S+)\s.*(?<csuriport>cs-uri-port=\S+)\s.*(?<csurischeme>cs-uri-scheme=\S+)\s.*(?<csusername>cs-username=\S+)\s.*(?<action>s-action=\S+)\s.*(?<sip>s-ip=\S+)\s.*(?<scbytes>sc-bytes=\S+)\s.*(?<status>sc-status=\S+)\s.*(?<timetaken>time-taken=\S+)\s.*(?<url>c-url=\S+)\s.*(?<csreferer>cs-Referer=\S+)\s.*(?<rip>r-ip=\S+)\s.*(?<ssourceport>s-source-port=\S+)\s.*/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12 \13 \14 \15 \16 \17 \18/g

The optional ? might be the solution. The SEDCMD and its regex are designed to strip each event so that only the selected data is indexed, which is why this regex is so specific.
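To make the intended result concrete, the sketch below shows the stripping step in Python: keep only the wanted pairs and drop everything else, whatever is missing. It is an illustration of the target output, not a drop-in SEDCMD, since SEDCMD takes a single sed expression. The key list is shortened, and the event and the name strip_event are hypothetical.

```python
import re

# Shortened list of the pairs to keep; the real list has 18 keys.
KEEP = ["c-ip", "cs-host", "s-action", "s-ip", "r-ip"]

# Any wanted key at the start of a token, followed by "=" and its value.
# (?<!\S) stops "s-ip" from matching inside the unwanted "cs-ip".
WANTED = re.compile(
    r"(?<!\S)(?:" + "|".join(re.escape(k) for k in KEEP) + r")=\S+"
)


def strip_event(event):
    """Return only the wanted key=value pairs, in the order they occur."""
    return " ".join(WANTED.findall(event))


# Hypothetical event with two pairs that should be dropped.
EVENT = (
    "c-ip=198.51.100.4 cs-ip=192.0.2.7 cs-host=www.example.com "
    "x-extra=1 s-action=TCP_HIT s-ip=203.0.113.9 r-ip=203.0.113.10"
)

if __name__ == "__main__":
    print(strip_event(EVENT))
```

Note that the pairs come out in the order they appear in the event rather than in a fixed order, which differs from the \1 ... \18 replacement in the SEDCMD above.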

Thank you!

0 Karma

isoutamo
SplunkTrust
To help you more we definitely need to see your log sample.
0 Karma