Splunk Search

How to modify my regular expression to extract strings between two pipes?

maximusdm
Communicator

Hello, I need to extract the string between two pipes "| |". Here are a few sample strings
(sometimes the delimiter is a pipe "|" and sometimes it is an uppercase letter "I"):

ASDSAD ASDASD ASDAS | STRING001 | ASDA ASDASD ASDASDADADA
ASDSAD ASDASD ASDAS I STRING002 I ASDA ASDASD ASDASDADADA

My regular expressions work 90% of the time:

| rex field="Site Section" ".*\|\s*(?<SiteSection>.*)\s*\|"   
| rex field="Site Section" ".*\I\s*(?<SiteSection>.*)\s*\I"  
| rex field="Site Section" ".*\I\s*(?<SiteSection>.*)\s*\|" 
| rex field="Site Section" ".*\|\s*(?<SiteSection>.*)\s*\I" 

However, they do not work for the strings below:
ASDASD ASDASDASDA ADASDADAD I AMC I IFC <=== returns empty
(most likely because the string "IFC" contains an uppercase letter "I")

ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA <=== returns "ISCO"
(most likely because the string "DISCO" contains an uppercase letter "I")
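
One way to see why these strings fail is to replay the uppercase-I pattern outside Splunk. The sketch below uses Python's re module as a stand-in for rex, which is an assumption: Splunk's rex is PCRE-based, Python needs (?P<name>...) where rex accepts (?<name>...), and the "\I" escape is written as a plain "I" because Python rejects unknown escapes. The function name extract_greedy is made up for illustration.

```python
import re

# Python stand-in for the second rex in the question (delimiter = uppercase "I").
# Assumptions: "\I" is meant as a literal "I", and (?P<name>...) replaces the
# (?<name>...) group syntax used in the rex command.
GREEDY_I = re.compile(r".*I\s*(?P<SiteSection>.*)\s*I")


def extract_greedy(text):
    """Return the SiteSection capture, or None if the pattern does not match."""
    match = GREEDY_I.search(text)
    return match.group("SiteSection") if match else None


if __name__ == "__main__":
    for sample in (
        "ASDASD ASDASDASDA ADASDADAD I AMC I IFC",
        "ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA",
    ):
        # The leading greedy ".*" backtracks from the end of the string, so the
        # opening "I" is the last one that still has another "I" after it.
        print(repr(sample), "->", repr(extract_greedy(sample)))
```

In Python the first sample gives an empty capture, consistent with the empty result reported above, and the second gives only a fragment of DISCO because the opening delimiter lands on the "I" inside the word.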

Any ideas how to modify my regular expression?
Thanks

gokadroid
Motivator

If it is still required, can you check this one, which should work in most cases:

your query to return events
| rex field=_raw "\s*(\s*\|\s*(?<captureMe>[^\|]+)\|\s*)"
| table captureMe
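
To check what this pipe-only pattern returns for the samples in the thread, here is a small Python sketch. It is a stand-in under assumptions, not Splunk itself: the group is written as (?P<captureMe>...) for Python's re module, and the helper name capture_between_pipes is hypothetical.

```python
import re

# Python stand-in for the rex above; [^\|]+ stops at the next pipe.
PIPE_ONLY = re.compile(r"\s*(\s*\|\s*(?P<captureMe>[^\|]+)\|\s*)")


def capture_between_pipes(text):
    """Return the captureMe group, or None if no pipe-delimited value is found."""
    match = PIPE_ONLY.search(text)
    return match.group("captureMe") if match else None


if __name__ == "__main__":
    for sample in (
        "ASDSAD ASDASD ASDAS | STRING001 | ASDA ASDASD ASDASDADADA",
        "ASDSAD ASDASD ASDAS I STRING002 I ASDA ASDASD ASDASDADADA",
        "ASDF ASDF |A&E |FYI",
    ):
        print(repr(sample), "->", repr(capture_between_pipes(sample)))
```

Two things show up in this sketch: the capture keeps the space before the closing pipe, and strings delimited by an uppercase "I" are not matched at all, since the pattern only looks for pipes.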

somesoni2
Revered Legend

Give this a try

Updated

your base search | rex field="Site Section" "\s(\||I)\s+(?<SiteSection>.+)\s+(\||I)\s" 
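
The pattern above can be exercised against the strings quoted in this thread with a short Python sketch. This is only a stand-in for rex (an assumption): the named group uses Python's (?P<name>...) form, and the helper name site_section is hypothetical.

```python
import re

# Python stand-in for the rex above: a delimiter ("|" or "I") must have
# whitespace on both sides to count.
SPACED_DELIMS = re.compile(r"\s(\||I)\s+(?P<SiteSection>.+)\s+(\||I)\s")


def site_section(text):
    """Return the SiteSection capture, or None if the pattern does not match."""
    match = SPACED_DELIMS.search(text)
    return match.group("SiteSection") if match else None


if __name__ == "__main__":
    for sample in (
        "ASDSAD ASDASD ASDAS | STRING001 | ASDA ASDASD ASDASDADADA",
        "ASDSAD ASDASD ASDAS I STRING002 I ASDA ASDASD ASDASDADADA",
        "ASDASD ASDASDASDA ADASDADAD I AMC I IFC",
        "ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA",
        "AASSDDF DFGJKJ | A&E |FYI",
    ):
        print(repr(sample), "->", repr(site_section(sample)))
```

Requiring whitespace around each delimiter is what keeps the "I" inside DISCO or IFC from being treated as a delimiter. The same requirement makes the last sample fail, because the closing pipe is followed directly by a letter.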

maximusdm
Communicator

It is a lot better, but if there is an uppercase letter "I" after the second pipe "|", it still doesn't work properly. Thanks

somesoni2
Revered Legend

A sample log where it's failing?

maximusdm
Communicator

If you have a string such as ABCDE I AAA I IFC, the result will be "AAA I" and not "AAA" as it should be.

somesoni2
Revered Legend

Will the value/string that you want to capture always be a single word, or can it be multiple words?
Try the updated answer as well.

maximusdm
Communicator

with your update I only had one string which failed and it is because there is no space between the pipe "|" and the letter "i", for instance:
AASSDDF DFGJKJ | A&E |FYI will return nothing.

PS: strings with 2 words between the pipes work just fine!

somesoni2
Revered Legend

How about this?

your base search | rex field="Site Section" "\s(\||I)\s+(?<SiteSection>.+)\s+(\||I\s)" 

maximusdm
Communicator

Now it fails when a space is missing on either side of the first pipe, LOL
for instance:
ASDF ASDF| A&E |FYI or
ASDF ASDF |A&E |FYI
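
The failure on these two strings can be reproduced outside Splunk, and one possible way around it is to make the whitespace around a pipe optional while still requiring an "I" to stand alone. The Python sketch below shows both. It rests on assumptions: Python's re stands in for rex, groups use (?P<name>...), and the TOLERANT pattern and the helper name first_capture are hypothetical, not something proposed in this thread.

```python
import re

# Python stand-in for the most recent rex suggested in this thread.
PROPOSED = re.compile(r"\s(\||I)\s+(?P<SiteSection>.+)\s+(\||I\s)")

# Hypothetical variant, not from the thread: whitespace around a pipe is
# optional, while an "I" only counts as a delimiter when it stands alone.
TOLERANT = re.compile(
    r"(?:\||\sI\s)\s*(?P<SiteSection>.+?)\s*(?:\||\sI(?:\s|$))"
)


def first_capture(pattern, text):
    """Return the SiteSection capture for the given pattern, or None."""
    match = pattern.search(text)
    return match.group("SiteSection") if match else None


if __name__ == "__main__":
    for sample in (
        "ASDF ASDF| A&E |FYI",
        "ASDF ASDF |A&E |FYI",
        "ABCDE I AAA I IFC",
    ):
        print(
            repr(sample),
            "proposed:", repr(first_capture(PROPOSED, sample)),
            "tolerant:", repr(first_capture(TOLERANT, sample)),
        )
```

The variant uses a lazy capture, so it stops at the first closing delimiter; a value that itself contains a pipe or a standalone "I" would be cut short. To try it in rex, the group would be written as (?<SiteSection>...).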

maximusdm
Communicator

I resolved my problem by replacing each standalone " I " with a pipe before running the next regular expression.

| rex field="Site Section" mode=sed "s,\sI\s, | ,g"
| rex field="Site Section" ".*\|\s*(?<SiteSection>.*)\s*\|"
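
The two-step idea (normalize the delimiters, then extract between pipes) can be sketched in Python as follows. Assumptions: Python's re stands in for rex, the second step is taken to be the pipe-only pattern from the original question, and the function names are hypothetical. The strip() call is an addition for this sketch; the rex version keeps any whitespace the greedy capture picks up.

```python
import re


def normalize_delimiters(text):
    """Mirror the sed-mode rex: turn each whitespace-I-whitespace into ' | '."""
    return re.sub(r"\sI\s", " | ", text)


# Assumption: the second rex is the pipe-only pattern from the original question.
PIPE_PATTERN = re.compile(r".*\|\s*(?P<SiteSection>.*)\s*\|")


def site_section_two_step(text):
    """Normalize the delimiters, then extract the value between the pipes."""
    match = PIPE_PATTERN.search(normalize_delimiters(text))
    # The greedy capture keeps trailing whitespace, so strip it here.
    return match.group("SiteSection").strip() if match else None


if __name__ == "__main__":
    for sample in (
        "ABCDE I AAA I IFC",
        "ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA",
        "ASDF ASDF| A&E |FYI",
        "ASDF ASDF |A&E |FYI",
    ):
        print(repr(sample), "->", repr(site_section_two_step(sample)))
```

Because only an "I" with whitespace on both sides is rewritten, letters inside words such as DISCO, IFC or FYI are left alone, and the second step then has a single delimiter to deal with.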

I want to thank you for pointing me in the right direction.
