Splunk Search

RegEx for pattern matching and extraction

mbasharat
Builder

Hi,

My data contains session IDs (labeled SES) and user IDs (labeled ABC).

When I look at the events, I see the variations below. The regex should match anything that is 14 digits followed by zero or more groups of either a hyphen plus 9 digits or a hyphen with no digits. I need a regex that extracts SES and ABC into separate fields from these variations.

Formats seen:
SES
SES-ABC
SES--ABC
SES--ABC-
SES-ABC-ABC

Sample data:
1234567-123456789---
1234567-1234567890-123456789--
12345678-123456789--A12345678-123456789
123456789
12345678900000
12345ac4-1234-1a12-9as9-1aa111as23aa
12345678900000-123456789
12345678900000-123456789-1234567890
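The rule stated above (14 digits, then zero or more groups of a hyphen plus 9 digits or a bare hyphen) can be checked directly against the samples. The sketch below is a literal Python translation of that sentence, which is an assumption about the intended meaning; the pattern uses only syntax that PCRE (used by rex) shares with Python `re`.

```python
import re

# Literal reading of the stated rule (an assumption about intent):
# exactly 14 digits, then zero or more groups that are either
# "-" followed by exactly 9 digits, or a bare "-" (hyphen with 0 digits).
STATED_RULE = re.compile(r"\d{14}(?:-\d{9}|-)*")

SAMPLES = [
    "1234567-123456789---",
    "1234567-1234567890-123456789--",
    "12345678-123456789--A12345678-123456789",
    "123456789",
    "12345678900000",
    "12345ac4-1234-1a12-9as9-1aa111as23aa",
    "12345678900000-123456789",
    "12345678900000-123456789-1234567890",
]


def matches_stated_rule(value: str) -> bool:
    """True if the whole value fits the rule (fullmatch anchors both ends)."""
    return STATED_RULE.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample:45} {matches_stated_rule(sample)}")
```

Under this literal reading only 12345678900000 and 12345678900000-123456789 match, so the rule as worded covers fewer cases than the sample data contains.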

Thanks in advance


to4kawa
Ultra Champion
| makeresults 
| eval _raw="raw
1234567-123456789---
1234567-1234567890-123456789--
12345678-123456789--A12345678-123456789
123456789
12345678900000
12345ac4-1234-1a12-9as9-1aa111as23aa
12345678900000-123456789
12345678900000-123456789-1234567890" 
| multikv forceheader=1 
| rex max_match=2 "(?<SES>^\d+)|-(?<ABC>\d+)(?:-|$)" 
| eval SES=trim(SES,"0"), ABC=trim(ABC,"0")

Use rex with max_match to limit how many times the regex is applied to each event.
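To see what the rex above captures for each sample, here is a rough Python emulation. It assumes that `rex max_match=2` behaves like applying the pattern up to two times from left to right and collecting each named group into a multivalue field (modeled here as a list), and it models `eval X=trim(X,"0")` with `str.strip("0")`. Python spells named groups `(?P<name>...)` instead of `(?<name>...)`. This is a sketch for reasoning about the regex, not Splunk's implementation.

```python
import itertools
import re

# Same pattern as the rex command; Python needs (?P<name>...) for named groups.
# Without re.MULTILINE, ^ only matches at the start of the value, so the SES
# branch can only fire on the first match.
PATTERN = re.compile(r"(?P<SES>^\d+)|-(?P<ABC>\d+)(?:-|$)")


def emulate_rex(value: str, max_match: int = 2) -> dict:
    """Approximate `rex max_match=N` followed by `eval X=trim(X,"0")`."""
    fields = {"SES": [], "ABC": []}
    # Take at most max_match non-overlapping matches, left to right.
    for m in itertools.islice(PATTERN.finditer(value), max_match):
        for name in fields:
            captured = m.group(name)
            # None means this group did not take part in the match.
            if captured is not None:
                fields[name].append(captured)
    # trim(X,"0") removes "0" characters from both ends of each value.
    return {name: [v.strip("0") for v in values] for name, values in fields.items()}


if __name__ == "__main__":
    samples = [
        "1234567-123456789---",
        "1234567-1234567890-123456789--",
        "12345678-123456789--A12345678-123456789",
        "123456789",
        "12345678900000",
        "12345ac4-1234-1a12-9as9-1aa111as23aa",
        "12345678900000-123456789",
        "12345678900000-123456789-1234567890",
    ]
    for sample in samples:
        print(sample, emulate_rex(sample))
```

In this emulation the 10-digit 1234567890 and the 14-digit 12345678900000 both come out as 123456789 only because trim strips their trailing zeros; the same trim would also remove a genuine trailing zero from an ID such as 123456780. The UUID-like sample still yields SES=12345 and ABC=1234.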


mbasharat
Builder

After working with the customer, the data at the source has been fixed. The regex above works perfectly now. THANK YOU!


to4kawa
Ultra Champion

12345ac4-1234-1a12-9as9-1aa111as23aa
Where are SES and ABC in this one?


mbasharat
Builder

Hi @to4kawa, this one is a very odd pattern, and I was also scratching my head when I looked at it. Let me try the solution you provided. I will report back shortly.


jpolvino
Builder

Hi, can you please provide a little more detail? Specifically, for the valid examples you gave, which SES and ABC values do you expect to match? And which examples should not match anything?

When ABC appears twice (the last line under "Formats seen"), is it literally the same ABC twice, or two different ABCs?

mbasharat
Builder

Hi @jpolvino,

I only need SES and ABC extracted from the patterns above. In the last example, ABC appears twice. It is the same ABC, but the second one has an additional digit or character. I only need the 9-digit ABC, which is the middle one in the last example.

Sample data:
1234567-123456789--- (Need the 9-digit ABC only: 123456789)
1234567-1234567890-123456789-- (Need the 9-digit ABC only: 123456789)
12345678-123456789--A12345678-123456789 (Need the 9-digit ABC only: 123456789, the last one)
123456789 (Need the 9-digit ABC only: 123456789)
12345678900000 (Need the 9-digit ABC only: 123456789)
12345ac4-1234-1a12-9as9-1aa111as23aa (I am working with the data owners to clarify this pattern)
12345678900000-123456789 (Need the 9-digit ABC only: 123456789)
12345678900000-123456789-1234567890 (Need the 9-digit ABC only: 123456789, the middle one)
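Most of the per-sample expectations above fit one assumed rule: the ABC is a hyphen-delimited token of exactly 9 digits, and when several qualify, take the last one. The sketch below applies that rule through a hypothetical helper, `nine_digit_abc` (not from the thread). Its regex uses only syntax PCRE also supports, but it is untested in Splunk. It deliberately returns None for 12345678900000, which contains no standalone 9-digit token, and for the still-unclarified 12345ac4-... value.

```python
import re
from typing import Optional

# Assumed definition of a 9-digit ABC: exactly 9 digits, preceded by the start
# of the value or a hyphen, and followed by a hyphen or the end. The lookahead
# does not consume the next hyphen, so that hyphen can begin the next match.
NINE_DIGIT_TOKEN = re.compile(r"(?:^|-)(\d{9})(?=-|$)")

SAMPLES = [
    "1234567-123456789---",
    "1234567-1234567890-123456789--",
    "12345678-123456789--A12345678-123456789",
    "123456789",
    "12345678900000",
    "12345ac4-1234-1a12-9as9-1aa111as23aa",
    "12345678900000-123456789",
    "12345678900000-123456789-1234567890",
]


def nine_digit_abc(value: str) -> Optional[str]:
    """Hypothetical helper: return the last exactly-9-digit token, or None."""
    tokens = NINE_DIGIT_TOKEN.findall(value)  # one group, so a list of strings
    return tokens[-1] if tokens else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample:45} {nine_digit_abc(sample)}")
```

Requiring exactly 9 digits skips the 10-digit and 14-digit tokens without trimming zeros, which is why 12345678900000-123456789-1234567890 yields the middle token.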
