Solved: How do I optimize my regex for a field extraction to improve efficiency and searchability?

HattrickNZ · 07-28-2015

I am working on a field extraction in Splunk and have come up with the regex below:

(Splunk regex does not work the same way here)

^[^'\n]*'(?P<field1>\d+)

which pulls this value out:
79037030601

of the following events:

beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",endTime="2015-07-29T10:00:00+12:00",measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K'79037030601",c84162779=1,c84162780=1
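To try the extraction outside Splunk, here is a minimal sketch in Python, whose standard `re` module accepts the same `(?P<name>...)` named-group syntax. It assumes Python's engine behaves like Splunk's PCRE for a pattern this simple; the names `EVENT` and `FIELD1_REGEX` are illustrative only.

```python
import re

# The event from the question, split across lines only for readability.
EVENT = (
    'beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",'
    'userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",'
    'endTime="2015-07-29T10:00:00+12:00",'
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",'
    'c84162779=1,c84162780=1'
)

# The question's regex: from the start of the line, skip every character that
# is not a single quote or newline, match the first ', then capture the digits.
FIELD1_REGEX = re.compile(r"^[^'\n]*'(?P<field1>\d+)")

match = FIELD1_REGEX.search(EVENT)
field1 = match.group("field1") if match else None
print(field1)  # 79037030601
```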

Now I am looking at optimizing this regex for time efficiency and searchability in the events.
I am trying to use the site linked here to help me optimize it. One example I am working on is here.

How can I work on the regex there and then apply it in Splunk? I don't think the two are the same, or are they?

MuS · 07-28-2015

Hi HattrickNZ,

Using https://regex101.com/ and the example you provided, I came up with this simpler regex:

'(?P<field1>\d+)"

Does this work for all events?
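A quick way to check the shorter pattern against the question's event is to run both patterns side by side. This Python sketch uses an abbreviated copy of the event (only the tail that contains the value) and assumes Python's `re` treats these patterns the same way PCRE does.

```python
import re

# Abbreviated copy of the question's event (the part containing the value).
EVENT = 'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1,c84162780=1'

ORIGINAL = re.compile(r"^[^'\n]*'(?P<field1>\d+)")  # the question's regex
SIMPLE = re.compile(r"'(?P<field1>\d+)\"")         # the shorter regex; \" matches a literal "

results = {}
for name, pattern in (("original", ORIGINAL), ("simple", SIMPLE)):
    m = pattern.search(EVENT)
    results[name] = m.group("field1") if m else None
    print(name, results[name])
```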

UPDATE: to use it in Splunk, use this: ... | rex "'(?P<field1>\d+)\"" | ...

cheers, MuS


jeffland · 07-29-2015

To see the efficiency of your regexes in more detail than the step count shown above the regex, you can also use the debugger on regex101.com (on the left) to see where you might run into unnecessary steps, and to learn how regexes work in general.
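regex101's step count and debugger are specific to that site. As a rough complement, wall-clock timing of the two patterns can be compared with Python's `timeit`. This is a sketch under the assumption that relative timings in Python's engine say something useful about the patterns; they will not transfer directly to Splunk's PCRE and they vary by machine, so no numbers are shown here. The helper `time_pattern` is hypothetical.

```python
import re
import timeit

# Abbreviated copy of the question's event (the part containing the value).
EVENT = 'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1,c84162780=1'

PATTERNS = {
    "anchored": re.compile(r"^[^'\n]*'(?P<field1>\d+)"),  # the question's regex
    "unanchored": re.compile(r"'(?P<field1>\d+)\""),      # the shorter regex
}


def time_pattern(pattern, text, number=10_000):
    """Total seconds taken by `number` calls of pattern.search(text)."""
    return timeit.timeit(lambda: pattern.search(text), number=number)


for name, pattern in PATTERNS.items():
    print(f"{name}: {time_pattern(pattern, EVENT):.4f}s for 10,000 searches")
```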

HattrickNZ · 07-28-2015

Yours works on regex101 for one event, but if I add more events it does not seem to work.
Mine, however, seems to work in Splunk for all events.

MuS · 07-28-2015

Can you provide the other events as well?

MuS · 07-28-2015

And did you use the /g (global) flag in regex101?

HattrickNZ · 07-28-2015

Thanks, the global /g flag did it, and it works on all events now.
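The effect of the g flag can be reproduced in Python: `re.search` stops at the first match, while `re.findall` keeps scanning the whole string. In this sketch the first event is an abbreviated copy of the question's event and the second is a hypothetical copy with an invented number. It also shows that a pattern starting with `^` needs `re.MULTILINE` to match at the start of every line once several events are pasted into one test string; in Splunk itself, `rex` runs against each event separately, so this only matters when testing.

```python
import re

# First event: abbreviated from the question. Second: hypothetical copy with
# an invented number, so there is more than one value to find.
EVENTS = "\n".join([
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1',
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030602",c84162779=1',
])

SIMPLE = re.compile(r"'(?P<field1>\d+)\"")

# search() stops at the first match, like regex101 without the g flag.
first_only = SIMPLE.search(EVENTS).group("field1")

# findall() keeps scanning the whole string, like the g (global) flag.
all_matches = SIMPLE.findall(EVENTS)

# A pattern that starts with ^ also needs MULTILINE so ^ matches at each line start.
ANCHORED = re.compile(r"^[^'\n]*'(?P<field1>\d+)", re.MULTILINE)
anchored_matches = ANCHORED.findall(EVENTS)

print(first_only)        # 79037030601
print(all_matches)       # ['79037030601', '79037030602']
print(anchored_matches)  # ['79037030601', '79037030602']
```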

But what is the difference between this:

^[^'\n]*'(?P<field1>\d+)

and this:

'(?P<field1>\d+)"

It looks like the only difference is that the characters ^[^'\n]* are missing from the start.
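Apart from the anchor, the two patterns differ in what they require around the digits: the original only looks at the first single quote on the line and does not care what follows the digits, while the shorter one accepts any single quote but requires a closing double quote. The two test strings below are hypothetical, made up only to show each case, and `extract` is an illustrative helper.

```python
import re

ORIGINAL = re.compile(r"^[^'\n]*'(?P<field1>\d+)")  # first ' on the line, then digits
SIMPLE = re.compile(r"'(?P<field1>\d+)\"")         # any ', digits, then a closing "


def extract(pattern, text):
    """Return the field1 capture, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group("field1") if m else None


# Hypothetical strings (not from the thread):
no_closing_quote = "K'123,"           # digits are not followed by "
first_quote_no_digits = "a'b c'456\"" # the first ' is followed by a letter

for text in (no_closing_quote, first_quote_no_digits):
    print(repr(text), extract(ORIGINAL, text), extract(SIMPLE, text))
```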

Also, this does not work in Splunk (I get an "Unbalanced quotes." error):

... | rex '(?P<field1>\d+)" | stats count(field1) by field1

But this does:

... | rex "^[^'\n]*'(?P<field2>\d+)" | stats count(field2) by field2
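The "Unbalanced quotes" error comes from SPL quoting rather than from the regex: the `rex` argument is expected inside double quotes, and the failing command contains only one, unpaired `"`. Below is a hedged Python sketch of two hypothetical helpers (`rex_argument` and `double_quotes_balanced`, not part of any Splunk tooling). The first wraps a pattern in double quotes and escapes embedded double quotes as `\"`, matching the working form posted in this thread; the second is a simple heuristic that counts unescaped double quotes. It assumes no other SPL escaping rules apply.

```python
def rex_argument(pattern: str) -> str:
    """Wrap a regex in double quotes for rex, escaping embedded double quotes.

    Assumption: only double quotes need escaping; other SPL rules are ignored.
    """
    return '"' + pattern.replace('"', '\\"') + '"'


def double_quotes_balanced(spl: str) -> bool:
    """Heuristic: True if the number of unescaped double quotes is even."""
    count = 0
    escaped = False
    for ch in spl:
        if escaped:
            escaped = False  # character after a backslash is taken literally
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            count += 1
    return count % 2 == 0


command = "... | rex " + rex_argument("'(?P<field1>\\d+)\"") + " | ..."
print(command)                          # ... | rex "'(?P<field1>\d+)\"" | ...
print(double_quotes_balanced(command))  # True
```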

Sorry for all the Qs just trying to understand this better.

HattrickNZ · 07-28-2015

For my reference:

'(?P<field1>\d+)"

' - finds the first '
\d+ - \d finds the first digit after the ' (single quote), and + finds all the digits that follow, stopping before the " (double quote)
() - this has something to do with what to capture
?P - not sure, but I think it either picks the first character for selection, OR matches the character P literally (case sensitive), OR has something to do with storing the value in the field name name1

For example:
'\d+ - will highlight '79037030601
'(?P)\d+ - will highlight '79037030601, but it looks like the cursor is just before the first 7; not sure if the ?P is required
'(\d+) - will highlight '79037030601, with the numbers 79037030601 in blue and the ' in green, so not sure if the ?P is required
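The difference between those variants is whether the pattern has a capture group. This Python sketch, run on an abbreviated copy of the event, shows that the whole match always includes the quote, while parentheses make the digits available separately as group 1. It assumes Python's `re` numbers groups the same way PCRE does for these simple patterns.

```python
import re

# Abbreviated copy of the question's event.
TEXT = 'HLR Number = K\'79037030601",c84162779=1'

whole = re.search(r"'\d+", TEXT)       # no parentheses: only the whole match exists
captured = re.search(r"'(\d+)", TEXT)  # parentheses: the digits are also group 1

print(whole.group(0))     # '79037030601
print(captured.group(0))  # '79037030601  (the whole match still includes the quote)
print(captured.group(1))  # 79037030601
```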

MuS · 07-28-2015

The (?<name>...) part is a named capturing group, and you can write it with or without the P, as (?P<name>...) or (?<name>...); both will work. Also, regex101.com shows an explanation of your regex on the top right side.
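The point that the P is optional holds for PCRE-style engines such as the ones Splunk and regex101 use. If you test in Python instead, its `re` module uses the `(?P<name>...)` spelling, so keeping the P is the portable choice. This sketch shows that a named group can be read by name, appears in `groupdict()`, and also keeps its number.

```python
import re

# Abbreviated copy of the question's event.
TEXT = 'HLR Number = K\'79037030601",c84162779=1'

pattern = re.compile(r"'(?P<field1>\d+)\"")
m = pattern.search(TEXT)

print(m.groupdict())                    # {'field1': '79037030601'}
print(m.group("field1") == m.group(1))  # True: a named group also gets a number
```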

MuS · 07-28-2015

It should look like this in Splunk:

... | rex "'(?P<field1>\d+)\"" | stats count(field1) by field1
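`stats count(field1) by field1` produces one row per distinct `field1` value with the number of events that had it. A rough Python equivalent, assuming the events are already split into a list, uses `collections.Counter`. The helper `count_by_field1` is hypothetical; of the three sample events, the first two repeat the question's value and the third uses an invented number.

```python
import re
from collections import Counter

# Abbreviated events: the first two repeat the question's value, the third
# uses an invented number purely for illustration.
EVENTS = [
    'HLR Number = K\'79037030601",c84162779=1',
    'HLR Number = K\'79037030601",c84162779=1',
    'HLR Number = K\'79037030602",c84162779=1',
]

FIELD1 = re.compile(r"'(?P<field1>\d+)\"")


def count_by_field1(events):
    """Roughly mimic `rex ... | stats count(field1) by field1`."""
    counts = Counter()
    for event in events:  # rex is applied to each event separately
        m = FIELD1.search(event)
        if m:  # events without a match contribute no field1 value
            counts[m.group("field1")] += 1
    return dict(counts)


print(count_by_field1(EVENTS))  # {'79037030601': 2, '79037030602': 1}
```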

To explain the regex: it matches a ' (single quote), then captures all the digits up to the next " (double quote) in a named group. Your original regex, by contrast, breaks down like this:

^ assert position at start of the string
[^'\n]* match a single character not present in the list below
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    ' the literal character '
    \n matches a line-feed (newline) character (ASCII 10)
' matches the character ' literally
(?P<field1>\d+) Named capturing group field1
    \d+ match a digit [0-9]
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
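The same breakdown can be kept next to the pattern itself. Python's `re.VERBOSE` flag ignores whitespace and `#` comments outside character classes, so the question's regex can be written with one annotated piece per line. This is a sketch for documenting the pattern in Python; `ANNOTATED` and `COMPACT` are illustrative names.

```python
import re

# Abbreviated copy of the question's event.
EVENT = 'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1'

# The question's regex in verbose mode, one annotated piece per line.
ANNOTATED = re.compile(
    r"""
    ^                 # start of the string
    [^'\n]*           # characters that are not ' or a newline, as many as possible
    '                 # a literal single quote
    (?P<field1>\d+)   # named group field1: one or more digits, as many as possible
    """,
    re.VERBOSE,
)

# The same regex written compactly, as in the question.
COMPACT = re.compile(r"^[^'\n]*'(?P<field1>\d+)")

print(ANNOTATED.search(EVENT).group("field1"))  # 79037030601
print(COMPACT.search(EVENT).group("field1"))    # 79037030601
```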
