I am working on field extraction in Splunk and have come up with the regex below
(the Splunk regex does not seem to work the same way here):
^[^'\n]*'(?P<field1>\d+)
which pulls this value out:
79037030601
from events like the following:
beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",endTime="2015-07-29T10:00:00+12:00",measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K'79037030601",c84162779=1,c84162780=1
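As a quick cross-check outside Splunk, here is a minimal Python sketch that applies the original regex to the sample event. It assumes that Python's `re` module behaves like the PCRE engine Splunk uses for a pattern this simple; the variable names are illustrative only.

```python
import re

# Sample event from the question (string split only for readability).
EVENT = (
    'beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",'
    'userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",'
    'endTime="2015-07-29T10:00:00+12:00",'
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",'
    'c84162779=1,c84162780=1'
)

# The original regex; Python's re also spells named groups as (?P<name>...).
ORIGINAL = re.compile(r"^[^'\n]*'(?P<field1>\d+)")

match = ORIGINAL.search(EVENT)
field1 = match.group("field1") if match else None
print(field1)  # 79037030601
```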
Now I am looking at optimizing this regex for time efficiency and for searchability in the events.
I am trying to use the site linked here to help me optimize it. One example I am working on is here.
How can I work on this regex there and then apply it in Splunk? I don't think the two are the same, or are they?
Hi HattrickNZ,
Using https://regex101.com/ and the example you provided, I came up with this simple regex:
'(?P<field1>\d+)"
Does this work for all events?
UPDATE: to use it in Splunk, use this: ... | rex "'(?P<field1>\d+)\"" | ...
cheers, MuS
To see the efficiency of your regexes in more detail than the step count shown above the regex, you can also use the debugger on regex101.com (on the left) to see where you might be taking unnecessary steps, and to learn how regexes work in general.
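The step count on regex101 is specific to its engine. As a rough alternative, this sketch times both patterns with Python's `timeit`. It assumes Python's `re` is a reasonable stand-in for comparison purposes only; the absolute numbers will differ from Splunk's PCRE engine and vary from machine to machine, so no expected timings are given.

```python
import re
import timeit

# Sample event from the question.
EVENT = (
    'beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",'
    'userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",'
    'endTime="2015-07-29T10:00:00+12:00",'
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",'
    'c84162779=1,c84162780=1'
)

ANCHORED = re.compile(r"^[^'\n]*'(?P<field1>\d+)")
UNANCHORED = re.compile(r"'(?P<field1>\d+)\"")

# Both patterns should extract the same value from this event.
anchored_value = ANCHORED.search(EVENT).group("field1")
unanchored_value = UNANCHORED.search(EVENT).group("field1")

# Rough wall-clock comparison of 20,000 searches per pattern.
for name, pattern in [("anchored", ANCHORED), ("unanchored", UNANCHORED)]:
    seconds = timeit.timeit(lambda p=pattern: p.search(EVENT), number=20_000)
    print(f"{name}: {seconds:.4f}s for 20,000 searches")
```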
Yours works on regex101 for one event, but if I add more events it does not seem to work.
But mine seems to work in Splunk for all events.
Can you provide the others as well?
And did you use the /g flag to match globally in regex101?
Thanks
The global /g flag did it, and it now works on all events.
But what is the difference between this:
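The /g flag is the difference between reporting only the first match and reporting every match in the test string. A small Python sketch of the same idea, using `re.search` for "first match only" and `re.finditer` for "all matches". The second event line and its number are hypothetical test data modelled on the sample, not real events. In Splunk, rex is applied to each event on its own, which is why one match per event is enough there.

```python
import re

# Hypothetical test input: two events modelled on the sample, one per line.
EVENTS = "\n".join([
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1',
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'12345678901",c84162779=1',
])

PATTERN = re.compile(r"'(?P<field1>\d+)\"")

# Without /g, regex101 reports only the first match, like re.search.
first_only = PATTERN.search(EVENTS).group("field1")

# With /g, every match is reported, like iterating over re.finditer.
all_matches = [m.group("field1") for m in PATTERN.finditer(EVENTS)]

print(first_only)   # 79037030601
print(all_matches)  # ['79037030601', '12345678901']
```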
^[^'\n]*'(?P<field1>\d+)
and this:
'(?P<field1>\d+)"
It looks like the only difference is that the characters ^[^'\n]* are missing from the start.
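Beyond the ^[^'\n]* prefix, the two patterns also differ at the end: the anchored one only considers the first single quote on the line and does not care what follows the digits, while the shorter one accepts any single quote but requires a closing double quote. This hedged Python sketch runs both on three hypothetical strings (not real events) to show where they diverge; `extract` is an illustrative helper, not part of Splunk.

```python
import re

ANCHORED = re.compile(r"^[^'\n]*'(?P<field1>\d+)")
UNANCHORED = re.compile(r"'(?P<field1>\d+)\"")

def extract(pattern, text):
    """Hypothetical helper: return field1 from the first match, or None."""
    m = pattern.search(text)
    return m.group("field1") if m else None

# Hypothetical test strings chosen to show where the patterns differ.
cases = {
    "sample-like": "x=\"a = K'79037030601\",c=1",
    "no closing double quote": "x=K'79037030601,c=1",
    "earlier quote without digits": "name='abc',x=\"K'79037030601\"",
}

for label, text in cases.items():
    print(f"{label}: anchored={extract(ANCHORED, text)} "
          f"unanchored={extract(UNANCHORED, text)}")
```

Both extract the number from the sample-like string; only the anchored one copes with a missing closing double quote, and only the unanchored one copes with an earlier single quote that is not followed by digits.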
Also, this does not work in Splunk (I get an "Unbalanced quotes" error):
... | rex '(?P<feild1>\d+)" | stats count(feild1) by feild
But this does:
... | rex "^[^'\n]*'(?P<feild2>\d+)" | stats count(feild2) by feild2
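The error in the first command comes from quoting: it contains a single, unmatched double quote. The regex passed to rex should sit inside double quotes, with any double quote inside the regex escaped as \", as in MuS's command. The sketch below is a hypothetical illustration of that: `quote_for_spl` and `has_balanced_double_quotes` are made-up helpers, the quote counter is a rough heuristic rather than Splunk's actual parser, and `quote_for_spl` deliberately leaves backslashes such as \d untouched, matching MuS's command.

```python
def quote_for_spl(regex: str) -> str:
    """Hypothetical helper: wrap a regex in double quotes for rex,
    escaping any double quote inside it as \\" (backslashes untouched)."""
    return '"' + regex.replace('"', '\\"') + '"'


def has_balanced_double_quotes(spl: str) -> bool:
    """Rough heuristic: count double quotes not preceded by a backslash."""
    count = 0
    escaped = False
    for ch in spl:
        if escaped:
            escaped = False  # this character was escaped; skip it
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            count += 1
    return count % 2 == 0


BROKEN = "rex '(?P<feild1>\\d+)\""               # one bare double quote
WORKING = "rex \"^[^'\\n]*'(?P<feild2>\\d+)\""  # the command that works
FIXED = "rex " + quote_for_spl(r"'(?P<field1>\d+)" + '"')

print(FIXED)  # rex "'(?P<field1>\d+)\""
for cmd in (BROKEN, WORKING, FIXED):
    print(has_balanced_double_quotes(cmd), cmd)
```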
Sorry for all the questions; I'm just trying to understand this better.
For my reference:
'(?P<field1>\d+)"
' - finds the first '
\d+ - \d finds the first digit after the ' (single quote), and + then takes all the digits that follow, stopping before the " (double quote)
() - this has something to do with what to capture
?P - not sure, but I think it picks the first character for selection, OR matches the character P literally (case sensitive), OR has something to do with storing the match in the field name name1
For example:
'\d+ - will highlight '79037030601
'(?P)\d+ - will highlight '79037030601, but it looks like the cursor is just before the first 7; not sure if the ?P is required
'(\d+) - will highlight '79037030601, with the digits 79037030601 in blue and the ' in green; so not sure if the ?P is required
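The parentheses decide what is captured: the overall match still includes the quote, but the group holds only the digits. The ?P<name> part only gives that group a name, and in Splunk's rex it is the group name that becomes the extracted field. A minimal Python sketch of the three variants above, using a fragment of the sample event:

```python
import re

# Fragment of the sample event.
TEXT = "HLR Number = K'79037030601\""

# No group: the whole match includes the leading quote.
whole = re.search(r"'\d+", TEXT).group(0)

# Unnamed group: digits only, accessed by number.
numbered = re.search(r"'(\d+)", TEXT).group(1)

# Named group: digits only, accessed by name (the name Splunk would use).
named = re.search(r"'(?P<field1>\d+)", TEXT).group("field1")

print(whole)     # '79037030601
print(numbered)  # 79037030601
print(named)     # 79037030601
```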
The (?<name>...) part is for a named capturing group, and you can write it with the P or without it; both (?P<name>...) and (?<name>...) will work. Also, regex101.com shows an explanation of your regex on the top right side.
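A named group is still an ordinary numbered group; the name is just an extra handle on it. Note that Python's `re` module only accepts the (?P<name>...) spelling, so this sketch uses that form; the PCRE engine used by regex101 and Splunk also accepts (?<name>...). The `groupdict()` result maps each group name to its captured text, roughly how a Splunk field name maps to its value.

```python
import re

m = re.search(r"'(?P<field1>\d+)\"", "HLR Number = K'79037030601\",c84162779=1")

by_name = m.group("field1")
by_number = m.group(1)   # the named group is also group 1
fields = m.groupdict()   # {group name: captured text}

print(by_name, by_number)  # 79037030601 79037030601
print(fields)              # {'field1': '79037030601'}
```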
It should look like this in Splunk:
... | rex "'(?P<field1>\d+)\"" | stats count(field1) by field1
To explain it: it matches a ' (single quote) and creates a capturing group of all the digits up to the next " (double quote). Your original regex, by contrast, breaks down like this:
^ assert position at start of the string
[^'\n]* match a single character not present in the list below
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    ' the literal character '
    \n matches a line-feed (newline) character (ASCII 10)
' matches the character ' literally
(?P<field1>\d+) Named capturing group field1
    \d+ match a digit [0-9]
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
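The breakdown above can also be kept next to the pattern itself. This sketch writes the original regex in Python's verbose mode (`re.VERBOSE`), where unescaped whitespace and # comments outside character classes are ignored, and checks that it behaves like the compact form on the tail of the sample event. Verbose mode is a Python feature used here for illustration; whether you would use the equivalent (?x) flag inside Splunk is a separate choice.

```python
import re

# The original regex, one piece per line, mirroring the regex101 breakdown.
ORIGINAL_VERBOSE = re.compile(
    r"""
    ^                   # assert position at start of the string
    [^'\n]*             # zero or more characters that are not ' or newline (greedy)
    '                   # the literal single quote
    (?P<field1>\d+)     # named capturing group field1: one or more digits (greedy)
    """,
    re.VERBOSE,
)
ORIGINAL_COMPACT = re.compile(r"^[^'\n]*'(?P<field1>\d+)")

# Tail of the sample event from the question.
TEXT = ('measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",'
        'c84162779=1,c84162780=1')

verbose_value = ORIGINAL_VERBOSE.search(TEXT).group("field1")
compact_value = ORIGINAL_COMPACT.search(TEXT).group("field1")
print(verbose_value, compact_value)  # 79037030601 79037030601
```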