I have a field of the following form:
mysplit=A.B
where A is a string of letters and B is a number.
I'm trying to extract the .B part using the rex command, but am running into errors. Here's my search string:
... | rex field=mysplit "*\.?<test>*" | ...
But I get the following error message: "Error in 'rex' command: Encountered the following error while compiling the regex '.?': Regex: nothing to repeat".
Can someone see what I'm doing wrong?
Thanks!
There are some things missing here. You need to:
a) Specify a capture group - Using ()
and
b) ensure that you have something to "repeat". I presume that your asterisk is meant to say "any character". However, in regex an asterisk means "repeat the preceding element", so you must tell Splunk what you want repeated.
I think you mean to say "any character", but in regex that is written .* (the period means "any character" and the * means "repeated zero or more times").
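To illustrate the "nothing to repeat" error outside of Splunk, here is a minimal sketch in Python. Python's re module is only a stand-in here (rex uses PCRE), and the helper name compiles is made up for this example, but both engines reject a * that has nothing in front of it.

```python
import re

def compiles(pattern):
    """Return (True, None) if the pattern compiles, else (False, error text).

    Hypothetical helper for illustration only; it is not part of Splunk.
    """
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)

# A leading * has no preceding element to repeat, so compilation fails.
bare_star = compiles(r"*\.")

# With a period in front, the * repeats "any character" and compiles fine.
dot_star = compiles(r".*\.")

print(bare_star)
print(dot_star)
```

The first pattern fails to compile for the same reason the rex search does; the second compiles because the * now has the period to repeat.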
Try the code below. This is assuming that the second value in your delimited field is also the last value.
... | rex field=mysplit ".*?\.(?<test>.*)" | ...
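If you want to check the pattern outside of Splunk, here is a rough sketch in Python. Two assumptions to note: Python spells a named group as (?P<test>...) where rex (PCRE) accepts (?<test>...), and the sample value "abc.123" and the function name extract_test are invented for the example.

```python
import re

# Python spelling of the rex pattern: lazy .*? up to the first period,
# then a named group "test" that captures everything after it.
# Note: rex would use (?<test>.*) instead of (?P<test>.*).
PATTERN = re.compile(r".*?\.(?P<test>.*)")

def extract_test(value):
    """Return the text after the first period, or None if there is no period."""
    match = PATTERN.search(value)
    return match.group("test") if match else None

sample = "abc.123"  # hypothetical mysplit value of the form A.B
result = extract_test(sample)
print(result)
```

In Splunk, the group name (test here) becomes the name of the extracted field. Note that the captured value is the part after the period, without the period itself.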
Thanks for your 'not greedy' explanation. That makes sense!
With regards to your comment on my code, I forgot to put my code in the code bracket. 😞 I have fixed it. 🙂
No problem! Glad I could help!
Got'cha. When I tried your code 'as is' it didn't work. Then I realized that the '<'test> was eaten by the internet.
For those reading this in the future, the above code almost looks like this:
... | rex field=mysplit ".*?\.(?'<'test>.*)" | ...
Be sure to remove the two ' surrounding the less than symbol.
Quick question. Why do we need the '?' that is in front of the '.'? I understand why we need to escape the period, but not the question mark. Thanks for your help!
Ah yes the '<'test> did get swallowed up. Thanks for updating!
So the first ? is to make the previous wildcard "not greedy". That is, if I say .*\. it means the following:
"match any character, any number of times, up until a period."
By default rex is greedy and will match everything up until the LAST period. When I add the ? like this, .*?\. it simply adjusts it to mean the following:
"match any character, any number of times, up until the FIRST period."
Note, in your code above, you seem to have stripped out the wildcard *, so indeed, the first ? makes no sense there. But if you look at my answer, it has .*?\.
The ? basically makes sure that the * stops matching once it comes to the FIRST instance of whatever you have after the ?.
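Here is a small sketch of the greedy versus not-greedy difference, written in Python since it is easy to run on its own. The value "a.b.c" is a made-up example with two periods, and Python needs (?P<rest>...) where rex would take (?<rest>...); the matching behaviour of * and *? is the same in both.

```python
import re

value = "a.b.c"  # hypothetical value containing two periods

# Greedy: .* grabs as much as it can, so \. matches the LAST period.
greedy = re.search(r".*\.(?P<rest>.*)", value).group("rest")

# Not greedy: .*? grabs as little as it can, so \. matches the FIRST period.
lazy = re.search(r".*?\.(?P<rest>.*)", value).group("rest")

print(greedy)
print(lazy)
```

With only one period in the field, as in A.B, both versions capture the same thing; the difference only shows up when the value contains more than one period.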
The second ? belongs to the capture group: (?<test>...) is the syntax for a named capture group, and rex uses the name between the angle brackets as the name of the extracted field.
Hope this helps!