Hey There,
I have seen the Splunk.com answers and the rex cheat sheets online. However, I can't seem to get the rex command to extract what I need from the data. I only need XX_LMP_123456789_123, without the .pdf. Can someone guide me on how to achieve this?
just need XX_LMP_123456789_123
failureMsg="Failure to populate pdf file for XX_LMP_123456789_123.pdf in LOB_1234567_9_4567890_delivery_.pdf"
This should work:
| rex field=failureMsg "for (?<basename>.*?)\\.pdf"
or this:
| rex field=failureMsg "for (?<basename>[^.]+)"
and others, depending on how much variation exists in your source text.
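To show where the two patterns start to differ, here is a rough sketch in Python rather than Splunk. It assumes Python's `re` module behaves like rex for these simple patterns; note that Python spells named groups `(?P<name>...)` where rex uses `(?<name>...)`. The `dotted` message is a made-up example, not taken from the original data.

```python
import re

# Python equivalents of the two rex patterns above.
LAZY_UNTIL_PDF = re.compile(r"for (?P<basename>.*?)\.pdf")
UP_TO_FIRST_DOT = re.compile(r"for (?P<basename>[^.]+)")

def extract(pattern, message):
    """Return the captured basename, or None if the pattern does not match."""
    match = pattern.search(message)
    return match.group("basename") if match else None

# Message from the question.
original = ("Failure to populate pdf file for XX_LMP_123456789_123.pdf "
            "in LOB_1234567_9_4567890_delivery_.pdf")

# Hypothetical message whose base name itself contains a dot.
dotted = "Failure to populate pdf file for XX_LMP.v2_123.pdf in LOB_1_delivery_.pdf"

for label, msg in (("original", original), ("dotted", dotted)):
    print(label, extract(LAZY_UNTIL_PDF, msg), extract(UP_TO_FIRST_DOT, msg))
```

On the original message both patterns capture the same text. On the dotted one, the first pattern keeps everything up to ".pdf", while the second stops at the first dot, so which one fits depends on what your file names can contain.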
@tscroggins
Thanks for your help, this works perfectly. Would you mind explaining the rex characters?
| rex field=failureMsg "for (?<basename>[^.]+)"
"for" - why use "for" here? This actually works even if the failureMsg does not begin with "for", but I'm not sure why.
? - this stands for 0 or 1, but my failureMsg could have letters too and it works fine. Can you explain why it works?
[^.]+ - what is the rest doing? I am having a hard time understanding it.
Your explanation would be very helpful for me. Thanks in advance.
"for " forces a match against that exact text. Without it, your match would have included both the text before the base file name and the base file name itself. The source text doesn't need to begin with "for"; requiring that would take additional logic in the regex.
The (?<xxx>...) sequence defines a capture group with name xxx. In Splunk, xxx becomes the extracted field name. The ? right after the opening parenthesis is part of the group syntax, not the "0 or 1" quantifier, so it places no limit on what the group can match.
[^.]+ means "match one or more characters that are not a period." The match will stop when it encounters the dot in the file name.
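If it helps to see each piece in isolation, here is a small Python sketch (again Python, not Splunk, and it assumes `re.search` scans the text unanchored the same way rex does). It uses Python's `(?P<name>...)` spelling for the named group.

```python
import re

message = ("Failure to populate pdf file for XX_LMP_123456789_123.pdf "
           "in LOB_1234567_9_4567890_delivery_.pdf")

# Without the literal "for ", [^.]+ starts at the first character of the
# message and runs until the first dot.
without_anchor = re.search(r"(?P<basename>[^.]+)", message).group("basename")

# With "for ", the capture can only begin right after that text, wherever it
# occurs in the message.
with_anchor = re.search(r"for (?P<basename>[^.]+)", message).group("basename")

# Requiring the message to *begin* with "for " takes an explicit ^, and then
# this particular message no longer matches.
anchored_at_start = re.search(r"^for (?P<basename>[^.]+)", message)

print(without_anchor)
print(with_anchor)
print(anchored_at_start)
```

The first print shows the leading text being swept into the capture, the second shows just the base file name, and the third prints None because the message starts with "Failure", not "for".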