Meaning of .*?

xvxt006
Contributor

Hi,

I am extracting a field. When I use .*? I get the right value, but when I use .* I get unnecessary extra data. Can you tell me the difference between the two?

\?search=(?P<Keyword>.*?)& - works fine
\?search=(?P<Keyword>.*)& - gives unnecessary data

As I understand it, . matches a single character, and .* means that single character repeated 0 or more times.
If that is true, what does it mean when we have .*?

roychen
Path Finder

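A "?" placed directly after a quantifier such as "*" makes that quantifier "not greedy" (also called lazy), which means it matches as little as possible. On its own, after a single character or group, "?" means "0 or 1 times".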

It looks like you're attempting to extract the search attribute from a URL querystring.

Assuming you have a URL that looks like:

http://www.blah.com/?search=blah&this=that&.

When you use the first regex, with "?", the capture stops at the first "&", so the field contains just "blah", because that's the minimum it needs to match to fulfill the regex.

However, when you use the second regex, without "?", the quantifier is "greedy": it matches as much as possible and only backs off as far as the last "&" in the string, and that's why you get additional data in your extracted field.
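
To see the two behaviours side by side, here is a minimal sketch in Python, whose re module accepts the same (?P<Keyword>...) named-group syntax. Splunk itself uses PCRE, so treat this as an illustration rather than Splunk's own engine; the URL is a made-up example with an extra "x=1" parameter added.

```python
import re

# Hypothetical URL modelled on the example above (note the "?" before "search").
url = "http://www.blah.com/?search=blah&this=that&x=1"

# Lazy: ".*?" expands one character at a time and stops at the first "&".
lazy = re.search(r"\?search=(?P<Keyword>.*?)&", url)

# Greedy: ".*" grabs the rest of the line, then backtracks to the last "&".
greedy = re.search(r"\?search=(?P<Keyword>.*)&", url)

print(lazy.group("Keyword"))    # blah
print(greedy.group("Keyword"))  # blah&this=that
```

Both patterns match the same event; only the amount of text captured between "search=" and an "&" differs.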

For a good tutorial on regex, try http://www.regular-expressions.info/

Hope this helps.

xvxt006
Contributor

Hi, this is for 'AlanFinlay'. I got an email from you but I do not see your comment here. I tried grouping it as (.*) but I am still not getting the output you mentioned.

This is from Alan:
? means to match the preceding pattern 0 or 1 times.
So in your example the ? should make no difference, since (.*) will also match no characters. Maybe there is some strange parsing happening. Can you try extra parentheses to force parsing .* as a group, and see if that makes a difference,
as in \?search=(?P<Keyword>(.*)?)&
Does that also work?
Can you provide examples of the text you are matching and the unnecessary data?
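
For reference, the grouped form can be compared with the lazy form in a short Python sketch. This assumes the suggested pattern was \?search=(?P<Keyword>(.*)?)& and uses a made-up sample string; Python's re module stands in for Splunk's PCRE engine here.

```python
import re

# Hypothetical sample text, loosely based on the values in this thread.
text = "?search=capacitor&op=search&N=0&GlobalSearch=true"

# "?" after a group makes the whole group optional (0 or 1 times);
# the ".*" inside the group is still greedy.
grouped = re.search(r"\?search=(?P<Keyword>(.*)?)&", text)

# "?" directly after "*" changes the quantifier itself to lazy.
lazy = re.search(r"\?search=(?P<Keyword>.*?)&", text)

print(grouped.group("Keyword"))  # capacitor&op=search&N=0
print(lazy.group("Keyword"))     # capacitor
```

So where the "?" sits matters: after the closing parenthesis it applies to the group, and the extra data is still captured.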

xvxt006
Contributor

Hi,

When I have .* I get:

1)capacitor&op=search&Ntt=capacitor&N=0&GlobalSearch=true&s
2)leson+motors&op=search&Ntt=leson+motors&N=0&GlobalSearch=
3)3pxw4&op=search&Ntt=3pxw4&N=0&GlobalSearch=true&sst=subse

When I have .*? I get the output below, which is what I want:
1)capacitor
2)leson+motors
3)3pxw4
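
The same comparison can be run against these values with a small Python sketch. The sample strings are the truncated values shown above with a "?search=" prefix added as an assumption about the original events, and the extract helper is hypothetical. Because the strings are cut off, the greedy result here ends at the last "&" that survived truncation.

```python
import re

LAZY = re.compile(r"\?search=(?P<Keyword>.*?)&")
GREEDY = re.compile(r"\?search=(?P<Keyword>.*)&")

# Truncated values from the post; the "?search=" prefix is an assumption.
samples = [
    "?search=capacitor&op=search&Ntt=capacitor&N=0&GlobalSearch=true&s",
    "?search=leson+motors&op=search&Ntt=leson+motors&N=0&GlobalSearch=",
    "?search=3pxw4&op=search&Ntt=3pxw4&N=0&GlobalSearch=true&sst=subse",
]


def extract(pattern, text):
    """Return the Keyword capture, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group("Keyword") if match else None


lazy_results = [extract(LAZY, s) for s in samples]
greedy_results = [extract(GREEDY, s) for s in samples]

print(lazy_results)  # ['capacitor', 'leson+motors', '3pxw4']
for value in greedy_results:
    print(value)     # everything up to the last "&" in each sample
```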

xvxt006
Contributor

Thanks. This is helpful.

rturk
Builder

Hi xvxt006, can you paste two or three sample events in your question where you're seeing the issue?
