Splunk Search

Meaning of .*?

xvxt006
Contributor

Hi,

I am extracting a field, and when I use .*? I get the right value. But when I use .* it returns unnecessary data. Can you tell me the difference between the two?

\?search=(?P<Keyword>.*?)& - Works fine
\?search=(?P<Keyword>.*)&  - Gives unnecessary data. 

As I understand it, . matches a single character, and .* means that character repeated zero or more times.
If that is true, what does .*? mean?


roychen
Path Finder

When the "?" character follows a quantifier such as "*" or "+" in a regex, it means "not greedy" (lazy): the quantifier matches as little as possible while still letting the rest of the regex match.

It looks like you're attempting to extract the search attribute from a URL querystring.

Assuming you have a URL that looks like:

http://www.blah.com/?search=blah&this=that&

When you use the first regex, with "?", the capture matches "blah" and stops at the first "&", because that is the minimum it needs to match for the rest of the regex to succeed.

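However, when you use the second regex, without "?", the ".*" is "greedy": it first matches as much as possible and then backtracks only until the following "&" can match. The capture therefore runs up to the last "&" ("blah&this=that"), and that is why you get additional data in your extracted field.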
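To see the difference directly, here is a minimal sketch in Python, whose re module accepts the same (?P<Keyword>...) named-group syntax as the question; Splunk's regular expressions are PCRE-based and treat greedy and lazy quantifiers the same way. The URL is the example above, assumed to contain "?search=" so that the literal \? in both patterns can match.

```python
import re

# Example URL from the answer; assumption: it contains "?search=" so \? can match.
url = "http://www.blah.com/?search=blah&this=that&"

lazy = re.search(r"\?search=(?P<Keyword>.*?)&", url)   # .*? : lazy, as few chars as possible
greedy = re.search(r"\?search=(?P<Keyword>.*)&", url)  # .*  : greedy, as many chars as possible

print(lazy.group("Keyword"))    # blah
print(greedy.group("Keyword"))  # blah&this=that

# The lazy capture ends at the FIRST "&"; the greedy capture backtracks only
# as far as the LAST "&", so everything in between lands in the field.
print(lazy.end("Keyword") == url.index("&"))     # True
print(greedy.end("Keyword") == url.rindex("&"))  # True
```

Both patterns "succeed"; they only differ in which "&" the trailing & in the pattern ends up matching.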

For a good tutorial on regex, try http://www.regular-expressions.info/

Hope this helps.


xvxt006
Contributor

Hi, this is for 'AlanFinlay'. I got an email from you, but I don't see your comment here. I tried grouping it as (.*), but I'm still not getting the output you mentioned.

This is from Alan:
? means to match the preceding pattern 0 or 1 times.
So in your example the ? should make no difference, since (.*) can also match zero characters. Maybe some strange parsing is happening. Can you try extra parentheses to force .* to be parsed as a group, and see whether that makes a difference:
as in \?search=(?P<Keyword>(.*)?)&
Does that also work?
Can you also provide examples of the text you are matching and the unnecessary data?
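The two meanings of "?" can be told apart with a short Python sketch. A "?" after an ordinary item is the 0-or-1 quantifier, but a "?" directly after another quantifier turns that quantifier lazy. The sample string below is hypothetical, built from fragments of the output posted later in this thread.

```python
import re

# Hypothetical sample string (assumption, not a real event from the thread).
text = "?search=capacitor&op=search&N=0"

# "?" after an ordinary item is the 0-or-1 quantifier: the "u" is optional.
print(bool(re.fullmatch(r"colou?r", "color")))   # True
print(bool(re.fullmatch(r"colou?r", "colour")))  # True

# "?" directly after another quantifier (*, +, ?, {m,n}) is not "0 or 1";
# it makes that quantifier lazy.
lazy = re.search(r"\?search=(?P<Keyword>.*?)&", text).group("Keyword")

# Wrapping .* in an optional group makes the group optional, but the inner
# .* stays greedy, so the extra data is still captured.
grouped = re.search(r"\?search=(?P<Keyword>(.*)?)&", text).group("Keyword")

print(lazy)     # capacitor
print(grouped)  # capacitor&op=search
```

This is consistent with the grouping experiment not changing the output: the parentheses alter what is optional, not how much .* consumes.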


xvxt006
Contributor

Hi,

When I use .*, I get:

1) capacitor&op=search&Ntt=capacitor&N=0&GlobalSearch=true&s
2) leson+motors&op=search&Ntt=leson+motors&N=0&GlobalSearch=
3) 3pxw4&op=search&Ntt=3pxw4&N=0&GlobalSearch=true&sst=subse

When I use .*?, I get the output below, which is what I want:
1) capacitor
2) leson+motors
3) 3pxw4
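The same behavior can be reproduced over several events with a Python sketch. The raw events were not posted, so the three event strings below are hypothetical, assembled only from fragments visible in the output above; with these shortened strings the greedy capture stops at the last "&" in each one rather than reproducing the exact text shown.

```python
import re

LAZY = re.compile(r"\?search=(?P<Keyword>.*?)&")   # stops at the first "&"
GREEDY = re.compile(r"\?search=(?P<Keyword>.*)&")  # runs to the last "&"

# Hypothetical events (assumption): built from fragments in the posted output.
keywords = ["capacitor", "leson+motors", "3pxw4"]
events = [f"?search={kw}&op=search&Ntt={kw}&N=0" for kw in keywords]

lazy_results = [LAZY.search(e).group("Keyword") for e in events]
greedy_results = [GREEDY.search(e).group("Keyword") for e in events]

print(lazy_results)    # ['capacitor', 'leson+motors', '3pxw4']
print(greedy_results)  # each ends just before "&N=0"
```

Only the lazy pattern yields the bare keyword for every event, matching the desired output above.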


xvxt006
Contributor

Thanks. This is helpful.


rturk
Builder

Hi xvxt006, can you paste two or three sample events in your question where you're seeing the issue?
