Meaning of .*?

xvxt006
Contributor

Hi,

I am extracting a field. When I use .*? I get the right value, but when I use .* it returns unnecessary data. Can you tell me the difference between these two?

\?search=(?P<Keyword>.*?)& - Works fine
\?search=(?P<Keyword>.*)&  - Gives unnecessary data. 

As I understand it, . matches a single character, and .* means that character repeated zero or more times.
If that is true, what does .*? mean?
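To make the three constructs concrete, here is a minimal sketch using Python's re module on a made-up string. It assumes Python's regex flavor behaves like Splunk's PCRE for these simple constructs, which it does for ., .* and .*?.

```python
import re

text = "aXbXc"  # made-up input for illustration

one_char = re.match(r"a.", text).group()  # "." is exactly one character
greedy = re.match(r"a.*X", text).group()  # ".*" takes as many characters as possible
lazy = re.match(r"a.*?X", text).group()   # ".*?" takes as few characters as possible

print(one_char, greedy, lazy)  # aX aXbX aX
```

Both .* and .*? allow zero or more characters; they differ only in which of the possible matches the engine tries first.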

roychen
Path Finder

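When "?" follows a quantifier such as * or +, it makes that quantifier "non-greedy" (also called lazy), which means it matches as little as possible. So .* matches zero or more characters, as many as it can, while .*? matches zero or more characters, as few as it can.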

It looks like you're attempting to extract the search attribute from a URL querystring.

Assuming you have a URL that looks like:

http://www.blah.com/?search=blah&this=that&

When you use the first regex, with "?", the regex matches up to "blah" and stops, because that is the minimum it needs to match for the "&" that follows to succeed.

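However, when you use the second regex without "?", .* attempts to match as much as possible (because it is "greedy"): it runs to the end of the string and then backs off only until the next character is an "&", so the capture ends before the last "&" rather than the first one. That is why you get additional data in your extracted field.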
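A minimal sketch of the two regexes from the question, run against the example URL above with Python's re module (assumed here to behave like Splunk's PCRE for these patterns):

```python
import re

# Example URL modeled on the one above
url = "http://www.blah.com/?search=blah&this=that&"

# Lazy: stops at the first "&"
lazy = re.search(r"\?search=(?P<Keyword>.*?)&", url)
# Greedy: runs to the end, then backs off to the last "&"
greedy = re.search(r"\?search=(?P<Keyword>.*)&", url)

print(lazy.group("Keyword"))    # blah
print(greedy.group("Keyword"))  # blah&this=that
```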

For a good tutorial on regex, try http://www.regular-expressions.info/

Hope this helps.

xvxt006
Contributor

Hi, this is for 'AlanFinlay'. I got an email from you, but I do not see your comment here. I tried grouping it as (.*), but I am still not getting the output you described.

This is Alan's comment:
? means to match the preceding pattern 0 or 1 times.
So in your example the ? should make no difference, since (.*) will also match no characters. Maybe there is some strange parsing happening. Can you try extra parentheses to force .* to be parsed as a group, and see if that makes a difference:
as in \?search=(?P<Keyword>(.*)?)&
Does that also work?
Can you also provide examples of the text you are matching and the unnecessary data?
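The meaning of "?" depends on what precedes it. After a character, class, or group it means "0 or 1 times" (optional). After a quantifier such as *, +, or ? it makes that quantifier lazy. Right after an opening parenthesis, as in (?P<Keyword>...), it starts a special group. The minimal Python sketch below, using the same example URL, suggests why grouping did not help: in (.*)? the "?" makes the group optional, but the .* inside it stays greedy.

```python
import re

url = "http://www.blah.com/?search=blah&this=that&"

# "(.*)?": the group is optional, but ".*" inside it is still greedy
grouped = re.search(r"\?search=(?P<Keyword>(.*)?)&", url).group("Keyword")
# ".*?": "?" follows the "*" quantifier, so it makes ".*" lazy
lazy = re.search(r"\?search=(?P<Keyword>.*?)&", url).group("Keyword")

print(grouped)  # blah&this=that
print(lazy)     # blah
```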

xvxt006
Contributor

Hi,

When I use .* I get:

1) capacitor&op=search&Ntt=capacitor&N=0&GlobalSearch=true&s
2) leson+motors&op=search&Ntt=leson+motors&N=0&GlobalSearch=
3) 3pxw4&op=search&Ntt=3pxw4&N=0&GlobalSearch=true&sst=subse

When I use .*? I get the output below, which is what I want:
1) capacitor
2) leson+motors
3) 3pxw4
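A common alternative to the lazy quantifier is a negated character class, [^&]*, which simply cannot match past an "&". The sketch below compares the two on hypothetical query strings built from the three keywords above. The parameters after the keyword are illustrative only, not the poster's real events, and Python's re module stands in for Splunk's PCRE.

```python
import re

# Hypothetical query strings; only the keywords come from the thread
events = [
    "/?search=capacitor&op=search&Ntt=capacitor&N=0&GlobalSearch=true",
    "/?search=leson+motors&op=search&Ntt=leson+motors&N=0&GlobalSearch=true",
    "/?search=3pxw4&op=search&Ntt=3pxw4&N=0&GlobalSearch=true",
]

lazy_re = re.compile(r"\?search=(?P<Keyword>.*?)&")
# Negated class: matches any run of characters that are not "&"
class_re = re.compile(r"\?search=(?P<Keyword>[^&]*)")

lazy = [lazy_re.search(e).group("Keyword") for e in events]
by_class = [class_re.search(e).group("Keyword") for e in events]

print(lazy)      # ['capacitor', 'leson+motors', '3pxw4']
print(by_class)  # ['capacitor', 'leson+motors', '3pxw4']
```

One difference: because the class version does not require a trailing "&", it still matches when search is the last parameter (e.g. "/?search=last"), where the lazy version finds no match.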

xvxt006
Contributor

Thanks. This is helpful.

rturk
Builder

Hi xvxt006, can you paste two or three sample events in your question where you're seeing the issue?
