All Apps and Add-ons

Difference between WHERE and SEARCH commands

tsunamii
Path Finder

What are the differences between “where” and “search”? I read somewhere that "search" tends to cause more overhead. The search below if run over one day of netflow data, it takes more than 24+ hours to run.


index=proxy* s_op=GET | lookup geoip clientip as d_ip | where client_country="Russian Federation" OR client_country="Ukraine" OR client_country="Romania" OR client_country="Bulgaria" OR client_country="Latvia" OR client_country="Azerbaijan" OR client_country="Kazakstan" OR client_country="Macedonia" OR client_country="Serbia" | table _time c_ip d_ip r_host client_country client_city cs_bytes d_port cs_uri referer c_agent

1 Solution

martin_mueller
SplunkTrust
SplunkTrust

I'm going to guess that search takes that long because it's reading a boatload of events off disk and performing the lookup, only to then possibly throw out most of them. The where (or search) after that isn't going to add a lot more to the runtime of that pipeline.
What kind of lookup is that, scripted? How many events are you loading? What are you actually looking for as a result, could you possible pre-aggregate data before looking up the location? Have you considered using the Splunk 6 iplocation command to maybe speed up the lookup process?

As for the question from the title, search and where as a filter further down the pipeline mostly differ in what they can do, and how. where only evaluates boolean expressions, so to do a wildcard filter you have to explicitly call match() while search can just do field=value*. I doubt there's a significant difference in performance when doing the same stuff compared to the actual loading of events at the start of the pipeline.

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

I'm going to guess that search takes that long because it's reading a boatload of events off disk and performing the lookup, only to then possibly throw out most of them. The where (or search) after that isn't going to add a lot more to the runtime of that pipeline.
What kind of lookup is that, scripted? How many events are you loading? What are you actually looking for as a result, could you possible pre-aggregate data before looking up the location? Have you considered using the Splunk 6 iplocation command to maybe speed up the lookup process?

As for the question from the title, search and where as a filter further down the pipeline mostly differ in what they can do, and how. where only evaluates boolean expressions, so to do a wildcard filter you have to explicitly call match() while search can just do field=value*. I doubt there's a significant difference in performance when doing the same stuff compared to the actual loading of events at the start of the pipeline.

View solution in original post

.conf21 Now Fully Virtual!
Register for FREE Today!

We've made .conf21 totally virtual and totally FREE! Our completely online experience will run from 10/19 through 10/20 with some additional events, too!