All Apps and Add-ons

Difference between WHERE and SEARCH commands

tsunamii
Path Finder

What are the differences between “where” and “search”? I read somewhere that "search" tends to cause more overhead. The search below if run over one day of netflow data, it takes more than 24+ hours to run.


index=proxy* s_op=GET | lookup geoip clientip as d_ip | where client_country="Russian Federation" OR client_country="Ukraine" OR client_country="Romania" OR client_country="Bulgaria" OR client_country="Latvia" OR client_country="Azerbaijan" OR client_country="Kazakstan" OR client_country="Macedonia" OR client_country="Serbia" | table _time c_ip d_ip r_host client_country client_city cs_bytes d_port cs_uri referer c_agent

1 Solution

martin_mueller
SplunkTrust
SplunkTrust

I'm going to guess that search takes that long because it's reading a boatload of events off disk and performing the lookup, only to then possibly throw out most of them. The where (or search) after that isn't going to add a lot more to the runtime of that pipeline.
What kind of lookup is that, scripted? How many events are you loading? What are you actually looking for as a result, could you possible pre-aggregate data before looking up the location? Have you considered using the Splunk 6 iplocation command to maybe speed up the lookup process?

As for the question from the title, search and where as a filter further down the pipeline mostly differ in what they can do, and how. where only evaluates boolean expressions, so to do a wildcard filter you have to explicitly call match() while search can just do field=value*. I doubt there's a significant difference in performance when doing the same stuff compared to the actual loading of events at the start of the pipeline.

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

I'm going to guess that search takes that long because it's reading a boatload of events off disk and performing the lookup, only to then possibly throw out most of them. The where (or search) after that isn't going to add a lot more to the runtime of that pipeline.
What kind of lookup is that, scripted? How many events are you loading? What are you actually looking for as a result, could you possible pre-aggregate data before looking up the location? Have you considered using the Splunk 6 iplocation command to maybe speed up the lookup process?

As for the question from the title, search and where as a filter further down the pipeline mostly differ in what they can do, and how. where only evaluates boolean expressions, so to do a wildcard filter you have to explicitly call match() while search can just do field=value*. I doubt there's a significant difference in performance when doing the same stuff compared to the actual loading of events at the start of the pipeline.

Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...