What are the differences between "where" and "search"? I read somewhere that "search" tends to cause more overhead. When I run the search below over one day of netflow data, it takes more than 24 hours to finish.
index=proxy* s_op=GET | lookup geoip clientip as d_ip | where client_country="Russian Federation" OR client_country="Ukraine" OR client_country="Romania" OR client_country="Bulgaria" OR client_country="Latvia" OR client_country="Azerbaijan" OR client_country="Kazakstan" OR client_country="Macedonia" OR client_country="Serbia" | table _time c_ip d_ip r_host client_country client_city cs_bytes d_port cs_uri referer c_agent
I'm going to guess that the search takes that long because it's reading a boatload of events off disk and running the lookup on every one of them, only to then possibly throw most of them away. The `where` (or `search`) after that isn't going to add much to the runtime of that pipeline.
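One way to test that guess, as a rough sketch: run only the base search with a cheap aggregation and no lookup, then compare its runtime (for example in the Job Inspector) with the full search. If this part alone accounts for most of the time, retrieving events is the bottleneck; if it finishes quickly, the per-event lookup is the more likely culprit.

```spl
index=proxy* s_op=GET
| stats count
```

The count it returns also tells you how many events the full search pushes through the lookup.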
What kind of lookup is that, a scripted one? How many events are you loading? What result are you actually after, and could you pre-aggregate the data before looking up the location? Have you considered using the Splunk 6 `iplocation` command to speed up the lookup?
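A minimal sketch of the pre-aggregation idea, assuming you only need per-destination totals rather than every individual event: collapse the events to one row per `d_ip` first, so the location lookup runs once per distinct IP instead of once per event. `iplocation` adds fields such as `Country` and `City`. The country strings are copied from the question and are an assumption here: `iplocation` uses its own bundled database, whose country names may not match those of the `geoip` lookup, so check the actual `Country` values before relying on this filter.

```spl
index=proxy* s_op=GET
| stats count AS events sum(cs_bytes) AS total_bytes BY d_ip
| iplocation d_ip
| search Country="Russian Federation" OR Country="Ukraine" OR Country="Romania" OR Country="Bulgaria" OR Country="Latvia" OR Country="Azerbaijan" OR Country="Kazakstan" OR Country="Macedonia" OR Country="Serbia"
| table d_ip Country City events total_bytes
```

The same pattern works with the existing lookup: replace the `iplocation` line with `lookup geoip clientip AS d_ip` and filter on `client_country` instead. The trade-off is that per-event fields such as `_time`, `cs_uri` or `c_agent` are gone after `stats`, so this only fits if aggregated results answer your question.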
As for the question in the title: used as filters further down the pipeline, `search` and `where` mostly differ in what they can do and how. `where` only evaluates boolean expressions, so for a wildcard filter you have to call `match()` explicitly, while `search` can just do `field=value*`. When both do the same filtering, I doubt the performance difference is significant compared to the cost of loading the events at the start of the pipeline.
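To illustrate the syntax difference, here are late-pipeline filters on the `client_country` field from the question's lookup; `...` stands for the earlier part of the search, and the `Russ*` prefix is only an illustrative pattern. The first two lines keep the same events: `search` takes a wildcard directly (and matches field values case-insensitively), while `where` needs a regular expression via `match()` and compares quoted strings case-sensitively.

```spl
... | search client_country="Russ*" OR client_country="Ukraine"
... | where match(client_country, "^Russ") OR client_country="Ukraine"
... | where c_ip=d_ip
```

The last line shows something only `where` can do: with an unquoted right-hand side it compares two fields, whereas `search c_ip=d_ip` would look for events whose `c_ip` is the literal string "d_ip".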