Hello,
I need help improve efficiency of my search using eventstats.
The search worked just fine, but when I applied to large set of data, it took too long.
Please suggest. Thank you
IP 192.168.1.7 of server-A is connected to "LoadBalancer-to-Server" network, LoadBalancer-A is connected to "LoadBalancer-to-Server" network and "Firewall-to-Loadbalancer" network.
So, server-A is behind a firewall. (behindfirewall = "yes")
ip | name | network | behindfirewall |
192.168.1.1 | LoadBalancer-A | Loadbalancer-to-Server | yes |
172.168.1.1 | LoadBalancer-A | Firewall-to-Loadbalancer | yes |
192.168.1.7 | server-A | Loadbalancer-to-Server | yes |
192.168.1.8 | server-B | Loadbalancer-to-Server | yes |
192.168.1.9 | server-C | network-1 | no |
192.168.1.9 | server-D | network-2 | no |
| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| eventstats values(name) as servergroup by network
| eventstats values(network) as networkgroup by name
| eventstats values(networkgroup) as networkpath by servergroup
| eval behindfirewall = if(match(networkpath,"Firewall-to-Loadbalancer"),"yes","no")
| table ip, name, network, servergroup, networkgroup *
So, my question about what you have in your real search before eventstats is significant because ALL the data you have in the search up to eventstats will travel to the search head. Using the fields statement will remove fields you don't want from the data sent to the SH.
If you have a table statement before the eventstats, then that is also a transforming command so will cause the data to go to the SH - for efficiency you want to keep as much of the search on the indexers and only go to the SH with the minimum amount of data you actually need.
Can you post the full search?
Your 3rd eventstats is splitting by servergroup, which is now a multivalue field, which
As for creating the lookup, from your examples, I surmise that
so collect all network names for all load balancers into a lookup, e.g.
| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
172.168.1.2,LoadBalancer-B,Loadbalancer-to-Server
192.168.1.6,server-A,Loadbalancer-to-Server
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| search network="Firewall-to-Loadbalancer" OR name="LoadBalancer-*"
| stats values(network) as network by name
| eval behindfirewall = if(match(network,"Firewall-to-Loadbalancer"),"1","0")
| outputlookup output_format=splunk_mv_csv firewall.csv
Then do
| lookup firewall.csv network OUTPUT behindfirewall
Not sure if that will do what you want, but maybe it gives you some ideas - I don't know your data well enough to know what's what.
What do you have in your real search before you do the eventstats as it will push all the data to the search head, including _raw, so unless you use the fields statement you will be sending all the event data to the SH.
You are also doing lots of multivalue splits, which is going to be pretty memory hungry on the SH.
Building a tree is a tricky thing in Splunk, but if your network paths are not often changing, it may be possible to create a lookup that for 'Server-A' you can lookup its network and discover the behind firewall state.
What is the depth of the tree in your case, your example is 3 tier, going from server via the LB - if it's only 3 tier, then you could perhaps build your pathways just be fetching the name="LoadBalancer" objects and using stats values() rather than eventstats to create the lookup - as at that point you don't care about the IPs.
Hi @bowesmana
What do you have in your real search before you do the eventstats as it will push all the data to the search head, including _raw, so unless you use the fields statement you will be sending all the event data to the SH.
>> Can you re-phrase your statement? How do I improve efficiency using fields statement?
My search using real data is using table statement without "*", but it does have a lot of fields.
You are also doing lots of multivalue splits, which is going to be pretty memory hungry on the SH.
Which part of my search is using multivalue splits?
What is the depth of the tree in your case, your example is 3 tier, going from server via the LB - if it's only 3 tier, then you could perhaps build your pathways just be fetching the name="LoadBalancer" objects and using stats values() rather than eventstats to create the lookup - as at that point you don't care about the IPs.
The depth is always 3-tier: Server->LB->network.
Can you give an example using stats values to create a lookup?
I care about the IP since the one server can have multiple IPs on its interface.
For example: Server-A can have 192.162.1.7 (int1) and 192.162.1.6 (int2)
I appreciate your assistance. Thank you so much
So, my question about what you have in your real search before eventstats is significant because ALL the data you have in the search up to eventstats will travel to the search head. Using the fields statement will remove fields you don't want from the data sent to the SH.
If you have a table statement before the eventstats, then that is also a transforming command so will cause the data to go to the SH - for efficiency you want to keep as much of the search on the indexers and only go to the SH with the minimum amount of data you actually need.
Can you post the full search?
Your 3rd eventstats is splitting by servergroup, which is now a multivalue field, which
As for creating the lookup, from your examples, I surmise that
so collect all network names for all load balancers into a lookup, e.g.
| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
172.168.1.2,LoadBalancer-B,Loadbalancer-to-Server
192.168.1.6,server-A,Loadbalancer-to-Server
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| search network="Firewall-to-Loadbalancer" OR name="LoadBalancer-*"
| stats values(network) as network by name
| eval behindfirewall = if(match(network,"Firewall-to-Loadbalancer"),"1","0")
| outputlookup output_format=splunk_mv_csv firewall.csv
Then do
| lookup firewall.csv network OUTPUT behindfirewall
Not sure if that will do what you want, but maybe it gives you some ideas - I don't know your data well enough to know what's what.
Hi @bowesmana
I ran your example. When the lookup used in the search, "behindfirewall" field contains both 1 and 0.
So, I can use if condition: if behindfirewall contains 1, then the hostname is behind the firewall, correct?
Thanks for your help
firewall.csv
after the lookup
Yes, you can just say
| eval behindfirewall=max(behindfirewall)
however, I am not sure if that will totally work, because if something in my example is attached to LoadBalancer-B, then it will assume it's behind the firewall, so not totally sure if my suggestion is valid