Solved: how to improve eventstats efficiency

LearningGuy · ‎06-11-2024

Hello,

I need help improve efficiency of my search using eventstats.
The search worked just fine, but when I applied to large set of data, it took too long.
Please suggest. Thank you

IP 192.168.1.7 of server-A is connected to "LoadBalancer-to-Server" network, LoadBalancer-A is connected to "LoadBalancer-to-Server" network and "Firewall-to-Loadbalancer" network.
So, server-A is behind a firewall. (behindfirewall = "yes")

ip	name	network	behindfirewall
192.168.1.1	LoadBalancer-A	Loadbalancer-to-Server	yes
172.168.1.1	LoadBalancer-A	Firewall-to-Loadbalancer	yes
192.168.1.7	server-A	Loadbalancer-to-Server	yes
192.168.1.8	server-B	Loadbalancer-to-Server	yes
192.168.1.9	server-C	network-1	no
192.168.1.9	server-D	network-2	no

| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| eventstats values(name) as servergroup by network
| eventstats values(network) as networkgroup by name
| eventstats values(networkgroup) as networkpath by servergroup
| eval behindfirewall = if(match(networkpath,"Firewall-to-Loadbalancer"),"yes","no")
| table ip, name,  network, servergroup, networkgroup *

bowesmana · ‎06-13-2024

So, my question about what you have in your real search before eventstats is significant because ALL the data you have in the search up to eventstats will travel to the search head. Using the fields statement will remove fields you don't want from the data sent to the SH.

If you have a table statement before the eventstats, then that is also a transforming command so will cause the data to go to the SH - for efficiency you want to keep as much of the search on the indexers and only go to the SH with the minimum amount of data you actually need.

Can you post the full search?

Your 3rd eventstats is splitting by servergroup, which is now a multivalue field, which

As for creating the lookup, from your examples, I surmise that

if "name" is titled "LoadBalancer-XXX" then it is a load balancer

so collect all network names for all load balancers into a lookup, e.g.

| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
172.168.1.2,LoadBalancer-B,Loadbalancer-to-Server
192.168.1.6,server-A,Loadbalancer-to-Server
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| search network="Firewall-to-Loadbalancer" OR name="LoadBalancer-*"
| stats values(network) as network by name
| eval behindfirewall = if(match(network,"Firewall-to-Loadbalancer"),"1","0")
| outputlookup output_format=splunk_mv_csv firewall.csv

Then do

| lookup firewall.csv network OUTPUT behindfirewall

Not sure if that will do what you want, but maybe it gives you some ideas - I don't know your data well enough to know what's what.

View solution in original post

bowesmana · ‎06-11-2024

What do you have in your real search before you do the eventstats as it will push all the data to the search head, including _raw, so unless you use the fields statement you will be sending all the event data to the SH.

You are also doing lots of multivalue splits, which is going to be pretty memory hungry on the SH.

Building a tree is a tricky thing in Splunk, but if your network paths are not often changing, it may be possible to create a lookup that for 'Server-A' you can lookup its network and discover the behind firewall state.

What is the depth of the tree in your case, your example is 3 tier, going from server via the LB - if it's only 3 tier, then you could perhaps build your pathways just be fetching the name="LoadBalancer" objects and using stats values() rather than eventstats to create the lookup - as at that point you don't care about the IPs.

LearningGuy · ‎06-13-2024

Hi @bowesmana

What do you have in your real search before you do the eventstats as it will push all the data to the search head, including _raw, so unless you use the fields statement you will be sending all the event data to the SH.

>> Can you re-phrase your statement? How do I improve efficiency using fields statement?
My search using real data is using table statement without "*", but it does have a lot of fields.

You are also doing lots of multivalue splits, which is going to be pretty memory hungry on the SH.
Which part of my search is using multivalue splits?

What is the depth of the tree in your case, your example is 3 tier, going from server via the LB - if it's only 3 tier, then you could perhaps build your pathways just be fetching the name="LoadBalancer" objects and using stats values() rather than eventstats to create the lookup - as at that point you don't care about the IPs.

The depth is always 3-tier: Server->LB->network.
Can you give an example using stats values to create a lookup?
I care about the IP since the one server can have multiple IPs on its interface.
For example: Server-A can have 192.162.1.7 (int1) and 192.162.1.6 (int2)

I appreciate your assistance. Thank you so much

bowesmana · ‎06-13-2024

So, my question about what you have in your real search before eventstats is significant because ALL the data you have in the search up to eventstats will travel to the search head. Using the fields statement will remove fields you don't want from the data sent to the SH.

If you have a table statement before the eventstats, then that is also a transforming command so will cause the data to go to the SH - for efficiency you want to keep as much of the search on the indexers and only go to the SH with the minimum amount of data you actually need.

Can you post the full search?

Your 3rd eventstats is splitting by servergroup, which is now a multivalue field, which

As for creating the lookup, from your examples, I surmise that

if "name" is titled "LoadBalancer-XXX" then it is a load balancer

so collect all network names for all load balancers into a lookup, e.g.

| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
172.168.1.2,LoadBalancer-B,Loadbalancer-to-Server
192.168.1.6,server-A,Loadbalancer-to-Server
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| search network="Firewall-to-Loadbalancer" OR name="LoadBalancer-*"
| stats values(network) as network by name
| eval behindfirewall = if(match(network,"Firewall-to-Loadbalancer"),"1","0")
| outputlookup output_format=splunk_mv_csv firewall.csv

Then do

| lookup firewall.csv network OUTPUT behindfirewall

Not sure if that will do what you want, but maybe it gives you some ideas - I don't know your data well enough to know what's what.

LearningGuy · ‎06-14-2024

Hi @bowesmana

I ran your example. When the lookup used in the search, "behindfirewall" field contains both 1 and 0.
So, I can use if condition: if behindfirewall contains 1, then the hostname is behind the firewall, correct?

Thanks for your help

firewall.csv

after the lookup

bowesmana · ‎06-16-2024

Yes, you can just say

| eval behindfirewall=max(behindfirewall)

however, I am not sure if that will totally work, because if something in my example is attached to LoadBalancer-B, then it will assume it's behind the firewall, so not totally sure if my suggestion is valid

how to improve eventstats efficiency

eval

stats

Splunk at Cisco Live 2025: Learning, Innovation, and a Little Bit of Mr. Brightside

Splunk App Dev Community Updates – What’s New and What’s Next

The Latest Cisco Integrations With Splunk Platform!

Are you a member of the Splunk Community?