Splunk Search

how to improve eventstats efficiency

LearningGuy
Builder

Hello,

I need help improve efficiency of my search using eventstats.
The search worked just fine, but when I applied to large set of data, it took too long. 
Please suggest.  Thank you

 IP 192.168.1.7 of server-A is connected to "LoadBalancer-to-Server" network, LoadBalancer-A is connected to "LoadBalancer-to-Server" network and "Firewall-to-Loadbalancer" network.
So, server-A is behind a firewall.  (behindfirewall = "yes")

ipnamenetworkbehindfirewall
192.168.1.1LoadBalancer-ALoadbalancer-to-Serveryes
172.168.1.1LoadBalancer-AFirewall-to-Loadbalanceryes
192.168.1.7server-ALoadbalancer-to-Serveryes
192.168.1.8server-BLoadbalancer-to-Serveryes
192.168.1.9server-Cnetwork-1no
192.168.1.9server-Dnetwork-2no

 

| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| eventstats values(name) as servergroup by network
| eventstats values(network) as networkgroup by name
| eventstats values(networkgroup) as networkpath by servergroup
| eval behindfirewall = if(match(networkpath,"Firewall-to-Loadbalancer"),"yes","no")
| table ip, name,  network, servergroup, networkgroup *

 

Labels (2)
Tags (1)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

So, my question about what you have in your real search before eventstats is significant because ALL the data you have in the search up to eventstats will travel to the search head. Using the fields statement will remove fields you don't want from the data sent to the SH.

If you have a table statement before the eventstats, then that is also a transforming command so will cause the data to go to the SH - for efficiency you want to keep as much of the search on the indexers and only go to the SH with the minimum amount of data you actually need.

Can you post the full search?

Your 3rd eventstats is splitting by servergroup, which is now a multivalue field, which 

As for creating the lookup, from your examples, I surmise that

  • if "name" is titled "LoadBalancer-XXX" then it is a load balancer 

so collect all network names for all load balancers into a lookup, e.g.

| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
172.168.1.2,LoadBalancer-B,Loadbalancer-to-Server
192.168.1.6,server-A,Loadbalancer-to-Server
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| search network="Firewall-to-Loadbalancer" OR name="LoadBalancer-*"
| stats values(network) as network by name
| eval behindfirewall = if(match(network,"Firewall-to-Loadbalancer"),"1","0")
| outputlookup output_format=splunk_mv_csv firewall.csv

Then do

| lookup firewall.csv network OUTPUT behindfirewall

Not sure if that will do what you want, but maybe it gives you some ideas - I don't know your data well enough to know what's what.

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

What do you have in your real search before you do the eventstats as it will push all the data to the search head, including _raw, so unless you use the fields statement you will be sending all the event data to the SH.

You are also doing lots of multivalue splits, which is going to be pretty memory hungry on the SH.

Building a tree is a tricky thing in Splunk, but if your network paths are not often changing, it may be possible to create a lookup that for 'Server-A' you can lookup its network and discover the behind firewall state.

What is the depth of the tree in your case, your example is 3 tier, going from server via the LB - if it's only 3 tier, then you could perhaps build your pathways just be fetching the name="LoadBalancer" objects and using stats values() rather than eventstats to create the lookup - as at that point you don't care about the IPs.

LearningGuy
Builder

Hi @bowesmana 

What do you have in your real search before you do the eventstats as it will push all the data to the search head, including _raw, so unless you use the fields statement you will be sending all the event data to the SH.

>> Can you re-phrase your statement?  How do I improve efficiency using fields statement?
My search using real data is using  table statement without "*", but it does have a lot of fields.


You are also doing lots of multivalue splits, which is going to be pretty memory hungry on the SH.
Which part of my search is using multivalue splits?


What is the depth of the tree in your case, your example is 3 tier, going from server via the LB - if it's only 3 tier, then you could perhaps build your pathways just be fetching the name="LoadBalancer" objects and using stats values() rather than eventstats to create the lookup - as at that point you don't care about the IPs.

The depth is always 3-tier:   Server->LB->network.
Can you give an example using stats values to create a lookup?   
 I care about the IP since the one server can have multiple IPs on its interface.
For example: Server-A can have 192.162.1.7 (int1) and 192.162.1.6 (int2)

I appreciate your assistance. Thank you so much

0 Karma

bowesmana
SplunkTrust
SplunkTrust

So, my question about what you have in your real search before eventstats is significant because ALL the data you have in the search up to eventstats will travel to the search head. Using the fields statement will remove fields you don't want from the data sent to the SH.

If you have a table statement before the eventstats, then that is also a transforming command so will cause the data to go to the SH - for efficiency you want to keep as much of the search on the indexers and only go to the SH with the minimum amount of data you actually need.

Can you post the full search?

Your 3rd eventstats is splitting by servergroup, which is now a multivalue field, which 

As for creating the lookup, from your examples, I surmise that

  • if "name" is titled "LoadBalancer-XXX" then it is a load balancer 

so collect all network names for all load balancers into a lookup, e.g.

| makeresults format=csv data="ip,name,network,
192.168.1.1,LoadBalancer-A,Loadbalancer-to-Server
172.168.1.1,LoadBalancer-A,Firewall-to-Loadbalancer
172.168.1.2,LoadBalancer-B,Loadbalancer-to-Server
192.168.1.6,server-A,Loadbalancer-to-Server
192.168.1.7,server-A,Loadbalancer-to-Server
192.168.1.8,server-B,Loadbalancer-to-Server
192.168.1.9,server-C,network-1
192.168.1.9,server-D,network-2"
| search network="Firewall-to-Loadbalancer" OR name="LoadBalancer-*"
| stats values(network) as network by name
| eval behindfirewall = if(match(network,"Firewall-to-Loadbalancer"),"1","0")
| outputlookup output_format=splunk_mv_csv firewall.csv

Then do

| lookup firewall.csv network OUTPUT behindfirewall

Not sure if that will do what you want, but maybe it gives you some ideas - I don't know your data well enough to know what's what.

LearningGuy
Builder

Hi @bowesmana 

I ran your example. When the lookup used in the search, "behindfirewall" field contains both 1 and 0.
So, I can use if condition:   if behindfirewall contains 1, then the hostname is behind the firewall, correct?

Thanks for your help

firewall.csv

LearningGuy_1-1718366967880.png

after the lookup

LearningGuy_0-1718366915227.png

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Yes, you can just say 

| eval behindfirewall=max(behindfirewall)

however, I am not sure if that will totally work, because if something in my example is attached to LoadBalancer-B, then it will assume it's behind the firewall, so not totally sure if my suggestion is valid

Get Updates on the Splunk Community!

Get More Out of Your Security Practice With a SIEM

Get More Out of Your Security Practice With a SIEMWednesday, July 31, 2024  |  11AM PT / 2PM ETREGISTER ...

New This Month - SLO Capabilities, APM Advanced Filtering & Usage Analytics Plus ...

More for SLO Management We’re continuing to expand the built-in SLO management experience in Splunk ...

Enterprise Security Content Update (ESCU) | New Releases

In June, the Splunk Threat Research Team had 2 releases of new security content via the Enterprise Security ...