Solved: Where field count is less than, but not stats count by

LatchJohnson · 03-19-2024

I run a Splunk query to see events from my web application firewall. I filter out certain violations by name, using a NOT with parentheses to list the violations I don't care to see.

My network is subject to attack, and my query, which I use to look for legitimate users being blocked, gets inundated by various IPs each generating hundreds of events. How can I table fields so that I see the data I want per event, but also filter out events for a field value if that value's event count is greater than some threshold?

A simple example: one IP from a facility was blocked once in the last 15 minutes. Another IP was seen 400 times as part of a scan. I want to see the 1 (or even 10) events from the first source IP, but not the 400 from the other.

I know I can exclude a whole IP, or part of one with a wildcard, but that gets messy and can lead to too many IPs in a NOT statement.

The current table portion of my query:

| table _time, event_id, hostname, violation, policy, uri, ip_client | sort - _time

Adding a stats count by ip_client only shows the count and the IP, losing the other data. And since the event IDs are always different, a count that includes them will never be higher than 1.

It would be nice if I could do something like "| where count ip_client<=10" to remove any source IPs that show up more than 10 times in the results.

PickleRick · 03-19-2024

You probably want to use the eventstats command.

Here is an example from my home lab. Let's search for events from my private web server:

index=httpd earliest=-1d

Now add to each event a count of _all_ events for that particular client:

| eventstats count by client

Now we only want to see those events where the number of requests for that client was greater than 5 (meaning the client requested a file from my web server 6 or more times):

| where count>5
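Applied to the original question, the same pattern can be flipped to keep only the quiet clients. This is a minimal sketch, not a tested search: your_waf_index is a hypothetical placeholder for the asker's own base search, ip_event_count is just an illustrative field name, and the threshold of 10 and the 15-minute window come from the question.

```spl
index=your_waf_index earliest=-15m
| eventstats count as ip_event_count by ip_client
| where ip_event_count<=10
| table _time, event_id, hostname, violation, policy, uri, ip_client, ip_event_count
| sort - _time
```

Each remaining event keeps all of its own fields, and any IP that produced more than 10 events in the window drops out entirely.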


LatchJohnson · 03-20-2024

Thank you both. Eventstats worked perfectly and removed the process of adding IPs to a NOT list.

PickleRick · 03-20-2024

It's worth noting, however, as @bowesmana pointed out, that eventstats is a relatively "heavy" command: it needs to generate the whole result set and gather it on the search head in order to compute the statistics, which it then adds to the results.

With a small data set you can get away with just calling eventstats and processing the results further. If your initial result set is big, you might indeed want to limit the set of processed fields (including removing _raw if it's no longer needed).

LatchJohnson · 03-20-2024

Yes, I noticed that as well. I can see the event count from before the eventstats and 'where count' steps remove the events that are over my limit. I'm searching back 15 minutes and only have a few hundred events, based on my geolocation and other criteria, before the eventstats. But a few hundred is too many for a single person to weed through when looking for legitimate user activity among a few hundred illegitimate events. Thanks for the information.

bowesmana · 03-19-2024

eventstats is a way to get stats without losing fields you want to retain, but it is not an efficient command.

If you do use eventstats, make sure you use the fields command before it. All the data is transferred to the search head before the stats are calculated, so trimming the field list first reduces the data transferred from the indexers.
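As a rough sketch of that advice using the field names from the question (your_waf_index is a hypothetical placeholder and ip_event_count is an illustrative name), the field list is trimmed before eventstats runs. The fields command keeps _time and _raw by default, so _raw is dropped explicitly:

```spl
index=your_waf_index earliest=-15m
| fields _time, event_id, hostname, violation, policy, uri, ip_client
| fields - _raw
| eventstats count as ip_event_count by ip_client
| where ip_event_count<=10
```

Only drop _raw if nothing later in the search needs it, for example a rex extraction against the raw event.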

A more efficient way to get stats without losing fields is:

| fields a b c etc... client_ip
| stats count values(*) as * by client_ip
| where count<10

This will do the aggregations but will retain all the values of the other fields in the returned row for the client ip.

This may not be how you want to see the data, but from a performance point of view, if you have large datasets, then eventstats can be very slow, whereas stats will be fast.
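With the field names from the question, an explicit version of this might look like the sketch below. It names each aggregation instead of relying on the wildcard, and latest(_time) as last_seen is an added illustration rather than part of the suggestion above; your_waf_index is a hypothetical placeholder. The result is one row per IP with multivalue fields, not one row per event.

```spl
index=your_waf_index earliest=-15m
| fields _time, hostname, violation, policy, uri, ip_client
| stats count, latest(_time) as last_seen, values(hostname) as hostname, values(violation) as violation, values(policy) as policy, values(uri) as uri by ip_client
| where count<=10
| sort - last_seen
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
```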

You can refine this further by doing something like

| fields a b c d e f client_ip
| stats count values(*) as * by client_ip a b c
| eventstats sum(count) as total by client_ip
| where total<10

Here the stats split collects some other fields as well as the IP, and you can then use eventstats on the much smaller dataset to calculate the total count per IP. This will generally be faster than running eventstats at the start.
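For the fields in the question, this two-stage approach could be sketched as follows. Which fields go in the by clause is a judgment call; violation, policy and uri are picked here only for illustration, and your_waf_index is a hypothetical placeholder.

```spl
index=your_waf_index earliest=-15m
| fields hostname, violation, policy, uri, ip_client
| stats count values(hostname) as hostname by ip_client violation policy uri
| eventstats sum(count) as total by ip_client
| where total<=10
```

One caveat with this shape: stats drops events that are missing any of the by fields, so it suits fields that are always populated.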

Hope this helps

