Splunk Search

Optimisation: Search counting DNS connections over 30d and 24hrs. Alert when the 24hr count is 3× or more the daily "average".

Communicator

Hi All,

TL;DR: I could use some assistance with search string optimization, or help re-writing the search string to be as efficient as possible.

I hope that someone will be able to assist with an issue that I’m having.

I am attempting to write a search string that will count all outbound DNS connections over the past 30 days. After this count has been performed, I would like to divide it by 30 to obtain an “average” of daily connections.

I then wish to run a second search, which will count all outbound DNS connections for the past 24 hours, compare that count to the value of "averagednsconnections_24hr", and alert when the total DNS connections for the past 24hrs are equal to, or greater than, 3 times the daily "average".
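In plain terms, the alert condition I'm after is the following (a minimal sketch with made-up numbers, just to show the comparison):

```python
# Hypothetical counts for a single source IP (illustrative values only).
total_30d = 9000             # outbound DNS connections over the past 30 days
avg_daily = total_30d / 30   # the daily "average"
total_24h = 1000             # outbound DNS connections over the past 24 hours

# Alert when the 24hr total is equal to, or greater than, 3x the daily average.
should_alert = total_24h >= 3 * avg_daily
print(avg_daily, should_alert)
```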

I have included search logic below.

The issue that I'm having is that my search is being automatically cut off midway through. I assume this could be because resource usage is reaching a threshold, but I don't have access to check this, nor to change the limit (if that is what is causing the issue).

Please can someone help me optimize this search string and/or propose a better way of achieving the same results?

index=network sourcetype=network_log earliest=-30d@d latest=-1h@h (dstport=53 OR srcport=53) service="DNS" 
| stats count(dstip) AS Total_DNS_Requests_30d by srcip 
| eval Average_Daily_Requests = round(Total_DNS_Requests_30d/30,0) 
| appendcols 
    [ search index=network sourcetype=network_log earliest=-25h@h latest=-1h@h (dstport=53 OR srcport=53) service="DNS" 
    | stats count(dstip) AS Total_DNS_Requests_24hr BY srcip 
    | fields Total_DNS_Requests_24hr] 
| rename srcip AS Source_Address 
| table Source_Address Total_DNS_Requests_30d Average_Daily_Requests Total_DNS_Requests_24hr 
| where Total_DNS_Requests_24hr >= (3*Average_Daily_Requests) 
| sort -Total_DNS_Requests_24hr 

SplunkTrust

You would be better off summarizing at the day level, and then calculating the average and standard deviation based on the daily volumes. That way, highly volatile IPs get more slack than relatively consistent ones. Typically you would look for outliers 2-3 stdevs above or below the average.
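The effect of using stdev rather than a fixed multiple of the average can be seen outside Splunk with a small Python sketch (the daily counts are made up; `statistics.stdev` is the sample standard deviation):

```python
import statistics

# Hypothetical daily DNS request counts for two source IPs over a week.
steady = [100, 102, 98, 101, 99, 100, 100]    # consistent host
volatile = [50, 200, 80, 300, 120, 40, 260]   # bursty host

def alert_threshold(day_counts, k=3):
    """Outlier threshold: mean plus k standard deviations of the daily counts."""
    return statistics.mean(day_counts) + k * statistics.stdev(day_counts)

# The steady host gets a threshold barely above its mean, while the
# volatile host gets far more headroom before it triggers an alert.
print(round(alert_threshold(steady)))
print(round(alert_threshold(volatile)))
```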

You could calculate the 30-day daily figures in a separate daily search and put them into a csv file, lookup or summary index, so that you are not constantly recalculating the data.

The first time you run this it would have to go back 30 days, possibly in 1-week chunks if that makes it run better.

After that just 24 or 48 hours.

    index=network sourcetype=network_log earliest=-30d@d latest=@d (dstport=53 OR srcport=53) service="DNS" 
    | fields srcip dstip
    | bin _time span=1d
    | stats count(dstip) AS dayCount by srcip _time
    | inputcsv append=t mydays.csv  
    | dedup _time srcip 
    | outputcsv mydays.csv  

Then you would calculate the avg and stdev like this when you need it

    | inputcsv append=t mydays.csv
    | where _time >= relative_time(now(),"-30d@d")
    | stats avg(dayCount) as avgCount, stdev(dayCount) as stdevCount by srcip

And so the whole hourly run would look like this...

 index=network sourcetype=network_log earliest=-25h@h latest=-1h@h (dstport=53 OR srcport=53) service="DNS"
| fields srcip dstip
| stats count(dstip) AS todayCount by srcip
| join srcip [ | inputcsv append=t mydays.csv
    | where _time >= relative_time(now(),"-30d@d")
    | stats avg(dayCount) as avgCount, stdev(dayCount) as stdevCount by srcip
    ]
| where todayCount >= avgCount + 3*stdevCount
| sort 0 - todayCount
| rename srcip AS Source_Address, todayCount as Total_DNS_Requests_24hr,
    avgCount as Average_Daily_Requests, stdevCount as StdDev_Daily_Requests


SplunkTrust

@MikeElliott, you could consider summarizing your network log events daily, so that your base query for the previous 30 days returns results quickly and has far fewer events to process.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"