Splunk Search

Count events per month until a certain day

Path Finder

Hi community,

I need your help!!!

Is it possible to build a report that counts events grouped by month, but only up to a certain day? That is, if today is the 9th, then count only the events through the 9th of each month.

Example:

_time - count
09/09/2017 - 4567
08/09/2017 - 2346
07/09/2017 - 5678
06/09/2017 - 4789
05/09/2017 - 8965
04/09/2017 - 4567
03/09/2017 - 6789
02/09/2017 - 3456
01/09/2017 - 9087
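To make the requested logic concrete, here is a minimal plain-Python sketch (with hypothetical dates, not the poster's data): keep only events whose day-of-month is at or below today's day, then count the survivors per month.

```python
# Hypothetical sketch of the requested report: drop events that fall
# after the cutoff day-of-month, then count per (year, month).
from collections import Counter
from datetime import date

def monthly_counts_until_day(event_dates, cutoff_day):
    """Count events per (year, month), ignoring days after cutoff_day."""
    counts = Counter()
    for d in event_dates:
        if d.day <= cutoff_day:
            counts[(d.year, d.month)] += 1
    return counts

events = [date(2017, 9, 5), date(2017, 9, 12),
          date(2017, 8, 3), date(2017, 8, 9)]
# With a cutoff of 9 ("today is the 9th"), 2017-09-12 is excluded.
print(monthly_counts_until_day(events, 9))
```

The same filter is what the SPL answers below express with `strftime(_time, "%d")` and a `where` clause.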

Thanks

1 Solution

Esteemed Legend

Funky. Try this:

| makeresults
| eval date_mday=strftime(now(), "%d")
| map [|tstats count where index=* BY _time span=1d
| eval date_mday=strftime(_time, "%d")
| where tonumber(date_mday) <= tonumber("$date_mday$")
| timechart span=1mon sum(count) AS count
| fillnull value=0 count
| eval _time = _time + 24 * ($date_mday$ - 1) * 60 * 60]
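A note on the final `eval`: `timechart span=1mon` stamps each bucket at the first of the month, and adding `24 * (date_mday - 1) * 60 * 60` seconds moves that label forward to the cutoff day. A plain-Python sketch of that arithmetic (dates are hypothetical):

```python
# Sketch of the label shift: timechart labels each monthly bucket with
# the first of the month; adding (day - 1) days of seconds moves the
# label to the cutoff day, e.g. Sep 1 -> Sep 9 when today is the 9th.
from datetime import datetime, timedelta

def shift_bucket_label(month_start, date_mday):
    # Same arithmetic as: _time + 24 * (date_mday - 1) * 60 * 60
    return month_start + timedelta(seconds=24 * (date_mday - 1) * 60 * 60)

print(shift_bucket_label(datetime(2017, 9, 1), 9))  # 2017-09-09 00:00:00
```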


SplunkTrust

Try this...

your query that gets the events you care about
| rename COMMENT as "count up the events at the day level"
| bin _time span=1d
| stats count as daycount by _time

| rename COMMENT as "Figure out the offset for which day this is"
| eventstats max(_time) as maxtime
| eval maxmonth=relative_time(maxtime,"@mon")
| eval nowoffset=maxtime-maxmonth

| rename COMMENT as "If date is 28 or greater, use flat month"
| eval revisedoffset=if(nowoffset>=28*86400,0,nowoffset)
| rename COMMENT as "subtract the offset and calculate the month for reporting purposes"
| eval reportingmonth=relative_time(_time-revisedoffset,"@mon")

| rename COMMENT as "sum up all the days, use the date of the last day as the date of the report"
| stats sum(daycount) as count max(_time) as reptdate by reportingmonth

| rename COMMENT as "present the report"
| eval _time = reptdate
| table _time count
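The offset trick above can be sketched in plain Python (hypothetical dates, and omitting the flat-month special case for day 28+): `relative_time(maxtime, "@mon")` snaps to the start of the month, so `maxtime - maxmonth` is the number of seconds elapsed in the current month; subtracting that offset from each event before snapping groups "day 10 of one month through day 9 of the next" into a single reporting month.

```python
# Sketch of the offset logic: subtract "seconds elapsed in the current
# month" from each event, then snap to month start. With "now" = Sep 9,
# events on Sep 1-9 land in the August-anchored period, and Sep 10+
# land in the September-anchored period.
from datetime import datetime

def snap_to_month(t):
    # Rough equivalent of relative_time(t, "@mon")
    return t.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def reporting_month(event_time, maxtime):
    nowoffset = maxtime - snap_to_month(maxtime)
    return snap_to_month(event_time - nowoffset)

latest = datetime(2017, 9, 9, 12, 0)                    # "now" is Sep 9
print(reporting_month(datetime(2017, 9, 5), latest))    # August bucket
print(reporting_month(datetime(2017, 9, 12), latest))   # September bucket
```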

This version will keep, for example, 4/30 in the same reporting period as 5/29:

 your query that gets the events you care about
 | rename COMMENT as "count up the events at the day level"
 | bin _time span=1d
 | stats count as daycount by _time

 | rename COMMENT as "Figure out the offset for which day this is, copy it to all records and delete the calc record"
 | appendpipe  
    [| stats max(_time) as maxtime 
     | eval maxmonth=relative_time(maxtime,"@mon") 
     | eval nowoffset=maxtime-maxmonth 
     ]
 | eventstats max(nowoffset) as nowoffset
 | where isnull(maxmonth)

 | rename COMMENT as "For this event, figure out the endpoints "
 | eval calcmonth0=relative_time(_time,"@mon")
 | eval calcmonth1=relative_time(_time,"+1mon@mon")
 | eval calcmonth2=relative_time(_time,"+2mon@mon")
 | eval endmonth0=if(calcmonth0+nowoffset>=calcmonth1,relative_time(calcmonth1,"-1d"),calcmonth0+nowoffset)
 | eval endmonth1=if(calcmonth1+nowoffset>=calcmonth2,relative_time(calcmonth2,"-1d"),calcmonth1+nowoffset)
 | eval reportingmonth=if(_time>endmonth0,endmonth1,endmonth0)


 | rename COMMENT as "sum up all the days, use the date of the last day as the date of the report"
 | stats sum(daycount) as count max(_time) as reptdate min(_time) as reptstartdate by reportingmonth

 | rename COMMENT as "present the report"
 | eval _time = reptdate
 | eval reptrange=strftime(reptstartdate,"%Y-%m-%d")." to ".strftime(reptdate,"%Y-%m-%d") 
 | table _time reptrange count
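The `endmonth` clamping in this version can be sketched in plain Python (illustrative dates, not from the thread): the period endpoint is "start of month + nowoffset", clamped to the last day of that month so that a run on, say, the 30th does not spill a 28-day February into March.

```python
# Sketch of: endmonthN = if(calcmonthN + nowoffset >= calcmonthN+1,
#                           relative_time(calcmonthN+1, "-1d"),
#                           calcmonthN + nowoffset)
from datetime import datetime, timedelta

def month_end_point(month_start, next_month_start, nowoffset):
    candidate = month_start + nowoffset
    if candidate >= next_month_start:
        # Overshot the month: clamp to its last day.
        return next_month_start - timedelta(days=1)
    return candidate

feb = datetime(2018, 2, 1)
mar = datetime(2018, 3, 1)
# A 29-day offset overshoots 28-day February, so it is clamped.
print(month_end_point(feb, mar, timedelta(days=29)))  # 2018-02-28 00:00:00
```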

Path Finder

hi @DalJeanis

Thanks a lot for your help; your solution seems to work. However, I have a problem with my data: it contains duplicates, and I need to clean up more than 10 million events first, so the search takes a long time.


SplunkTrust

@lufermalgo - You are welcome. Under those circumstances, if this is not going to be just a one-time report, I would suggest that you consider creating a summary index to hold your daily figures.

This report will run quickly against the summary data, just like a tstats would.

If the duplicates are always created the same day, then your daily summary creation program can weed them out. If they are occasionally created later, then you can (for instance) resummarize your daily records a week later, and always pick the latest summary record for any given date. Again, this current report would run quickly against the summary index.


Esteemed Legend

needs more tstats...


SplunkTrust

@woodcock - Probably true. Everything down to the first stats command can be replaced by the equivalent tstats command if the data is there.



Path Finder

Hi @woodcock

Thank you very much for your help.
As my data is duplicated you should do a cleanup before, it generates some questions about how to do it then use tstat or create a data model, use tstat and then a dedup ...

| makeresults 
| eval date_mday=strftime(now(), "%d") 
| map 
    [| tstats summariesonly=t allow_old_summaries=t count from datamodel="eventsPDN" where (nodename="allEventsPDN") by "allEventsPDN.CDCLAVE" "allEventsPDN.VALOR" _time span=1d 
    | stats count by _time "allEventsPDN.CDCLAVE" "allEventsPDN.VALOR" 
    | eval date_mday=strftime(_time, "%d") 
    | where tonumber(date_mday) <= tonumber("$date_mday$") 
    | timechart span=1mon sum(count) AS count 
    | fillnull value=0 count 
    | eval _time = _time + 24 * ($date_mday$ - 1) * 60 * 60]

It took 500 seconds to return the result ...
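For what it's worth, the dedup step in that query (the `stats count by _time "allEventsPDN.CDCLAVE" "allEventsPDN.VALOR"`) can be sketched in plain Python with hypothetical rows: grouping by day plus the two key fields collapses duplicates to one row per combination, so counting the surviving rows per day yields a distinct-combination count.

```python
# Sketch of the dedup: one row per (day, CDCLAVE, VALOR) combination,
# then count distinct combinations per day. Field names are taken from
# the query above; the rows are made up.
from collections import Counter

def distinct_per_day(events):
    """events: iterable of (day, cdclave, valor) tuples, possibly duplicated."""
    distinct = set(events)                # one row per combination
    return Counter(day for day, _, _ in distinct)

rows = [("2017-09-05", "K1", "V1"),
        ("2017-09-05", "K1", "V1"),       # duplicate, dropped
        ("2017-09-05", "K2", "V1")]
print(distinct_per_day(rows))             # Counter({'2017-09-05': 2})
```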


Esteemed Legend

I don't understand your comment at all. Does it work or not? Why are you doing a tstats followed by a stats? And why do you think I might speak Spanish (hablo poquito, pero....)?


Path Finder

I created a data model called "eventsPDN"; however, it contains duplicate data. I have to keep only one event per CDCLAVE/VALOR combination before counting ...
