Getting Data In

Why are my macros and search not properly filtering out data collected on weekend days from my indexes?

brutecat
Path Finder

I have a problem with a requirement to remove data collected on weekend days from my indexes. I can do this accurately, but when I run a calculation over the resultant set, it seems to continue to include data from these days as 'zeros'. Here is the search string:

index=edistats  RealName="ShipmentOpen"  | `isWeekDay(_time)` | where weekDay < 6 AND weekDay > 0  | timechart span=1d avg(Elapsed) as average | `lineartrend(_time,average)` | timechart max(average)  as mean sum(newY) as regression

The included macros are:

isWeekDay:

eval weekDay = strftime($time$,"%w")

and lineartrend:

eventstats count as numevents sum($x$) as sumX sum($y$) as sumY sum(eval($x$*$y$)) as sumXY sum(eval($x$*$x$)) as sumX2 sum(eval($y$*$y$)) as sumY2 | eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX)) | eval yintercept= ((sumY*sumX2)-(sumX*sumXY))/((numevents*sumX2)-(sumX*sumX))| eval newY=(yintercept + (slope*$x$)) | eval R=((numevents*sumXY) - (sumX*sumY))/sqrt(((numevents*sumX2)-(sumX*sumX))* ((numevents*sumY2)-(sumY*sumY))) | eval R2=R*R

The first macro does the job and comes up with the days I want to drop, however, if I do this, the fitted linear regression is lower than what it should be. This is clear from the visual display as the raw trend is mostly flat.

Do I need to do something in the lineartrend macro to ensure it avoids these dropped days?

Thanks,

Stan

Tags (3)
0 Karma
1 Solution

woodcock
Esteemed Legend

timechart always adds 0-values, but stats does not so do it like this:

index=edistats RealName="ShipmentOpen" | `isWeekDay(_time)` | where weekDay < 6 AND weekDay > 0 | bucket _time span=1d | stats avg(Elapsed) as average by _time | `lineartrend(_time,average)` | stats max(average) as mean sum(newY) as regression BY _time

View solution in original post

0 Karma

woodcock
Esteemed Legend

timechart always adds 0-values, but stats does not so do it like this:

index=edistats RealName="ShipmentOpen" | `isWeekDay(_time)` | where weekDay < 6 AND weekDay > 0 | bucket _time span=1d | stats avg(Elapsed) as average by _time | `lineartrend(_time,average)` | stats max(average) as mean sum(newY) as regression BY _time
0 Karma

brutecat
Path Finder

MuS,

Thanks for that. Perfect!

@woodcock, it's hard to explain without a visual. Here is a PDF of the plots:

https://app.box.com/s/ub1axvu0yz3t6z2wvxl51h6g8zew5lwz

The top one does not have the value:

            <option name="charting.chart.nullValueMode">connect</option>

Once this is added, the outcome is like the bottom one.

Thanks for all your help,

Stan

0 Karma

brutecat
Path Finder

woodcock,

Thanks so much. This computes correctly now. My search string is now:

index=edistats RealName="ShipmentOpen" | isWeekDay(_time) | where weekDay < 6 AND weekDay > 0 | bucket _time span=1d | stats avg(Elapsed) as average by _time | lineartrend(_time,average) | timechart max(average) as mean sum(newY) as regression

I want to see the linear fit superimposed on the raw data. The raw plot is done as columns. However, now that the blank days are left out, the linear trend is not drawn contiguously, but rather as a set of segments. Is there a charting option where they can be linked as a single line?

Thanks

0 Karma

MuS
Legend

woodcock
Esteemed Legend

I do not understand your question but if you need more help, work on clarifying it and post a new question.

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...