Solved: How do I remove data read anomalies?

talbot7 · ‎07-03-2012

I'm reading an outdoor temperature sensor (DS18B20). Every so often I get a bad data point.

Jul  2 23:26:40 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.25 temp_f=54.05
Jul  2 23:21:33 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.13 temp_f=53.82
Jul  2 23:11:20 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.19 temp_f=53.94
Jul  2 23:06:15 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.63 temp_f=54.72
Jul  2 23:01:06 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.06 temp_f=53.71
Jul  2 22:50:56 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=34.06 temp_f=93.31
Jul  2 22:45:46 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.63 temp_f=54.72
Jul  2 22:35:33 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.75 temp_f=54.95

I am 100% sure the outside temperature did not jump from 54°F to 93°F in a 5-minute span. Any ideas on how to filter out these anomalies?

Ayn · ‎07-03-2012

You could compute an average or, better yet, a median of the temperature readings using eventstats, then compare each reading to that median and throw away any that deviate too much from it. For instance, to filter out events that deviate 20 degrees F or more from the median:

... | eventstats median(temp_f) as temp_median | where abs(temp_f-temp_median)<20
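
For anyone who wants to see the logic outside Splunk, here is a minimal Python sketch of the same idea, run on the temp_f values from the log excerpt in the question. The function name filter_by_median is made up for illustration. Python's median averages the two middle values for an even count, which may differ slightly from what Splunk computes, but that does not change the outcome here.

```python
from statistics import median

# temp_f readings copied from the log excerpt in the question
readings_f = [54.05, 53.82, 53.94, 54.72, 53.71, 93.31, 54.72, 54.95]


def filter_by_median(values, max_dev):
    """Keep values that are less than max_dev away from the median of all values."""
    mid = median(values)
    return [v for v in values if abs(v - mid) < max_dev]


# Same threshold as the search above: drop anything 20 degrees F or more off the median
kept = filter_by_median(readings_f, 20)
print(kept)
```

The 93.31 reading is the only one far enough from the median to be dropped; the other seven stay.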

If you want the median over a limited timespan (temperatures will vary greatly over a whole day, for instance) and also want it per individual sensor, you could do:

... | bucket _time span=30m | eventstats median(temp_f) as temp_median by _time,uuid | ...
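
As a rough illustration of what the bucketed version does, the sketch below groups readings by 30-minute bucket and sensor, then applies the median test inside each group. It is Python rather than SPL, the function name filter_per_bucket is hypothetical, and the sample events (including the "outdoor" and "indoor" sensor names) are made-up values, not data from this thread.

```python
from collections import defaultdict
from statistics import median

BUCKET_SECONDS = 30 * 60  # mirrors span=30m


def filter_per_bucket(events, max_dev=20.0, bucket_seconds=BUCKET_SECONDS):
    """events: iterable of (epoch_seconds, uuid, temp_f) tuples.

    Groups events by (time bucket, uuid) and keeps only the readings that are
    less than max_dev away from the median of their own group.
    """
    groups = defaultdict(list)
    for ts, uuid, temp_f in events:
        bucket_start = ts - ts % bucket_seconds  # round down to the bucket start
        groups[(bucket_start, uuid)].append((ts, uuid, temp_f))

    kept = []
    for members in groups.values():
        mid = median(t for _, _, t in members)
        kept.extend(e for e in members if abs(e[2] - mid) < max_dev)
    return sorted(kept)


# Made-up sample: one outdoor spike, plus an indoor sensor running much warmer
events = [
    (0, "outdoor", 54.0), (300, "outdoor", 93.3), (600, "outdoor", 54.7), (900, "outdoor", 54.1),
    (0, "indoor", 72.0), (300, "indoor", 72.5), (600, "indoor", 71.8),
    (1800, "outdoor", 53.9),
]
kept = filter_per_bucket(events)
```

Because each sensor is compared only against its own bucket, the warm indoor readings are not penalized for being far from the outdoor ones. One caveat: a bucket that holds a single reading is its own median, so a spike that lands alone in a bucket would survive this filter.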

talbot7 · ‎07-05-2012

I think we got it!!!

... | bucket _time span=16m | eventstats stdev(temp_f) as temp_stdev by _time,uuid | where abs(temp_stdev)<1 | timechart span="16m" avg(temp_f) by uuid
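
A note for later readers on how this search behaves: it filters whole buckets rather than individual readings. The Python sketch below mimics that with made-up sample values; the function name bucket_averages is hypothetical, and keeping single-reading buckets (treating their spread as 0) is an assumption of the sketch, since Splunk may handle that case differently.

```python
from collections import defaultdict
from statistics import mean, stdev


def bucket_averages(events, max_stdev=1.0, bucket_seconds=16 * 60):
    """events: iterable of (epoch_seconds, uuid, temp_f) tuples.

    Returns {(bucket_start, uuid): average temp_f} for every bucket whose
    sample standard deviation is below max_stdev. Noisy buckets are dropped whole.
    """
    groups = defaultdict(list)
    for ts, uuid, temp_f in events:
        groups[(ts - ts % bucket_seconds, uuid)].append(temp_f)

    result = {}
    for key, temps in groups.items():
        # Assumption: a bucket with one reading has no spread and is kept
        spread = stdev(temps) if len(temps) > 1 else 0.0
        if spread < max_stdev:
            result[key] = mean(temps)
    return result


# Made-up sample: the first 16-minute bucket contains a spike, the second does not
events = [
    (0, "outdoor", 54.0), (300, "outdoor", 93.3), (600, "outdoor", 54.7),
    (960, "outdoor", 54.1), (1260, "outdoor", 53.9),
]
averages = bucket_averages(events)
```

In this sample the first bucket disappears entirely, including its two good readings, so the chart shows a gap for that interval instead of a corrected point. If that matters, the per-reading median comparison from the accepted answer keeps the good readings in the bucket.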

Ayn · ‎07-03-2012

Ah, so you want to calculate the median on individual 30 minute spans? You could do this:

... | bucket _time span=30m | eventstats median(temp_f) as temp_median by _time,uuid | ...

This will create a median for each 30 minute span instead of for the whole period you're searching.

talbot7 · ‎07-03-2012

It sent me in the right direction:
eventstats median(temp_f) as temp_median by uuid | where abs(temp_f-temp_median)<2

How do I get a subsearch to work over a 30-minute window while my search still runs over 24 hours? Is that even possible?

where abs(temp_f-temp_median)<2 span=30m

Ayn · ‎07-03-2012

Just take the median by sensor: eventstats median(temp_f) as temp_median by uuid. As for how to calculate the deviation, if you want to do it some other way, by all means do so. My answer was mostly meant as something to get you going.

talbot7 · ‎07-03-2012

Close. That works when monitoring one sensor for a short period of time. When the scope is expanded to 24 hours with both indoor and outdoor sensors, it cuts out lots of correct data. Any other ideas?
