How do I remove data read anomalies?

talbot7

I'm reading an outdoor temperature sensor (DS18B20), and every so often I get a bad reading:

Jul  2 23:26:40 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.25 temp_f=54.05
Jul  2 23:21:33 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.13 temp_f=53.82
Jul  2 23:11:20 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.19 temp_f=53.94
Jul  2 23:06:15 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.63 temp_f=54.72
Jul  2 23:01:06 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.06 temp_f=53.71
Jul  2 22:50:56 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=34.06 temp_f=93.31
Jul  2 22:45:46 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.63 temp_f=54.72
Jul  2 22:35:33 malakoff logger: uuid=28:B4:5B:E8:3:0:0:91 temp_c=12.75 temp_f=54.95

I am 100% sure the outside temperature did not jump from 54°F to 93°F in a 5-minute span. Any ideas on how to filter out these anomalies?

Ayn

You could compute an average, or better yet a median, of the temperature readings using eventstats, then compare each reading to that median and throw away any reading that deviates too much from it. For instance, to filter out events that deviate from the median by 20 degrees F or more:

... | eventstats median(temp_f) as temp_median | where abs(temp_f-temp_median)<20
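As a rough illustration of what this filter does, here is an offline sketch in Python (not SPL) applied to the eight temp_f values from the question. The function name is hypothetical. It uses Python's statistics.median, which averages the two middle values when the count is even; Splunk's median may treat an even count differently, though that makes no difference to the result here.

```python
from statistics import median

# Fahrenheit readings copied from the log excerpt in the question (newest first).
readings_f = [54.05, 53.82, 53.94, 54.72, 53.71, 93.31, 54.72, 54.95]


def median_filter(values, max_dev):
    """Keep only values that lie less than max_dev from the median of all values.

    Rough analogue of:
    eventstats median(temp_f) as temp_median | where abs(temp_f-temp_median)<max_dev
    """
    m = median(values)
    return [v for v in values if abs(v - m) < max_dev]


kept = median_filter(readings_f, max_dev=20)
print(kept)  # [54.05, 53.82, 53.94, 54.72, 53.71, 54.72, 54.95]
```

The 93.31 spike is about 39 degrees from the median and is dropped; every other reading is within one degree of it.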

If you want to compute the median over a limited timespan (temperatures vary a lot over a whole day, for instance) and also per individual sensor, you could do:

... | bucket _time span=30m | eventstats median(temp_f) as temp_median by _time,uuid | ...
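To show why a per-span median helps, here is a Python sketch (not SPL) with hypothetical readings from one sensor: a cool morning, a warm afternoon and one bad 93.0 reading. With a tight threshold of 2 degrees F, a single median over the whole period sits among the morning values and discards the legitimate afternoon readings, while a median per 30-minute bucket and uuid drops only the spike. The helper names and the midnight-aligned buckets are assumptions for illustration; Splunk's bucket command may align spans differently.

```python
from collections import defaultdict
from statistics import median

OUTDOOR = "28:B4:5B:E8:3:0:0:91"  # uuid from the question

# Hypothetical readings (not from the question): (time of day, uuid, temp_f).
events = [
    ("06:00:00", OUTDOOR, 50.0),
    ("06:05:00", OUTDOOR, 50.2),
    ("06:10:00", OUTDOOR, 50.3),
    ("06:20:00", OUTDOOR, 50.1),
    ("14:00:00", OUTDOOR, 68.0),
    ("14:10:00", OUTDOOR, 68.4),
    ("14:20:00", OUTDOOR, 93.0),  # the bad reading
]


def group_key(event, span_s):
    """Group by uuid alone, or by (time bucket, uuid) when span_s is given.

    Buckets start at multiples of span_s seconds after midnight (an assumption).
    """
    hms, uuid, _ = event
    if span_s is None:
        return uuid
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 3600 + m * 60 + s) // span_s, uuid)


def median_filter(events, max_dev, span_s=None):
    """Drop events whose temp_f is max_dev or more from their group's median."""
    groups = defaultdict(list)
    for e in events:
        groups[group_key(e, span_s)].append(e[2])
    medians = {k: median(v) for k, v in groups.items()}
    return [e for e in events if abs(e[2] - medians[group_key(e, span_s)]) < max_dev]


whole_period = median_filter(events, max_dev=2)             # like "by uuid"
per_bucket = median_filter(events, max_dev=2, span_s=1800)  # like span=30m, "by _time,uuid"
print(len(whole_period), len(per_bucket))  # 4 6
```

One design caveat: a bucket holding only one or two readings gives the median too little to work with (with two readings, Python's median lands halfway between them), so the span should be wide enough to contain several readings.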

talbot7

I think we got it!!!

... | bucket _time span=16m | eventstats stdev(temp_f) as temp_stdev by _time,uuid | where abs(temp_stdev)<1 | timechart span="16m" avg(temp_f) by uuid
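This approach behaves differently from the median filter: stdev is computed per 16-minute bucket, and where then keeps or drops every event in a bucket together, so a good reading that shares a bucket with a spike is discarded too. (Since stdev is never negative, abs() is not strictly needed.) The Python sketch below (not SPL) reproduces this on the readings from the question. Midnight-aligned buckets and treating a one-reading bucket as having zero spread are assumptions; Splunk may behave differently on both points.

```python
from collections import defaultdict
from statistics import stdev

# (time of day, temp_f) from the log excerpt in the question; all readings come
# from the same sensor, so grouping by uuid is omitted here.
events = [
    ("23:26:40", 54.05),
    ("23:21:33", 53.82),
    ("23:11:20", 53.94),
    ("23:06:15", 54.72),
    ("23:01:06", 53.71),
    ("22:50:56", 93.31),
    ("22:45:46", 54.72),
    ("22:35:33", 54.95),
]

SPAN_S = 16 * 60  # 16-minute buckets, as in "bucket _time span=16m"


def bucket_of(hms):
    # Buckets aligned to midnight: an assumption, Splunk may align spans differently.
    h, m, s = (int(x) for x in hms.split(":"))
    return (h * 3600 + m * 60 + s) // SPAN_S


groups = defaultdict(list)
for hms, temp in events:
    groups[bucket_of(hms)].append(temp)

# Sample standard deviation per bucket (Splunk's stdev is also the sample version).
# statistics.stdev needs at least two values, so a one-reading bucket is treated
# as 0.0 here; that is an assumption, not necessarily Splunk's behaviour.
spread = {b: stdev(v) if len(v) > 1 else 0.0 for b, v in groups.items()}

# Keep every event whose bucket has a spread below 1 degree F.
kept = [(hms, temp) for hms, temp in events if spread[bucket_of(hms)] < 1]
print([hms for hms, _ in kept])
# ['23:26:40', '23:21:33', '23:11:20', '23:06:15', '23:01:06', '22:35:33']
```

Under these assumptions the 22:45:46 reading of 54.72 is dropped along with the 93.31 spike because both fall in the same bucket, whereas comparing each reading to a median, as in the accepted answer, would keep it.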

Ayn

Ah, so you want to calculate the median over individual 30-minute spans? You could do this:

... | bucket _time span=30m | eventstats median(temp_f) as temp_median by _time,uuid | ...

This will create a median for each 30-minute span instead of one for the whole period you're searching.

talbot7

It sent me in the right direction:
eventstats median(temp_f) as temp_median by uuid | where abs(temp_f-temp_median)<2

How do I get the median calculation to work over a 30-minute window while my search still covers 24 hours? Is that even possible?

where abs(temp_f-temp_median)<2 span=30m

Ayn

Just calculate the median per sensor: eventstats median(temp_f) as temp_median by uuid. As for how to calculate the deviation, if you want to do it some other way, by all means do so. My answer was mostly meant as something to get you going.
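To illustrate why the by uuid clause matters when indoor and outdoor sensors are searched together, here is a Python sketch (not SPL). The outdoor values come from the question; the indoor sensor name and its readings are hypothetical. With a threshold of 2 degrees F, one shared median lands among the indoor values and every outdoor reading is thrown away, while a median per sensor removes only the spike.

```python
from collections import defaultdict
from statistics import median

OUTDOOR = "28:B4:5B:E8:3:0:0:91"  # uuid from the question
INDOOR = "indoor-sensor"          # hypothetical second sensor

# Outdoor values are from the question; indoor values are made up for illustration.
events = [
    (OUTDOOR, 54.05), (OUTDOOR, 53.82), (OUTDOOR, 53.94), (OUTDOOR, 93.31),
    (INDOOR, 70.2), (INDOOR, 70.5), (INDOOR, 70.4),
]


def median_filter(events, max_dev, per_sensor):
    """Drop readings that are max_dev or more from their median.

    per_sensor=False is like: eventstats median(temp_f) as temp_median
    per_sensor=True  is like: eventstats median(temp_f) as temp_median by uuid
    """
    def key(uuid):
        return uuid if per_sensor else None

    groups = defaultdict(list)
    for uuid, temp in events:
        groups[key(uuid)].append(temp)
    medians = {k: median(v) for k, v in groups.items()}
    return [(uuid, temp) for uuid, temp in events
            if abs(temp - medians[key(uuid)]) < max_dev]


print(median_filter(events, 2, per_sensor=False))  # only the indoor readings survive
print(median_filter(events, 2, per_sensor=True))   # only the 93.31 spike is dropped
```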

talbot7

Close. That works when monitoring one sensor over a short period of time. When the scope is expanded to 24 hours, with both indoor and outdoor sensors, it cuts out a lot of correct data. Any other ideas?
