Delete faulty submitted events using a timechart average

CMEOGNAD
Engager

Hi Community,

I have a data source that sometimes submits faulty humidity data, such as 3302.4 percent.

To clean up / delete these outlier events, I built a timechart avg to get the real humidity curve, and from this curve I take the max and min with stats to get the upper and lower bounds.

...but my search doesn't work, and I need your help. Here is a makeresults sample:

| makeresults format=json data="[{\"_time\":\"1729115947\", \"humidity\":70.7},{\"_time\":\"1729115887\", \"humidity\":70.6},{\"_time\":\"1729115827\", \"humidity\":70.5},{\"_time\":\"1729115762\", \"humidity\":30.9},{\"_time\":\"1729115707\", \"humidity\":70.6}]"

[ search
| timechart eval(round(avg(humidity),1)) AS avg_humidity
| stats min(avg_humidity) as min_avg_humidity
]

| where humidity < min_avg_humidity ```| delete ```

CMEOGNAD
Engager

Hi Community,

I solved the problem with outlier detection. Thanks, all, for your support 🙂

source="/var/log/livingroom.json"
| streamstats window=60 current=true avg("temperature_celsius") as avg stdev("temperature_celsius") as stdev
| eval lowerBound=(avg-stdev*exact(5)), upperBound=(avg+stdev*exact(5))
| eval isOutlier=if('temperature_celsius' < lowerBound OR 'temperature_celsius' > upperBound, 1, 0)
| where isOutlier=0
| timechart eval(round(avg('temperature_celsius'),1)) AS "Temperature"

 

PickleRick
SplunkTrust

It is something that should rather be handled during the ingestion phase - clean your data before indexing.

CMEOGNAD
Engager

[Screenshot: CMEOGNAD_0-1729171306959.png]

Here is an actual sample of the problem. It is easy to see in the outlier at -9.4.

ITWhisperer
SplunkTrust

You might be better off using eventstats to add the average to all the events, then use the where command to keep the events you want to delete, then remove the average field (with the fields command) before deleting the events.

CMEOGNAD
Engager

Do you have an SPL code hint for me? 😉

ITWhisperer
SplunkTrust

It is not clear what you actually want. For example, do you want an hourly average of the humidity for each hour, then the minimum and maximum average for that hour over your full time period, and then to discount events which are outside the minimum and maximum average for the hour in which they were taken? Or do you want the average for the day, then the minimum and maximum over the full time period, and to discount events which are outside this daily average? Or do you want the average over the whole time period and to discount values which are more than a specified distance from the average? All of these would have different SPL. Please explain what you are trying to do in non-SPL terms.
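
To illustrate only the last of these options (whole-period average plus a fixed distance), a minimal sketch might look like the following. It reuses the makeresults sample from the question and the eventstats idea suggested earlier in the thread. The field assumed_max_distance and its value of 20 are assumptions made purely for illustration and would need tuning for real data.

```spl
| makeresults format=json data="[{\"_time\":\"1729115947\", \"humidity\":70.7},{\"_time\":\"1729115887\", \"humidity\":70.6},{\"_time\":\"1729115827\", \"humidity\":70.5},{\"_time\":\"1729115762\", \"humidity\":30.9},{\"_time\":\"1729115707\", \"humidity\":70.6}]"
| eventstats avg(humidity) AS avg_humidity
| eval assumed_max_distance=20
| where abs(humidity - avg_humidity) <= assumed_max_distance
| fields - avg_humidity assumed_max_distance
```

On the sample data the average is about 62.7, so the 30.9 reading lies roughly 32 away and should be dropped, while the readings around 70.6 are kept. Unlike a subsearch, eventstats attaches the average to every event, so the where command can compare each reading against it directly.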

CMEOGNAD
Engager

If you look at the screenshot, my goal is to keep the values between 10 and 20 but ignore / delete the outlier at -9.4 (a faulty value from the sensor), which is visible in the absolute min/max graph. In the timechart you see it only as a little edge.

The sensor sends a value every minute, 24/7.
With the "timechart" curve you won't see the small number of outliers among 1440 values per day, but in the "stats" min/max per day you will see extreme values like -9.4, which is completely illogical given a minimum average of ~10.

To find out which of the day's min/max values is wrong (an outlier), I came up with the idea of verifying the values against the timechart min/max. I hope that is understandable 😉

gcusello
SplunkTrust

Hi @CMEOGNAD ,

first, I assume you know that your user must have the can_delete role.

Second, I assume you know that this is a logical, not a physical, removal: deleted events are marked as deleted but are not removed from the buckets until the end of the bucket life cycle.

In other words, the removal has no useful effect in terms of storage or license (because the events are already indexed).

Anyway, I'm not sure it's possible to apply the delete command to a streaming command: you should select the events to delete and use the delete command after the main search.
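
As a rough sketch of that last point, the events to delete can be selected in the base search and piped straight to delete. The source path is taken from the solution posted elsewhere in this thread. The field name humidity and the 0 to 100 bounds are assumptions, based on humidity being reported as a percentage and on the field being extracted at search time.

```spl
source="/var/log/livingroom.json" (humidity>100 OR humidity<0)
| delete
```

It is probably wise to run the search without the final delete first and check that only the faulty events are returned, because the events cannot be made searchable again afterwards.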

Ciao.

Giuseppe

CMEOGNAD
Engager

Hi,

delete is not a must-have. Excluding the faulty results from the search is another option.

My logic: timechart avg > get the min and max of the averages from this timechart > exclude events outside that min/max range > new timechart
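
One way this logic could be sketched without a subsearch is shown below. It assumes hourly buckets, a field named humidity, and the source path from the solution posted elsewhere in this thread. The field assumed_tolerance and its value of 5 are hypothetical, added so that normal readings slightly outside the range of the hourly averages are not discarded.

```spl
source="/var/log/livingroom.json"
| bin span=1h _time AS hour
| eventstats avg(humidity) AS hour_avg BY hour
| eventstats min(hour_avg) AS min_avg max(hour_avg) AS max_avg
| eval assumed_tolerance=5
| where humidity >= min_avg - assumed_tolerance AND humidity <= max_avg + assumed_tolerance
| timechart span=1h eval(round(avg(humidity),1)) AS "Humidity"
```

The first eventstats plays the role of the timechart avg, the second one takes the min and max of those averages, and the where command excludes events outside that range before the final timechart. Because the averages are computed before filtering, a large outlier still pulls its own hour's average somewhat, so the bounds are only approximate.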
