Hi Community,
I have a data source that sometimes submits faulty humidity data, such as 3302.4 percent.
To clean up / delete these outlier events, I built a timechart avg to get the real humidity curve, and from this curve I get the max and min with stats to get the upper and lower bounds of the curve.
...but my search won't work, and I need your help. Here is a makeresults sample:
| makeresults format=json data="[{\"_time\":\"1729115947\", \"humidity\":70.7},{\"_time\":\"1729115887\", \"humidity\":70.6},{\"_time\":\"1729115827\", \"humidity\":70.5},{\"_time\":\"1729115762\", \"humidity\":30.9},{\"_time\":\"1729115707\", \"humidity\":70.6}]"
[ search
| timechart eval(round(avg(humidity),1)) AS avg_humidity
| stats min(avg_humidity) as min_avg_humidity
]
| where humidity < min_avg_humidity ```| delete ```
Hi Community,
I solved the problem with outlier detection... thanks all for your support 🙂
source="/var/log/livingroom.json"
| streamstats window=60 current=true avg("temperature_celsius") as avg stdev("temperature_celsius") as stdev
| eval lowerBound=(avg-stdev*exact(5)), upperBound=(avg+stdev*exact(5))
| eval isOutlier=if('temperature_celsius' < lowerBound OR 'temperature_celsius' > upperBound, 1, 0)
| where isOutlier=0
| timechart eval(round(avg('temperature_celsius'),1)) AS "Temperature"
This is something that should rather be handled during the ingestion phase: clean your data before indexing.
Here is an actual problem sample... the outlier at -9.4 is easy to see.
You might be better off using eventstats to add the average to all the events, then using the where command to keep the events you want to delete, then removing the average field (with the fields command) before deleting the events.
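As a rough sketch of that idea, run against the sample data from the first post: eventstats adds the overall average to every event, where keeps only the readings far away from it, and fields drops the helper fields again. The field assumed_threshold and its value of 20 are assumptions for illustration only and would need tuning to the real sensor.

```spl
| makeresults format=json data="[{\"_time\":\"1729115947\", \"humidity\":70.7},{\"_time\":\"1729115887\", \"humidity\":70.6},{\"_time\":\"1729115827\", \"humidity\":70.5},{\"_time\":\"1729115762\", \"humidity\":30.9},{\"_time\":\"1729115707\", \"humidity\":70.6}]"
| eventstats avg(humidity) AS avg_humidity
| eval assumed_threshold=20
| where abs(humidity - avg_humidity) > assumed_threshold
| fields - avg_humidity assumed_threshold
```

On the five sample events this should leave only the 30.9 reading, since the other four sit roughly 8 points from the average. Inverting the comparison (<= instead of >) would keep the clean events instead of the outliers.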
Do you have an SPL code hint for me? 😉
It is not clear what it is you actually want. For example, do you want an hourly average of the humidity for each hour, then the minimum and maximum average for that hour over your full time period, and then to discount events which are outside the minimum and maximum average for the hour in which they were taken? Or do you want the average for the day, then the minimum and maximum over the full time period, and to discount events which are outside this daily average? Or do you want the average over the whole time period and to discount values which are more than a specified distance from the average? All of these would have different SPL. Please explain what you are trying to do in non-SPL terms.
If you look at the screenshot... my goal is to get the values between 10 and 20, but ignore / delete the outlier of -9.4 (a faulty value from the sensor), which can be seen on the absolute min/max graph... in the timechart you see it as a little edge...
The sensor emits a value every minute, 24/7.
With the "timechart" curve you won't see the small number of outliers among 1440 values, but in the "stats" min/max per day you will see extreme values like -9.4, which is completely illogical given a minimum average of ~10.
In order to know which of the day's min/max values is wrong (an outlier), I came up with the idea of verifying the values against the timechart min/max. I hope that is understandable 😉
Hi @CMEOGNAD,
first, I assume you know that your user must have the can_delete role.
Then, I assume you know that this is a logical, not a physical, removal: removed events are marked as deleted but are not removed from the buckets until the end of the bucket life cycle.
In other words, the removal has no useful effect in terms of storage or license (because the events are already indexed).
Anyway, I'm not sure it's possible to apply the delete command to a streaming command: you should select the events to delete and use the delete command after the main search.
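A minimal sketch of that two-step approach, assuming the faulty readings can be picked out with a simple range check. The cutoff of 100 percent and the humidity field name in this source are assumptions for illustration, and the source path is reused from the solved post above. Run the search without the last line first and check that only the faulty events come back, then append delete:

```spl
source="/var/log/livingroom.json" humidity>100
| delete
```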
Ciao.
Giuseppe
Hi,
delete is not a must-have... excluding the faulty results from the search is another option...
My logic: timechart avg > get the avg min and avg max from this timechart > exclude events using the min/max avg > new timechart
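One way this logic could be sketched without a subsearch is with two eventstats passes, so that every event carries the bounds it is compared against. The 1h span, the humidity field name in this source, and the assumed_margin of 5 are all assumptions for illustration. Some margin is needed because individual readings naturally fall slightly outside the range of the hourly averages.

```spl
source="/var/log/livingroom.json"
| bin span=1h _time AS hour
| eventstats avg(humidity) AS hour_avg BY hour
| eventstats min(hour_avg) AS min_avg max(hour_avg) AS max_avg
| eval assumed_margin=5
| where humidity >= min_avg - assumed_margin AND humidity <= max_avg + assumed_margin
| timechart span=1h eval(round(avg(humidity),1)) AS "Humidity"
```

Note that a very large faulty value (like 3302.4) still pulls up the average of the hour it falls in, which widens the upper bound. Swapping avg for median in the first eventstats would probably make the bounds less sensitive to that.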
