Predicting when sourcetypes will send data - map command too slow

jorjiana88
Path Finder

Hi,

I have hundreds of sourcetypes, and they do not send data in real time: some send weekly, some every few hours, and some follow other irregular patterns. I want to predict when each sourcetype will next send data, so I tried the predict command and the ML Toolkit. Unfortunately, the query is extremely slow because of the map command.

Why is it so slow? The base query, which reads the list of sourcetypes from a CSV, is extremely fast (under 1 second), and the query inside map takes about 4 seconds when I run it by itself for a specific sourcetype, so running it once per sourcetype should not be this slow. However, when I run them together with the map command, the search effectively never finishes.

Is there a way to make this faster? I tried without map, but joins were also problematic because the subquery that does the prediction does not work with more than one sourcetype, so my other attempts to join the data have failed.

| inputlookup sourcetypes.csv
| dedup sourcetype
| table sourcetype
| map
    [ search index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series=$sourcetype$
    | head 10
    | stats count as counts_data_indexed by _time series
    | predict counts_data_indexed as predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1 upper0=upper0 lower0=lower0
    | table sourcetype series _time counts_data_indexed predicted_counts_data_indexed
    | stats max(_time) as next_predicted_time last(series) as sourcetype
    | convert timeformat="%Y-%m-%d %H:%M:%S.%3N" ctime(next_predicted_time) ]

Even if I change the first query so that it returns only one sourcetype, the map command still takes forever.

I am running this over a 30-day time range:

| makeresults  
| eval sourcetype="splunkd"
| dedup sourcetype 
| map 
    [ search index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series=$sourcetype$ 
    | head 20 
    | stats count as counts_data_indexed by _time series 
    | predict counts_data_indexed as predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1 upper0=upper0 lower0=lower0 
    | table sourcetype series _time counts_data_indexed predicted_counts_data_indexed 
    | stats max(_time) as next_predicted_time last(series) as sourcetype
    | convert timeformat="%Y-%m-%d %H:%M:%S.%3N" ctime(next_predicted_time) ]

P.S. If I can get this query working, I will be able to build many useful queries and alerts based on predicted volume, such as detecting whether a source is sending more or less than usual, or determining whether a sourcetype has stopped sending (by comparing against the prediction). There are many use cases I could build on this in the future.

to4kawa
Ultra Champion

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series=*
| stats count as counts_data_indexed by _time series
| predict counts_data_indexed as predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1 upper0=upper0 lower0=lower0
| table sourcetype series _time counts_data_indexed predicted_counts_data_indexed
| stats max(_time) as next_predicted_time by series
| convert timeformat="%Y-%m-%d %H:%M:%S.%3N" ctime(next_predicted_time)

jorjiana88
Path Finder

Thanks a lot, but this query does not predict well once you add multiple sourcetypes. With multiple sourcetypes it does not split the prediction by sourcetype; it predicts based on all sourcetypes combined.

I guess this is because the predict command usually expects a timechart before it, so it takes time plus some other numeric value as input. I worked around that by using stats instead of timechart so that I would get the sourcetype too, but predict cannot split by sourcetype.

to4kawa
Ultra Champion

predict can predict 5 fields at a time.

No matter how many source types there are, it's better to do a static search.

You could make 20 reports, right?
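
As a rough sketch of what one such report might look like (not tested against your data): timechart produces one column per sourcetype, and predict is then given a handful of those columns by name. The three series values and the 1-hour span below are assumptions for illustration only; substitute your own sourcetypes and a span that matches how often they send.

```spl
index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series IN ("splunkd", "splunkd_access", "splunkd_ui_access")
| timechart span=1h count by series
| predict splunkd splunkd_access splunkd_ui_access algorithm=LLP5 holdback=0 future_timespan=1
```

Each saved report would then cover its own small group of sourcetypes, so no map is needed and every search stays static.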

