Splunk Search

Predicting when sourcetypes will send data - map command too slow

jorjiana88
Path Finder

Hi,

I have hundreds of sourcetypes, and they do not send data in real time: some send weekly, some every few hours, and some follow other irregular patterns. I want to predict when each sourcetype will next send data, so I tried the predict command and the ML Toolkit. Unfortunately, the query is extremely slow because of the map command.

Why is it so slow? The base query, which reads the list of sourcetypes from a CSV, is extremely fast (under 1 second), and the query inside map takes about 4 seconds when I run it on its own for a specific sourcetype, so running it once per sourcetype should not be this slow. However, when I run them together with the map command, the search is incredibly slow and does not complete.

Is there a way to make this faster? I tried without map, but joins were also problematic because the subquery that does the prediction does not work with more than one sourcetype, so my other attempts to join the data have failed.

 

| inputlookup sourcetypes.csv 
| dedup sourcetype | table sourcetype
| map 
[ search index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="$sourcetype$"
| head 10     
| stats count as counts_data_indexed by _time series 
| predict counts_data_indexed as predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1 upper0=upper0 lower0=lower0 
| table sourcetype series _time counts_data_indexed predicted_counts_data_indexed 
| stats max(_time) as next_predicted_time last(series) as sourcetype
| convert timeformat="%Y-%m-%d %H:%M:%S.%3N" ctime(next_predicted_time) ]
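
One thing worth checking with a query shaped like the one above: map runs a separate search for each input row and, by default, stops after 10 of them (the maxsearches option), so a lookup with hundreds of sourcetypes needs that limit raised. The following is only a sketch under assumptions, not a fix for the slowness itself: the value 500 and the 1-hour span are arbitrary illustrative choices, the token is quoted so sourcetype names with special characters still match, and timechart is used instead of stats so that predict receives evenly spaced time buckets.

```
| inputlookup sourcetypes.csv
| dedup sourcetype
| fields sourcetype
| map maxsearches=500
    [ search index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="$sourcetype$"
    | timechart span=1h count AS counts_data_indexed
    | predict counts_data_indexed AS predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1
    | eval sourcetype="$sourcetype$" ]
```

Because each row still launches its own search, the total run time grows with the number of sourcetypes, so this form is better suited to a scheduled report than to an ad hoc search.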

 

Even if I change the first query so that it returns only one sourcetype, the map command still takes forever.

Note: I am running this over a 30-day time range.

| makeresults  
| eval sourcetype="splunkd"
| dedup sourcetype 
| map 
    [ search index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series=$sourcetype$ 
    | head 20 
    | stats count as counts_data_indexed by _time series 
    | predict counts_data_indexed as predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1 upper0=upper0 lower0=lower0 
    | table sourcetype series _time counts_data_indexed predicted_counts_data_indexed 
    | stats max(_time) as next_predicted_time last(series) as sourcetype
    | convert timeformat="%Y-%m-%d %H:%M:%S.%3N" ctime(next_predicted_time)]

PS: If I can get this query working, I will be able to build many useful queries and alerts based on predicted volume, such as detecting whether a source is sending more or less than usual, or determining whether a sourcetype has stopped sending (by comparing against the prediction). There are many use cases I could build on this in the future.

to4kawa
SplunkTrust

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series=*
| stats count as counts_data_indexed by _time series
| predict counts_data_indexed as predicted_counts_data_indexed algorithm=LLP5 holdback=0 future_timespan=1 upper0=upper0 lower0=lower0
| table sourcetype series _time counts_data_indexed predicted_counts_data_indexed
| stats max(_time) as next_predicted_time by series
| convert timeformat="%Y-%m-%d %H:%M:%S.%3N" ctime(next_predicted_time)

jorjiana88
Path Finder

Thanks a lot, but this query does not predict well when there are multiple sourcetypes. With multiple sourcetypes it does not predict separately for each one; it predicts based on all sourcetypes combined.

I guess this is because the predict command usually expects a timechart before it, so its input is a time column plus a numeric value. I worked around that by using stats instead of timechart so that I also get the sourcetype, but predict still cannot split by sourcetype.

to4kawa
SplunkTrust

predict can predict 5 fields at a time.

No matter how many sourcetypes there are, it's better to use a static search.

You could make 20 reports, right?
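
As a rough sketch of what one such static report could look like: timechart with a by clause produces one column per sourcetype, and predict accepts several field names in one call, so each column gets its own forecast. The sourcetype names splunkd and splunkd_access and the 1-hour span are assumptions chosen for illustration; substitute the sourcetypes each report should cover.

```
index="_internal" source="*metrics.log" group="per_sourcetype_thruput" (series="splunkd" OR series="splunkd_access")
| timechart span=1h count by series
| predict splunkd splunkd_access algorithm=LLP5 holdback=0 future_timespan=1
```

Each listed field is forecast independently, which avoids both map and the single combined prediction that results from feeding predict the output of stats by _time series.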

