I have a search that returns a large number of data series, too many to display or analyze easily. These series show three distinct patterns:
I then want to search according to each pattern. This falls under pattern recognition, but for my purposes, a simple method to identify the flat beginning is good enough. In other words, I only need to "search those with flat beginning greater than 11", "search those with flat beginning of 0", and "search those that are neither." Is there a simple method to do this?
Assume
yoursearchhere
| timechart count by ID
If you want to analyze the first 11 time periods reported by timechart, try this:
yoursearchhere
| bin span=1h _time
| stats count by _time ID
| appendpipe [ head 11
| stats stdev(count) as sdev avg(count) as avg by ID
| eval pattern=case(avg<.1,"Zero at beginning",
sdev < .25,"Flat at beginning",
1==1,"Other")
| fields ID pattern ]
| stats first(pattern) first(count) by _time ID
I think this will give you a starting point. The appendpipe command takes a copy of the data at that point in the execution pipeline, processes it, and appends the results to the main pipeline. Oh, and I set the time interval to hours in the bin command. You could do this using timechart as you started, but I think it is easier to use bin and stats.
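To see what appendpipe does in isolation, here is a minimal sketch on generated data. The makeresults rows and the field names n and label are only for illustration and are not part of your search. It produces five rows numbered 1 to 5, then appends one extra row holding their sum:

```spl
| makeresults count=5
| streamstats count as n
| appendpipe [ stats sum(n) as n | eval label="total" ]
| table n label
```

The five original rows pass through unchanged, and the single result of the subpipeline is added after them. The search above relies on the same behavior: the per-ID pattern rows are appended after the time series rows.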
"head" and "appendpipe" (and first()) are what I missed. Thanks! (Now I want even more commands to zoom in any given internal:-)
I just realized that "count by _time ID" does not output 0 for missing values at the two ends. (Strangely, timechart always does.) I even tried fillnull, to no avail. (I know this was encountered in another question, but fillnull seemed to have solved the problem there.) Ideas?
Bah! I guess the timechart solution is better:
yoursearchhere
| timechart count by ID
| untable _time ID count
then appendpipe etc. as before.
HTH!
Thanks, @lguinn. untable is such a handy command! I had previously asked about filling leading zeros, and got a slim but still lengthy method. (My memory lapsed when I said straight fillnull had worked. It hadn't.) Will test in other use cases.
Note that after untable, head counts rows in total, not per ID, so head 11 no longer returns the first 11 time periods. This is undesired. (For one, there could be more than 11 IDs.) So untable should be performed after head, inside appendpipe. With this adjustment, and adding max() to the criteria, I can use the following to group my IDs:
yoursearchhere
| timechart count by ID
| appendpipe [ head 11
| untable _time ID count
| stats stdev(count) as sdev max(count) as max by ID
| eval pattern=case(max==0,"Zero at beginning",
max>0 and sdev < .25,"Flat at beginning",
1==1,"No head pattern")
| fields ID pattern ]
| stats dc(ID) as Count by pattern
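If the goal is to get back to the events for one pattern rather than to count the IDs, one option is to run the classification as a subsearch that returns only the matching IDs. This is an untested sketch. It assumes that ID is a field on the raw events, that yoursearchhere is the same base search in both places, and that the list of IDs fits within the subsearch limits. limit=0 is added on the assumption that timechart would otherwise fold IDs beyond its default series limit into an OTHER column.

```spl
yoursearchhere
    [ search yoursearchhere
      | timechart limit=0 count by ID
      | head 11
      | untable _time ID count
      | stats stdev(count) as sdev max(count) as max by ID
      | where max>0 AND sdev<.25
      | fields ID ]
| timechart limit=0 count by ID
```

The where clause here selects the "Flat at beginning" group. Replacing it with max==0 should select the "Zero at beginning" group instead.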
Now, there is a tail pattern in my search, whereby some IDs disappear in the final time periods. When I tried the same untable technique, using tail in place of head, I got no IDs. I'll submit that as a new question.
I realize that the original question is missing this information: the illustrated time pattern is produced by
| timechart count by ID
Some IDs fall into 1, some fall into 2, some fall into 3.
You could try using multiple functions in your timechart command, along with some | where clauses. If you use the stdev function then you'll be able to detect the flat lines (since stdev would be 0). Take a look at: http://docs.splunk.com/Documentation/Splunk/6.2.5/SearchReference/CommonStatsFunctions
Getting stdev is easy. The problem is to search based on 0-stdev in a sub-period of the total search, because it is not 0 over the entire search period (in which case I could use eventstats to identify them).
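One way to restrict the statistics to a sub-period, sketched here without testing: number the time buckets per ID with streamstats and keep only the first 11 before computing stdev. The field name period is made up for this example, and limit=0 is assumed so that every ID gets its own column from timechart.

```spl
yoursearchhere
| timechart limit=0 count by ID
| untable _time ID count
| streamstats count as period by ID
| where period<=11
| stats stdev(count) as sdev max(count) as max by ID
| where sdev<.25
```

Because the cutoff is applied per ID after untable, it does not depend on how many IDs there are. The .25 threshold is the one used earlier in this thread. Use sdev==0 if only strictly flat series should match.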