# Statistical Analysis using Splunk

daskuntal

Path Finder

03-07-2012
04:56 PM

Hi,

I'm trying to create an alert mechanism that will be triggered when a certain condition is "statistically significant".

Let's say

n = Number of Errors/5 min

p = Probability of n<=5

a = The critical alpha level = 5%

Null Hypothesis:

H0: p <= 5%

H1: p > 5%

Let's say I have a set of data with the # of errors per 5 min, for the last 7 days, e.g.

```
3:36 -- 3
3:41 -- 0
3:46 -- 0
...
4:00 -- 1
4:05 -- 4
```

Is there a way for me to configure an alert in Splunk that checks every 5 minutes whether the number of errors that occurred in the last 5 minutes is statistically significant, and triggers the alert if so?

Thanks
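To make the setup concrete, counts like the ones in the example above come from bucketing raw error timestamps into 5-minute windows. A minimal sketch (the timestamps are hypothetical, chosen to echo the example data):

```python
from collections import Counter

# Bucket each error event's timestamp into its 5-minute window and count.
events = ["3:36", "3:37", "3:38", "3:41", "4:05", "4:06", "4:07", "4:08"]

def bucket(ts: str) -> str:
    """Map an H:MM timestamp to the start of its 5-minute window."""
    h, m = map(int, ts.split(":"))
    return f"{h}:{(m // 5) * 5:02d}"

counts = Counter(bucket(t) for t in events)
print(counts["3:35"])  # 3 errors in the 3:35-3:40 window
print(counts["4:05"])  # 4 errors in the 4:05-4:10 window
```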

1 Solution


Re: Statistical Analysis using Splunk

daskuntal

Path Finder

03-09-2012
10:37 AM

Thanks for the response. That sounds like a workaround that we can incorporate. It isn't really a statistical analysis such as a t-test or a chi-square test, but for now it's better than simply alerting on a threshold count. So, thanks for that.

Note:

I have yet to set it up; I'll collect the data for the 7 days first & then prepare the alert.

However I have a few followup questions:

- When you say don't use "'sistats' count" in the summary search, what do you mean? Should all summary indexing use "sistats"? I understand that when I'm using the summary indexed search in another search, I'll use "stats" instead of "sistats". Am I correct?
- If I'm not mistaken then the Error Summary should look like "your error condition | sistats count as numErrors"
- The actual alert search should look like this: your error condition | stats count as currentErrors | appendcols [search index=summary search=ErrorSummary earliest=-7d@d latest=@d | stats avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors count as numPeriods] | eval maxVal = avgErrors + (stdevErrors * 2) | eval significant = currentErrors > maxVal | fields + currentErrors avgErrors stdevErrors numPeriods significant

Please let me know if I missed anything ...

I'll report back, once I try the whole thing.

Thanks again for your help.


lguinn2

Legend

03-12-2012
08:23 AM

Bah - my original answer was poor, and I have realized that there is a simpler solution. This calculates a mean number of errors over the last 7 days, then creates an upper bound that is two standard deviations from the mean. If the number of errors is greater than the upper bound, the number is statistically significant.

Try this, no summary index is needed:

```
your-error-condition | stats count as currentErrors |
appendcols [search your-error-condition earliest=-7d@d latest=@d |
timechart span=5m count as errorCount |
stats avg(errorCount) as avgErrors stdev(errorCount) as stdevErrors count as numPeriods] |
eval maxVal = avgErrors + (stdevErrors * 2) |
eval significant = currentErrors > maxVal |
fields + currentErrors avgErrors stdevErrors numPeriods significant
```

Note that I am only testing for a statistically significant increase in errors, although using the standard deviation makes this a two-tailed statistical test. There are also a few other statistical functions that might be of interest to you here -

Functions for stats, chart and timechart

Functions for eval and where

These may help you calculate the actual T-test or Chi-Square.
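The two-standard-deviation threshold logic can be sanity-checked outside Splunk. A minimal sketch with made-up counts (the numbers are illustrative, not real data):

```python
from statistics import mean, stdev

# Hypothetical 5-minute error counts from the trailing 7 days (illustrative).
error_counts = [3, 0, 0, 1, 4, 2, 0, 1]
current_errors = 9  # errors in the most recent 5-minute window

avg_errors = mean(error_counts)          # mirrors avg(errorCount)
stdev_errors = stdev(error_counts)       # mirrors stdev(errorCount)
max_val = avg_errors + 2 * stdev_errors  # upper bound: mean + 2 sigma
significant = current_errors > max_val
print(significant)  # True: 9 exceeds the two-sigma bound
```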


daskuntal

Path Finder

03-12-2012
03:08 PM

Thanks. I was thinking along the same lines. But I'd still prefer to use summary indexing instead of a live search, since gathering data for the last 7 days on the fly would take a very long time. Hence I'd collect the data for every 5-minute interval in the summary index, and use a separate search to calculate the statistics from the summary data.

In order to calculate the exact p-value, we need either

1) Integral function in Splunk, or

2) The ability to look up a Z-table in Splunk. Currently, I'm aware of looking up a 2-dimensional table in Splunk, but in order to look up a Z-table, we need to match both the row & the column values. Is there any way to do that?

e.g. for each X, Z = (X-avg)/stddev

Look up the p-value from the Z-score table (I'll save that table as a lookup table in Splunk), and then determine if the p-value < alpha (= .05). If it is, for the 1-tailed test, then we can conclude that the result is significant.

Here's what I came up with:

## error_summary:

```
index=someindex sourcetype=somesource environment!="someenv*" Type=someType (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| sistats count as numErrors by ErrorCode
```

## Alert Search:

```
index=someindex sourcetype=somesource environment!="someenv*" Type=sometype (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| stats count as currentErrors by ErrorCode
| appendcols [search index=some_summary search=summary_error_alert earliest=-7d@d latest=@d
| stats avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors count as numPeriods by ErrorCode]
| eval zVal = (currentErrors - avgErrors)/stdevErrors/(sqrt(numPeriods))  // e.g. z=2.54
| eval zValFirstDec = substr(zVal, 1, 3)  // 2.5
| eval zValSecDec = "0.0"+substr(zVal, -1)  // 0.04
| lookup zTable zValFirstDec, zValSecDec OUTPUT pValue
| eval significant = if(pValue < .05, "true", "false")
| fields + currentErrors avgErrors stdevErrors numPeriods zVal significant
```

Now the only thing that I'm not sure of is the Splunk lookup for row & column in the Z-table, where you look up by the first two values:

"lookup zTable zValFirstDec, zValSecDec OUTPUT pValue"

I don't know if or how to do that in Splunk. So, a new question for Splunk: can we do a lookup in a matrix table with two values (row value & column value)?

Does that look like something we can do?
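The substr-based key-splitting described above can be illustrated in plain code. A sketch: the table here is a tiny hypothetical stand-in (its two entries are real one-tailed standard-normal values, rounded to 4 places), and the string slicing assumes 0 <= z < 10.

```python
# Tiny stand-in for the zTable lookup, keyed by (zValFirstDec, zValSecDec).
z_table = {
    ("2.5", "0.04"): 0.0055,  # one-tailed P(Z > 2.54)
    ("1.6", "0.04"): 0.0505,  # one-tailed P(Z > 1.64)
}

def p_value(z: float) -> float:
    s = f"{z:.2f}"           # e.g. "2.54"
    first = s[:3]            # like substr(zVal, 1, 3)    -> "2.5"
    second = "0.0" + s[-1]   # like "0.0"+substr(zVal,-1) -> "0.04"
    return z_table[(first, second)]

print(p_value(2.54))  # 0.0055
```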


lguinn2

Legend

03-12-2012
05:29 PM

*[Okay, I deleted my original dumb answer, and left the second one. This one is the third.]*

So your searches are now very close, but I don't think that your summary index is going to give you what you want. A summary indexing search that ends with `sistats count as numErrors by ErrorCode`

should be retrieved by the following command

```
index=some_summary search=summary_error_alert earliest=-7d@d latest=@d
| stats count as numErrors by ErrorCode
```

And that isn't going to give you the stats that you need. But you can try it. (Run the error_summary search for an hour or two and then try searching the summary index.)

Here is what I think you need:

```
index=someindex sourcetype=somesource environment!="someenv*" Type=someType (ErrorCode!="null" AND
ErrorCode!=200 AND ErrorCode!=0 ) | eval counter=1 |
sitimechart span=5m sum(counter) by ErrorCode
```

The timechart command will summarize based on the 5-minute intervals and return a series. The stats command will summarize overall and return a single value. I think you need the series.

```
index=someindex sourcetype=somesource environment!="someenv*" Type=sometype (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| stats count as currentErrors by ErrorCode
| appendcols [search index=some_summary search=summary_error_alert earliest=-7d@d latest=@d
| timechart span=5m sum(counter) as numErrors by ErrorCode
| stats count as numPeriods avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors by ErrorCode ]
| eval zVal = (currentErrors - avgErrors)/stdevErrors/(sqrt(numPeriods))  // e.g. z=2.54
| eval zValFirstDec = substr(zVal, 1, 3)  // 2.5
| eval zValSecDec = "0.0"+substr(zVal, -1)  // 0.04
| lookup zTable zValFirstDec, zValSecDec OUTPUT pValue
| eval significant = pValue < .05
| fields + currentErrors avgErrors stdevErrors numPeriods zVal significant pValue
```

**The lookup question:**

You can totally do a multi-key lookup in Splunk, just as you have written it. But the CSV file needs to be in this format:

*zValFirstDec,zValSecDec,pValue*

Not in a matrix format.
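If the Z-table you have is in matrix form (rows keyed by the first decimal of z, columns by the second decimal), it can be flattened into that three-column format with a short script. A sketch; the two sample rows below are illustrative:

```python
import csv
import io

# A matrix-style Z-table excerpt: one-tailed p-values, rows = first decimal
# of z, columns = second decimal.
matrix_csv = """z,0.00,0.01,0.02
2.5,0.0062,0.0060,0.0059
2.6,0.0047,0.0045,0.0044
"""

reader = csv.reader(io.StringIO(matrix_csv))
header = next(reader)  # ["z", "0.00", "0.01", "0.02"]
rows = [["zValFirstDec", "zValSecDec", "pValue"]]
for line in reader:
    first = line[0]
    # Pair each column key with its cell: one long-format row per cell.
    for second, p in zip(header[1:], line[1:]):
        rows.append([first, second, p])

out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

The resulting CSV starts with the `zValFirstDec,zValSecDec,pValue` header row, ready to save as a Splunk lookup file.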


daskuntal

Path Finder

03-15-2012
04:45 PM

Thanks lguinn,

We are getting close... but we're not there yet. I'm having a hard time collecting data from the summary index within a subsearch.

On inspecting my actual alert search, it appeared that I'm getting 0 results for the subsearch:

```
search index=some_summary search=summary_error_alert earliest=-7d@d latest=@d
| timechart span=5m sum(counter) as numErrors by ErrorCode
| stats count as numPeriods avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors by ErrorCode
```

However, I validated that my summary search (which runs every 5 mins, with earliest=-10m@m & latest=-5m@m) does indeed return data once I set the time range to the last 7 days.

What I noticed is that if I change the search to use source=summary_error_alert instead of search=summary_error_alert, then it returns some data. With that in mind, when I run the following section of the search, it only returns two fields, ErrorCode & currentErrors:

```
index=someindex sourcetype=somesource environment!="someenv*" Type=sometype (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| stats count as currentErrors by ErrorCode
| appendcols [search index=some_summary search=summary_error_alert earliest=-7d@d latest=@d
| timechart span=5m sum(counter) as numErrors by ErrorCode
| stats count as numPeriods avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors by ErrorCode ]
```

```
  ErrorCode  currentErrors
1 2010       1
2 2011       5
3 2013       6
4 2100       1
5 2102       8
6 2201       7
...
```

I can't put my finger on it, but it's either the timechart/counter values, or the subsearch/summary search, that's screwing things up.

I'm still researching. If I find the issue, I'll post it. Any help is appreciated.

Basically the final result should look something like this:

```
ErrorCode  currentErrors  avgErrors  stdevErrors  numPeriods  zVal  pValue  significant
2010       1              7.6        12.4         1567        .12   .03     true
2011       5              7.6        12.4         1567        .23   .07     false
```

I think we need to use eventstats instead of stats, since the avg, stdev & numPeriods should be the same value across all the events/rows, and available on each row, in order to calculate the zVal for each row.

Any help is appreciated..

Kuntal


daskuntal

Path Finder

03-15-2012
05:46 PM

Timechart messed things up, since its result had the time values as column headers. The following search works as expected. Now I just need to move the subsearch into a summary indexed search for better performance when gathering large quantities of data:

```
index=someindex sourcetype=somelogs environment!="PREPROD*" Type=sometype (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| stats count as currentErrors by ErrorCode
| appendcols [search index=someindex sourcetype=somelogs environment!="PREPROD*" Type=sometype (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| bucket _time span=5m
| stats count as numErrors by ErrorCode, _time
| stats avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors count as numPeriods by ErrorCode]
| eval zVal = (currentErrors - avgErrors)/stdevErrors/(sqrt(numPeriods))
| eval Z1 = substr(zVal, 1, 3)
| eval Z2 = substr(zVal, 4, 1)
| lookup zTable zValFirstDec as Z1, zValSecDec as Z2 OUTPUT pValue
| eval significant = if(pValue < .05, "true", "false")
| fields + ErrorCode currentErrors avgErrors stdevErrors numPeriods zVal pValue significant
```


daskuntal

Path Finder

03-16-2012
02:56 PM

Here's the final answer. I figured out why my summary search wasn't working: instead of search_name=summary_error_alert, we were doing search=summary_error_alert. Duh, it took me some time comparing my other summary searches to figure out what was wrong :).

Nevertheless, here's the final solution. It's not a perfect one, since there are a lot of assumptions:

1) I can't determine the p-value if Z-score > 4.09, because the table doesn't have data beyond that, & we don't need it either, since beyond that Z-score the probability is negligibly low.

2) The Error distribution is Normal

3) I'm avoiding weekends when gathering summary data, so that my statistics are not skewed by the very low number of errors over the weekends.

4) However, my Alert search runs every day

5) Critical value = 0.05

# Summary Search:

## Run conditions:

Runs every 5 mins, and collects the data between "earliest=-10m@m latest=-5m@m".

## Query:

```
index=someindex sourcetype=somelogs environment!="PREPROD*" Type=someType (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
(date_wday!="sunday" AND date_wday!="saturday")
| bucket _time span=5m
| sistats count as numErrors by ErrorCode, _time
```

# Alert Search:

## Alert condition:

- Runs every 5 mins
- Gathers data from the last 5 mins => earliest=-5m@m latest=now
- Alerts on a custom condition, when "significant=true" is in the search results

## Query:

```
index=someindex sourcetype=somelogs environment!="PREPROD*" Type=someType (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| stats count as currentErrors by ErrorCode
| appendcols [search index=mobile_summary search_name=summary_error_alert
earliest=-7d@d latest=@d (date_wday!="sunday" AND date_wday!="saturday")
| stats count as numErrors by ErrorCode, _time
| stats avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors count as numPeriods by ErrorCode]
| eval zVal = round(((currentErrors - avgErrors)/stdevErrors/(sqrt(numPeriods))), 2)
| lookup z_val_lookup zVal OUTPUT pValue1Tail
| eval significant = if(zVal > 4.09 OR pValue1Tail < .05, "true", "false")
| fields + ErrorCode currentErrors avgErrors stdevErrors numPeriods zVal pValue1Tail significant
```

Please let me know if there is any scope for improvement or any issues with the above solution. I'd appreciate any feedback.

Cheers
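As a cross-check on the z_val_lookup values, the one-tailed p-value can be approximated directly from the normal survival function. A sketch with hypothetical numbers; it follows the corrected z formula from the later post (without the sqrt(numPeriods) factor), and assumes the same normal distribution the solution assumes:

```python
import math

def one_tailed_p(z: float) -> float:
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical numbers in the spirit of the thread's examples.
current_errors, avg_errors, stdev_errors = 40, 7.6, 12.4
z = round((current_errors - avg_errors) / stdev_errors, 2)  # mirrors eval/round
p = one_tailed_p(z)
significant = z > 4.09 or p < 0.05  # mirrors the alert condition
```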


daskuntal

Path Finder

03-19-2012
04:32 PM

Instead of appendcols, we need to perform "join ErrorCode"; otherwise the merge is not correct. appendcols simply appends the subsearch columns to the main results row by row, without matching on any field, and hence the results will be misleading. So you need to join the two result sets on the common field, "ErrorCode".

Note:

- I've replaced the two-input Z-score table with a one-input lookup table, hence no need for Z1 & Z2
- Since the Z-score table doesn't have results beyond Z=4.09, I've also added another check for Z > 4.09.

```
index=someindex sourcetype=sometype environment!="PREPROD*" Type=someType (ErrorCode!="null" AND ErrorCode!=200 AND ErrorCode!=0)
| stats count as currentErrors by ErrorCode
| join ErrorCode [search index=mobile_summary search_name=summary_error_alert
earliest=-7d@d latest=@d (date_wday!="sunday" AND date_wday!="saturday")
| stats count as numErrors by ErrorCode, _time
| stats avg(numErrors) as avgErrors stdev(numErrors) as stdevErrors count as numPeriods by ErrorCode]
| eval zVal = round(((currentErrors - avgErrors)/stdevErrors), 2)
| lookup z_val_lookup zVal OUTPUT pValue1Tail
| eval significant = if(zVal > 4.09 OR pValue1Tail < .05, "true", "false")
| fields + ErrorCode currentErrors avgErrors stdevErrors numPeriods zVal pValue1Tail significant
```


andrewkenth

Communicator

01-29-2014
08:50 AM

Any chance you could share the z_val_lookup file you used? I'd hate to recreate the wheel.