Splunk Search

Extracting stats from different fields in the same series of events

Communicator

I've got a series of events with a timestamp and two numbers, like so:

"2011-05-29 22:54:06",68,31
"2011-08-15 10:20:33",143,76
"2011-09-15 10:56:09",63,27
"2011-09-20 20:32:15",0,0
"2011-08-20 09:23:19",0,3

The two numbers represent "success" and "failure" counts for a specific event. What I'd like to be able to sort out are stats for each of the numeric series as well as the ratio between success/failure over time.

  • Average, min, max, stddev and counts over time for "success".
  • Average, min, max, stddev and counts over time for "failure"
  • Success/failure ratio over time.

I've been banging away on this for some time, but I don't seem to be able to extract two numeric series from the same sequence of events. Am I running into a known limit?

http://docs.splunk.com/Documentation/Splunk/4.3/User/ReportOfMultipleDataSeries

If there's no way to do what I'm after with the data in the current format, would I be better off restructuing the data to make it easier to work with using eval()?

"2011-05-29 22:54:06",success,68
"2011-08-15 10:20:33",success,143
"2011-09-15 10:56:09",success,63
"2011-09-20 20:32:15",success,0
"2011-08-20 09:23:19",success,0
"2011-05-29 22:54:06",failure,31
"2011-08-15 10:20:33",failure,76
"2011-09-15 10:56:09",failure,27
"2011-09-20 20:32:15",failure,0
"2011-08-20 09:23:19",failure,3

The example above is a simple case - two fields with numbers I'd like to trend and compare - and is just a starting point. I have more complex requirements but have to get the basics down before tackling anything harder.

Thanks for any help or suggestions.

Tags (2)
0 Karma

Communicator

Thanks for suggesting I show some sample output. I've reworked my fictitious input (I've got a million rows of real input ready to go) to make it easier to see what I'm after:

"2011-09-20 20:32:15",0,0
"2011-08-20 09:23:19",0,3
"2011-09-15 10:56:09",3,3
"2011-08-15 10:20:33",4,7
"2011-05-29 22:54:06",5,8

The two series are

success 0,0,3,4,5
failure 0,3,3,7,8

I'm trying to get counts, avgs and so on of the values in each series. From the samples above, the results about to be like this:

series  value   frequency (count)
success 0       2
success 3       1
success 4       1
success 5       1

failure 0       1
failure 3       2
failure 7       1
failure 8       1

I've tried timechart and stats - here's a timechart example:

sourcetype="sample_counts"
 | timechart count(success_count), count(failure_count)

The result is a table of data where the counts are identical for both series. I assume that the chart is counting against time. If I add one of the fields like "by success_count", I do no better.

I'm after frequency counts from the two fields/series but don't see how to go about it.

Thanks for your help (and patience.)

0 Karma

Communicator

The numbers are success/failure (match/no-match) but could just as well be weight and height, temperature and humidity or any other pair of values from a sampling point. I'd like to know the frequency of each type of count overall and over time. Does that make more sense?

0 Karma

Path Finder

I'm still confused. Is it safe to say the first value is always the success count, and the second value is always the failure count? Or is there something linking some events to other events that I am missing? Do you care about time or just overall totals and averages in your output? Is the average you want supposed to be an average over some span of time or average per event? Would your example timechart work better with sum(success_count) rather than count?

0 Karma

Path Finder

Would the rex command help you? Something like:

{your search} | rex field=_raw " \d\d:\d\d:\d\d\",(?\d*),(?\d*)" | timechart avg(field2) by field1

0 Karma

Communicator

Thanks for the help, there's not enough room in the comments for an answer so I'll post a new answer.

0 Karma

Path Finder

I'm going back and forth with myself as to whether or not I understand your issue. Would something like "timechart avg(f1), min(f1), max(f1), avg(f2), min(f2), max(f2)" be what you are looking for? You could also throw the ratio in there using an eval.

If I am still way off maybe providing the example output you want for your example input above would help clarify?

0 Karma

Communicator

Thanks for the answer. I don't think that my problem is reading the data as I've got field extractions set up for the two different numeric fields now. The problem is that I can't figure out how to get stats out of two different fields on the same series of events. I've tried counting (etc.) by the first field and I get the same results for both series. I've tried getting stats on one field and then piping to a second stats call on the second field and get no results.

Is there a way to get two series on a chart/table out of two fields in the same group of events?

Thanks

0 Karma