Summary Indexing (or nice monthly stats from data)

Communicator

So I'll explain what I've got, what I want then what I can't make work...

I have lots of log files. They've been indexed fine and I've added nice fields to them. There are two sourcetypes: one for the 'queueing' mechanism and the other for the 'results'.

In the 'results' log, I'm most interested in one field in particular. It's quite simple: 0 for success, 1 for fail.

I have been doing:

index="somethinghere" sourcetype="*-stat"
| timechart span=1day eval(count(eval(success_indicator="0"))/count(eval(eventtype="queue"))*100) AS Percentage by host

to calculate the 'success' of this system that I'm collecting logs for and provide me with a pretty graph. I've no doubt there's a better way to do this... especially if I want results more than once a month. In particular I assume those eval/count/eval bits are just 'bad'?

Also note that I can't do some trickery similar to "(1-avg(success_indicator))*100" as the total number of 'results' is the same as the total number of 'queue's

That on its own is 'good enough' though, and makes me satisfied with Splunk Free. However, the Enterprise version would give me the option of scheduled searches, making summary indexing possible, which means quicker, shinier dashboards with more historical info to compare against! So I figure I might as well see if that can be done. "Why not find a reason to pay for the full version?!"

I've read up as much as I can find on summary indexing in the documentation and answers/forums to see whether I can make this happen. In short, I can't. I can't even get anything written to the summary index without using 'collect'! Even when I've managed that, the stuff in 'collect' hasn't been useful.

I assume what I should try and do is something similar to

index="somethinghere" sourcetype="*-stat"
| sistats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host

... so that later on I can just add up the 'count's for days I'm interested in and do my calculations on that?

But then I don't really 'get' the difference between stats and sistats... does the latter just provide more fields to make the summary index more precise? I really don't get why neither is writing to my summary index when I schedule them to. (Surely they both should, but with different data.)

Does anyone have a link to a more structured walkthrough of a summary indexing problem, similar to that which I'm attempting now? I only seem to be able to find pieces here and there, nothing from start to an uninterrupted finish.

1 Solution

Communicator

I've got something going that I think is workable for me.

index="somethinghere" sourcetype="*-stat"
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host
| collect marker="successonqueue-hourly"

is scheduled hourly.

I can then run something like

index=summary successonqueue-hourly | stats sum(queued) sum(success)

(or some other operation, such as success/queued*100) over whatever time range I want to get statistics at an hourly resolution.
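As a minimal sketch of that calculation, assuming the summary events carry the `success` and `queued` fields written by the hourly search above (the `Percentage` field name and the rounding to two decimals are illustrative choices, not part of the original search):

```
index=summary successonqueue-hourly
| stats sum(success) AS success sum(queued) AS queued by host
| eval Percentage=round(success/queued*100, 2)
```

Summing the hourly counts first and dividing afterwards gives the success rate over the whole time range, rather than an average of hourly percentages.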

I have to use 'collect'; checking "use summary indexing" isn't working for me. Is this an Enterprise Trial limitation?

Is the above a stupid way to do this? Is there a better way?

Comments please! 🙂


"sistats" does not store the renamed fields. Instead, it stores a full representation of the operation being performed. For example,

... stats count(eval(status_type="Redirection")) 

in the sumindex would become the field=value pair:

"psrsvd_ct_eval(status_type="Redirection")=134"

plus some other auxiliary info fields.

So to extract the data, you should run:

index=summary successonqueue-hourly 
| stats count(eval(success_indicator="0")) as success count(eval(eventtype="queue")) AS queued by host

Give stats the exact same arguments you gave to sistats; Splunk will take care of mapping them to the proper fields for you.


Communicator

That doesn't work; all the values in my table become '0'. Though yes, I only get "host", "success" and "queued" columns.



You have to use "collect" or "sumindex" to write to the summary index. Sistats computes additional statistics in case you want to further aggregate summary-indexed data, but it does not itself store data in the summary index.

I'd strongly suggest using the "marker=..." option, which simply writes the assigned value into the summarized events. If you summarize different data at different time resolutions, this will help you easily find the data you need among the rest.

Here's an idea for your search:

index="somethinghere" sourcetype="*-stat" 
| bucket _time span=1h
| sistats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by _time host
| collect marker="successonqueue-hourly"

The enterprise trial should work fine. Did you set the search to always perform actions?
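A hedged sketch of reading that summary back at a coarser resolution, assuming the hourly search above (the sistats variant) is what populated the summary index. The one-day span and the `Percentage` field are illustrative assumptions. The stats call repeats the same aggregations that were given to sistats:

```
index=summary successonqueue-hourly
| bucket _time span=1d
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by _time host
| eval Percentage=round(success/queued*100, 2)
```

If the summary was instead populated with plain stats, as in the accepted solution, the summary events only hold the renamed `success` and `queued` fields, so they would be retrieved with `sum(success)` and `sum(queued)` instead.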


Communicator

I also don't understand how I get my data back into a usable form Paolo... reading the link above, it suggests I must do ... | stats . I do this and get nothing. If I look at the events in the summary index just normally, I see nasty fields such as psrsvd_ct_eval(eventtype="poll_queue") with the wrong data in there anyway... I don't understand how to retrieve this data in a useful way and no examples are working.


Communicator

I'm currently using the 'enterprise trial'. Does checking the checkbox for summary indexing not work in this version?


Splunk Employee

Also, with the implicit/checkbox collect you don't need a "marker" as much, as you should be able to get back the results using index=mysummaryindex source="mysummarizingsearchname" rather than index=mysummaryindex marker=mymarker.

Splunk Employee

collect is not required. When you click the checkbox in the saved search manager screen, it implicitly pipes the results of the specified search into collect. It also gives you some optional parameters that would be passed into the implicit collect. If you have collect in your query, you should not check the box. I guess if you're using the Free version, you can't select that option though...


Communicator

Right... ok that makes a bit of sense except... why doesn't http://splunk.com/base/Documentation/latest/Knowledge/… mention that 'collect' is required? It just says "Under Alert conditions, select a Perform actions value of always & select Enable summary indexing." from what I can see? Good suggestion about the marker thanks. I assume that's similar/identical to the "add fields" bit at the bottom of the scheduled searches ui?
