So I'll explain what I've got, what I want, and then what I can't make work...
I have lots of log files; they've been indexed fine and I've added nice fields to them. There are two sourcetypes: one for the 'queueing' mechanism and the other for the 'results'.
In the 'results' log, I'm most interested in one field in particular. It's quite simple: 0 for success, 1 for failure.
I have been doing:
index="somethinghere" sourcetype="*-stat"
| timechart span=1day eval(count(eval(success_indicator="0"))/count(eval(eventtype="queue"))*100) AS Percentage by host
to calculate the 'success' of this system that I'm collecting logs for and give me a pretty graph. I've no doubt there's a better way to do this... especially if I want results more than once a month. In particular, I assume those eval/count/eval bits are just 'bad'?
Also note that I can't do some trickery similar to "(1-avg(success_indicator))*100", as the total number of 'results' isn't the same as the total number of 'queue's.
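For what it's worth, one way to avoid nesting the division inside timechart (an untested sketch, reusing the index, sourcetype and field names from above) is to compute the two counts per day first and derive the percentage afterwards:

```
index="somethinghere" sourcetype="*-stat"
| bucket _time span=1d
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by _time, host
| eval Percentage=round(success/queued*100, 2)
```

This keeps the raw counts around as their own columns, which also makes the later summary-indexing step more natural, since you can sum counts but not percentages.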
That on its own is 'good enough', though, and makes me satisfied with Splunk Free. However, the Enterprise version would give me the option of scheduled searches... making summary indexing possible... which means quicker, shinier dashboards with more historical info to compare against! So I figure I might as well see if that can be done. "Why not find a reason to pay for the full version?!"
I've read up as much as I can find on summary indexing in the documentation and on Answers/forums to see whether I can make this happen. In short, I can't. I can't even get anything written to the summary index without using 'collect'! And even when I've managed that, the stuff 'collect' wrote hasn't been useful.
I assume what I should try and do is something similar to
index="somethinghere" sourcetype="*-stat"
| sistats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host
... so that later on I can just add up the 'count's for days I'm interested in and do my calculations on that?
But then I don't really 'get' the difference between stats and sistats... does the latter just provide more fields to make the summary index more precise? I really don't get why neither is writing to my summary index when I schedule them to. (Surely they both should, just with different data?)
Does anyone have a link to a more structured walkthrough of a summary indexing problem, similar to that which I'm attempting now? I only seem to be able to find pieces here and there, nothing from start to an uninterrupted finish.
I've got something going that I think is workable for me.
index="somethinghere" sourcetype="*-stat"
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host
| collect marker="successonqueue-hourly"
is scheduled hourly.
I can then run something like
index=summary successonqueue-hourly | stats sum(queued) sum(success)
(or some other operation, such as success/queued*100) over whatever time range I want to get statistics at an hourly resolution.
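Spelling that out (same marker and field names as above; a sketch, not verified), the follow-up search over the summary might be:

```
index=summary successonqueue-hourly
| stats sum(success) AS success sum(queued) AS queued by host
| eval Percentage=success/queued*100
```

Summing the raw counts first and dividing at the end keeps the percentage correct over any time range, which wouldn't work if the summary stored percentages directly.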
I have to use the 'collect', checking "use summary indexing" isn't working for me. Is this an Enterprise Trial limitation?
Is the above a stupid way to do this? Is there a better way?
Comments please! 🙂
"sistats" does not use the renamed fields, but instead the full representation of the operation being performed:
... stats count(eval(status_type="Redirection"))
in the sumindex would become the field=value pair:
"psrsvd_ct_eval(status_type="Redirection")=134"
plus some other auxiliary info fields.
So to extract the data, you should run:
index=summary successonqueue-hourly
| stats count(eval(success_indicator="0")) as success count(eval(eventtype="queue")) AS queued by host
Give stats the exact same clause you gave to sistats; Splunk will take care of extracting the proper fields for you.
That doesn't work; all values in my table become '0'. Though yes, I do get only the "host", "success" and "queued" columns.
You have to use "collect" or "sumindex" to write to the summary index. sistats computes extra statistics in case you want to further aggregate the summary-indexed data later, but it does not itself store anything in the summary index.
I'd strongly suggest the "marker=..." option, which simply writes the assigned value into the summarized events. If you summarize different data at different time resolutions, this will help you easily find the data you need among the rest.
Here's an idea for your search:
index="somethinghere" sourcetype="*-stat"
| bucket _time span=1h
| sistats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by _time host
| collect marker="successonqueue-hourly"
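To read that sistats data back, the report search would repeat the same stats clause over the summary, per the earlier comment about giving stats the exact clause given to sistats (a sketch, not verified):

```
index=summary successonqueue-hourly
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by _time, host
| eval Percentage=success/queued*100
```

Splunk decodes the psrsvd_* fields behind the scenes, so the stats run over the summary should produce the same "success" and "queued" columns as the original search over raw events.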
The enterprise trial should work fine. Did you set the search to always perform actions?
I also don't understand how I get my data back into a usable form, Paolo... reading the link above, it suggests I must do ... | stats
Broken link above, should be http://www.splunk.com/base/Documentation/latest/Knowledge/Usesummaryindexing
I'm currently using the 'enterprise trial'. Does checking the checkbox for summary indexing not work in this version?
Also, with the implicit/checkbox collect you don't need a "marker" as much, as you should be able to get the results back using index=mysummaryindex source="mysummarizingsearchname" rather than index=mysummaryindex marker=mymarker.
collect is not required. When you tick the checkbox in the saved search manager screen, it implicitly pipes the results of the specified search into collect, and it also gives you some optional parameters that are passed into that implicit collect. If you already have collect in your query, you should not check the box. I guess if you're using the Free version, you can't select that option, though...
Right... OK, that makes a bit of sense, except... why doesn't http://splunk.com/base/Documentation/latest/Knowledge/… mention that 'collect' is required? From what I can see, it just says "Under Alert conditions, select a Perform actions value of always & select Enable summary indexing." Good suggestion about the marker, thanks. I assume that's similar/identical to the "add fields" bit at the bottom of the scheduled searches UI?