Summary Indexing (or nice monthly stats from data)

vaijpc
Communicator

So I'll explain what I've got, what I want then what I can't make work...

I have lots of log files; they've been indexed fine and I've added nice fields to them. There are two sourcetypes: one for the 'queueing' mechanism and the other for the 'results'.

In the 'results' log, I'm most interested in one field in particular. It's quite simple: 0 for success, 1 for fail.

I have been doing:

index="somethinghere" sourcetype="*-stat"
| timechart span=1day eval(count(eval(success_indicator="0"))/count(eval(eventtype="queue"))*100) AS Percentage by host

to calculate the 'success' of this system that I'm collecting logs for and provide me with a pretty graph. I've no doubt there's a better way to do this... especially if I want results more than once a month. In particular I assume those eval/count/eval bits are just 'bad'?

Also note that I can't do some trickery similar to "(1-avg(success_indicator))*100" as the total number of 'results' is the same as the total number of 'queue's

That on its own is 'good enough' though, and makes me satisfied with Splunk Free. However, the Enterprise version would give me the option of scheduled searches... making summary indexing possible... which means quicker, shinier dashboards with more historical info to compare against! So I figure I might as well see if that can be done. "Why not find a reason to pay for the full version?!"

I've read up as much as I can find on summary indexing in the documentation and answers/forums to see whether I can make this happen. In short, I can't. I can't even get anything written to the summary index without using 'collect'! Even when I've managed that, the stuff in 'collect' hasn't been useful.

I assume what I should try and do is something similar to

index="somethinghere" sourcetype="*-stat"
| sistats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host

... so that later on I can just add up the 'count's for days I'm interested in and do my calculations on that?

But then I don't really 'get' the difference between stats and sistats... does the latter just provide more fields to make the summary index more precise? I really don't get why neither is writing to my summary index when I schedule them to. (Surely they both should, but with different data.)

Does anyone have a link to a more structured walkthrough of a summary indexing problem, similar to that which I'm attempting now? I only seem to be able to find pieces here and there, nothing from start to an uninterrupted finish.

1 Solution

vaijpc
Communicator

I've got something going that I think is workable for me.

index="somethinghere" sourcetype="*-stat"
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host
| collect marker="successonqueue-hourly"

is scheduled hourly.
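
A sketch of how each hourly run could be limited to the hour that just finished, so that consecutive runs do not count the same events twice. The earliest/latest time modifiers are an assumption added here; the original search does not state its time range, and the same window could instead be set in the scheduled search's time range fields.

```
index="somethinghere" sourcetype="*-stat" earliest=-1h@h latest=@h
| stats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by host
| collect marker="successonqueue-hourly"
```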

I can then run something like

index=summary successonqueue-hourly | stats sum(queued) sum(success)

(or some other operation, such as success/queued*100) over whatever time range I want to get statistics at an hourly resolution.
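
A minimal sketch of that calculation at daily resolution, assuming the summary events expose the success and queued fields written by the hourly search. The AS names and the Percentage field are chosen here for illustration only.

```
index=summary successonqueue-hourly
| timechart span=1d sum(success) AS success sum(queued) AS queued
| eval Percentage=round(success/queued*100, 2)
```

Summing the counts first and dividing afterwards keeps the ratio weighted by volume, which averaging the hourly percentages would not.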

I have to use 'collect'; checking "use summary indexing" isn't working for me. Is this an Enterprise Trial limitation?

Is the above a stupid way to do this? Is there a better way?

Comments please! 🙂

Paolo_Prigione
Builder

"sistats" does not use the renamed fields; instead it stores the full representation of the operation being performed. For example,

... stats count(eval(status_type="Redirection")) 

in the sumindex would become the field=value pair:

"psrsvd_ct_eval(status_type="Redirection")=134"

plus some other auxiliary info fields.

So to extract the data, you should run:

index=summary successonqueue-hourly 
| stats count(eval(success_indicator="0")) as success count(eval(eventtype="queue")) AS queued by host

You give stats exactly the same arguments you gave to sistats, and Splunk will take care of getting the proper fields for you.

vaijpc
Communicator

That doesn't work, all values in my table become '0'. Though yes, I only get "host", "success" and "queued" columns.

Paolo_Prigione
Builder

You have to use "collect" or "sumindex" to write to the summary index. sistats computes additional statistics in case you want to further aggregate summary-indexed data, but it does not itself store data in the summary index.

I'd strongly suggest using the "marker=..." option, which simply writes the assigned value into the summarized events. If you summarize different data at different time resolutions, this helps you easily find the data you need among the rest.

Here's an idea for your search:

index="somethinghere" sourcetype="*-stat" 
| bucket _time span=1h
| sistats count(eval(success_indicator="0")) AS success count(eval(eventtype="queue")) AS queued by _time host
| collect marker="successonqueue-hourly"

Paolo_Prigione
Builder

The enterprise trial should work fine. Did you set the search to always perform actions?

vaijpc
Communicator

I also don't understand how I get my data back into a usable form Paolo... reading the link above, it suggests I must do ... | stats . I do this and get nothing. If I look at the events in the summary index just normally, I see nasty fields such as psrsvd_ct_eval(eventtype="poll_queue") with the wrong data in there anyway... I don't understand how to retrieve this data in a useful way and no examples are working.

vaijpc
Communicator

I'm currently using the 'enterprise trial'. Does checking the checkbox for summary indexing not work in this version?

gkanapathy
Splunk Employee

Also, with the implicit/checkbox collect you don't need a "marker" as much, as you should be able to get back the results using index=mysummaryindex source="mysummarizingsearchname" rather than index=mysummaryindex marker=mymarker.
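
For example, a sketch assuming the scheduled search is saved under the hypothetical name successonqueue-hourly, writes to the default summary index, and emits plain success and queued fields:

```
index=summary source="successonqueue-hourly"
| stats sum(success) AS success sum(queued) AS queued
```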

gkanapathy
Splunk Employee

collect is not required. When you click the checkbox in the saved search manager screen, it implicitly pipes the results of the specified search into collect. It also gives you some optional parameters that would be passed into the implicit collect. If you have collect in your query, you should not check the box. I guess if you're using the Free version, you can't select that option though...

vaijpc
Communicator

Right... ok that makes a bit of sense except... why doesn't http://splunk.com/base/Documentation/latest/Knowledge/… mention that 'collect' is required? It just says "Under Alert conditions, select a Perform actions value of always & select Enable summary indexing." from what I can see? Good suggestion about the marker thanks. I assume that's similar/identical to the "add fields" bit at the bottom of the scheduled searches ui?
