Splunk Search

Summary Index: Difference between sichart and sitimechart

cpeteman
Contributor

Short general question. It seems that they are just the summary index version of the normal commands. Are there any additional differences or anything else I should know about? The docs page was a little too brief for me.

cphair
Builder

Don't know if you're still looking for an answer, but yes, they're the summary versions of the regular commands. The key to working with summarized data is to pull it back out with the same command you used to put it in, minus the "si" prefix. So if you used sistats to summarize some data:


index=someindex source=yoursource | sistats avg(Value) max(Value) by host foo

then you should run the analogous command to retrieve it:

index=someindex_summary source=yoursummarysearch | stats avg(Value) max(Value) by host foo
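
The same pairing should hold for the two commands in the question: data written with sichart is read back with chart, and data written with sitimechart is read back with timechart. A sketch, reusing the placeholder index, source, and field names from above; the span=1h and the "over host by foo" split are assumptions chosen only for illustration. In each pair, the first search populates the summary and the second retrieves it:

```spl
index=someindex source=yoursource | sichart avg(Value) over host by foo
index=someindex_summary source=yoursummarysearch | chart avg(Value) over host by foo

index=someindex source=yoursource | sitimechart span=1h avg(Value) by host
index=someindex_summary source=yoursummarysearch | timechart span=1h avg(Value) by host
```

In both cases the functions and split-by fields of the retrieval search mirror those of the summarizing search.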

If you try to pull back something in your retrieval search that you didn't summarize, such as min(Value) in the above example, you might get nonsense results. However, you can pipe "stats avg(Value) as avg max(Value) as max by host foo" into a further clause like "| stats min(avg) as min by host foo" and get the minimum of the calculated averages, rather than the minimum of the raw values. Because of this behavior, it is important to plan your summary searches carefully so that you save every statistic you will need.
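
One thing to watch with the two-stage approach: if both stats clauses split by exactly the same fields, each group contains a single average, so min(avg) simply returns that average. A sketch that first computes one average per summary period by also splitting on _time (assuming that is the intent), and then takes the minimum across periods, using the same placeholder names as above:

```spl
index=someindex_summary source=yoursummarysearch
| stats avg(Value) as avg max(Value) as max by _time host foo
| stats min(avg) as min max(max) as max by host foo
```

Here min is the smallest per-period average, not the smallest raw Value. The max column, by contrast, is still the true maximum, because the maximum of the per-period maxima equals the overall maximum.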

Several fields change when you summarize data. The host becomes the server on which the summary search ran, the source becomes the name of the search, and the sourcetype becomes "stash". If you want to keep the original values of those fields, you must either split by them (as I do above with host) or save them under a new name in your summary search definition (orig_sourcetype=sourcetype).

The timestamp of the summarized data is the beginning of the time period you summarized over: if your summary search runs hourly over the previous full hour, then a run at 3:17 AM covering 2-3 AM saves all its events with a timestamp of 2 AM. Since summary events carry their own timestamps and are usually reported on over long periods, I've never had a use case for sitimechart. You can just pull back your summarized data with "stats avg(Value) as avg by _time host" (you don't need to explicitly summarize the _time field, since the summary search will timestamp the new events) and pipe it to "timechart avg(avg) by host". You may find that you have a use case for sitimechart after all, but be aware that it isn't strictly necessary in order to preserve time data.
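
To make the field-preservation point concrete: like stats, sistats keeps only the fields named in its by clause, so one way (a sketch, not the only option) to keep the original sourcetype is to copy it into a new field with eval and split by that field. The name orig_sourcetype is the one used above; the index, source, and other fields are the same placeholders as before:

```spl
index=someindex source=yoursource
| eval orig_sourcetype=sourcetype
| sistats avg(Value) max(Value) by host foo orig_sourcetype
```

And the timechart-without-sitimechart retrieval described above, written out as a full search against the summary index:

```spl
index=someindex_summary source=yoursummarysearch
| stats avg(Value) as avg by _time host
| timechart avg(avg) by host
```

If a timechart bucket spans several summary periods, avg(avg) is an average of the per-period averages, which can differ from the average of the raw values when the periods contain different numbers of events.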

Also, be careful that your summary searches do not generate overlapping data: a search's schedule and its time range should align, so that a search that runs hourly saves exactly one hour's worth of data. I think the docs cover this, though.
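
A sketch of an aligned hourly summary search, assuming it is scheduled to run once an hour (for example at 17 minutes past, as in the 3:17 AM example). The earliest=-1h@h and latest=@h time modifiers snap the window to the previous whole hour, so consecutive runs should neither overlap nor leave gaps. The index, source, and field names are the placeholders used earlier:

```spl
index=someindex source=yoursource earliest=-1h@h latest=@h
| sistats avg(Value) max(Value) by host foo
```

Run at 3:17 AM, this search covers 2:00 AM up to 3:00 AM; the 4:17 AM run then picks up at 3:00 AM.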

Let me know if you have further questions.
