Summary Indexing - Aggregating Data & Charting Multiple Ways

SplunkMonster
Engager

So right now I have a summary index that is being populated by the following command:

earliest=-20m  latest=-5m | bucket _time span=5m | sistats count by _time host sourcetype index

The idea is that I'd like to have a record of how many new events were added for each host/sourcetype/index in five minute increments. I'm running that every 15 minutes, using the time window specified (going back -20/-5 in case anything is a little slow to be indexed - I care more about the time the event was generated than when it made it into the index in this case). I'd like to be able to take this data and do a number of different things with it, and I'm wondering what's doable and what isn't:

  • I'd like to be able to sum the stored counts in the summary index to get a count of events for each host, or sourcetype, or index, over various spans of time. For example, using only that stored data, I'd like to ask how many events were generated for each sourcetype over the last week. Or for each index, or each host, etc. Is this doable given the way I'm storing the data above?

  • I'd like the ability to timechart against the stored summarized data, too. So I'd like to be able to create a timechart showing me counts per sourcetype over time (or index, or host, etc.). Is this doable given how the data is being stored above, especially since I'm not using sitimechart to store it?

The main reason I'm asking is I don't have the best grasp of what is doable and what is not via the sistats/sitimechart/etc. commands, and the best way of populating summary indexes that will give me the flexibility I need when it comes time to report on that data.

somesoni2
Revered Legend

With the search you're executing, a _time value will get stored in the summary index:

_time = the time of the events, which will be 20 minutes before the time the search is scheduled to run. All the events written to the summary index by one execution will have the same _time value.

So, regarding what you asked is doable:

  1. Yes, you will be able to sum the counts of the events for each host/index/sourcetype over various spans of time. Be aware of the change in the _time value, and choose your time range accordingly.
  2. Yes, the events in the summary index will be like any other indexed events, with _time values appearing at 15-minute intervals.
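
As a rough sketch of what those two report searches might look like (not tested against your data): the index name summary is just the default summary index, and the saved search name five_minute_event_counts is a made-up placeholder, so substitute whatever your scheduled search actually uses. Because the data was written with sistats, the plain stats command should be able to re-aggregate the stored counts, and timechart is expected to work the same way, but verify the numbers against a search on the raw data before relying on them.

```
index=summary search_name="five_minute_event_counts" earliest=-7d@d latest=now
| stats count by sourcetype
```

```
index=summary search_name="five_minute_event_counts" earliest=-7d@d latest=now
| timechart span=1h count by sourcetype
```

Swap sourcetype for host or index to split the other ways. If a split-by field comes back empty, check whether it was stored under an orig_ prefix (for example orig_host) in the summary events.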

lukejadamec
Super Champion

Summary indexes are about as much fun as is possible with Splunk. They are so much fun that you don't need to ask how to do what you want to do, because testing what works for you is easy.

1) Create an index called summaryindexstats-test

2) Create a search that generates the output you want in a table format

3) Schedule the search to run on your interval, and select the Summary Index option at the bottom, and select the summaryindexstats-test index as the summary index.

4) Run this from the command line on the indexer to backfill the summary index:

splunkhome\bin\splunk cmd python splunkhome\bin\fill_summary_index.py -app appthatcontainsyoursearch -name nameofyoursearch -et -30d -lt now -dedup true -j 8 -index summaryindexstats-test -owner ownerofthesearch -auth owner:ownerspassword

5) Run some searches on the summary index (summaryindexstats-test) and see if it contains the data you need while you marvel at how wicked-fast the searches complete.

6) Delete the index, update your search, and try again. Repeat as necessary until you get what you want.
