Splunk Search

Summary index or any other alternative which aligns with RBAC

captaincool07
Loves-to-Learn

Summary index or any alternative

Hi, I have created a dashboard with 8 panels and the time frame is the last 5 minutes. I kept the time frame that short because this platform receives large chunks of data. The app team wants this dashboard to run over longer time frames, perhaps the last 7 days. If we run it for the last 7 days, the search takes a very long time and a lot of resources are wasted. They asked for a solution that supports a longer time frame with faster results.

I explored and found summary indexing as an option, but I have never worked with it. Can it help me?

We have nearly 100+ indexes on that particular platform, and the sourcetype is the same for all of them. We have RBAC implemented for each index (restricting app A users from viewing app B logs and vice versa). If I implement a summary index here, will that RBAC still take effect? Since a summary index would hold data from all indexes, if I use it in the dashboard, could app A users see app B logs, or does the same RBAC apply to the summary index? Otherwise, please suggest other alternatives as well. In the end, it should align with the RBAC I have created.


LAME-Creations
Explorer

You have a few options, each with its own pros and cons. Without knowing the data, I can only make an educated guess about what would work best for you.

Data model acceleration - you could put your data into data models - either existing ones or custom ones that fit your data - and accelerate them.  This will "accelerate" your data, which in theory should significantly boost your search speed.  Mileage may vary, but you often get searches that are orders of magnitude faster.  The con is that you will probably roughly double the size of your indexed data, because acceleration keeps your non-accelerated logs and adds a set of accelerated summaries alongside the index, meaning you will use more storage space.  Additionally, every 5 minutes or so, the acceleration search runs to keep the summaries up to date, and that permanently occupies RAM and CPU on the box.  Plus, depending on your comfort level with building a data model or fitting your data into an existing one, this is a little labor intensive to set up the first time.  As for RBAC, Splunk maintains the same RBAC rules on your accelerated data as exist on the index, so you won't need any special RBAC considerations.
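To give a rough idea of what searching the accelerated data looks like, here is a minimal tstats sketch. The data model name (waf_traffic), the root dataset (waf_requests) and the field names are all hypothetical - swap in whatever your own data model defines:

| tstats summariesonly=true count from datamodel=waf_traffic.waf_requests ``` hypothetical data model and dataset names ```
    by waf_requests.src_ip, waf_requests.http_status
| rename waf_requests.* as *

summariesonly=true restricts the search to the accelerated summaries, which is where the speed comes from; drop it if you also want events that have not been accelerated yet.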

Summary indexing - this is an amazing tool for doing exactly that: summarizing the data.  For example, with network logs you have probably seen that, in a given time period when two machines talk to each other, you may end up with hundreds of "connection logs".  If your use case is not interested in each of those individual logs, but is more interested in whether these two IPs talked (think threat intelligence - did we go to bad site x), then you could create a single summary log that says IP address x talked to IP address y 100 times.  You write this data to a summary index.  In reality, summary data gets its speed advantage not by speeding up the way you look for a needle in the haystack, but by shrinking the haystack - in my example it is 1/100 the size of the original index.  This is a useful solution if a summary of the logs is good enough for what your analysts are looking for, which may or may not be the case.  In the world of threat intel, we often have to look back at 18 months of network traffic.  We look at the summary data; if we have a hit, the summary data tells us what date the hit was on, but the analysts may have to go look at the unsummarized logs for that day to get a better idea of what really happened, because summary logs gain their power by being exactly that - a summary.

For RBAC purposes, you can simply have your summary data reside in the same index it was created from.  The term "summary index" implies that you need a special index, but that is not really the case.  Summary data can be written to any index; it just shows up as a new source with the sourcetype stash.  So if you summarize your data into the same index the original logs came from, it will have the same RBAC rules applied.

Here is a video on how to summarize data 
https://youtu.be/mNAAZ3XGSng


Below is a simple SPL example of summarizing Palo Alto firewall logs:
index=pan sourcetype=connections
| stats sum(bytes_in) as bytes_in sum(bytes_out) as bytes_out earliest(_time) as _time count by src_ip, dest_ip  
| collect index=pan source="summarized_pan_connections"

You then need to determine how often to summarize your logs and set up a saved search to run that query on a schedule. Once it runs, you can query the summarized data with:
index=pan source="summarized_pan_connections"

Another option is to schedule the searches behind your dashboard panels - this means each panel's query runs once at a specified time, and everyone who visits the dashboard sees the data produced by that scheduled run.  This is relatively simple to set up and keeps your RBAC rules, but if having the latest logs on the dashboard panels is your biggest priority, this option starts to fall apart.
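If you go down this route, one way to surface the scheduled results in a panel is loadjob, which loads the most recent results of a saved search instead of re-running it. The saved search name below is hypothetical:

| loadjob savedsearch="nobody:search:waf_dashboard_panel_1" ``` hypothetical owner:app:saved_search_name ```

Panels can also be pointed directly at a saved report; whether that reuses the scheduled results or re-runs the search depends on how the panel is configured.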

I have given three suggestions. In my environment I have a situation similar to yours: a large amount of data, and looking back over long periods is slow.  We actually run a little mixture of all of it.  We accelerate a day's worth of data, then in the middle of the night we summarize yesterday's logs.  When users search the dashboard, the query is a combination of the accelerated data for today and the summarized data for the previous days.
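As a rough sketch of that hybrid pattern (this assumes an accelerated CIM Network Traffic data model plus the summary search from earlier - adjust the names and the split point to match your own setup):

| tstats summariesonly=true sum(All_Traffic.bytes_in) as bytes_in sum(All_Traffic.bytes_out) as bytes_out count from datamodel=Network_Traffic where earliest=@d by All_Traffic.src, All_Traffic.dest ``` today, from the accelerated data model ```
| rename All_Traffic.* as *
| append
    [ search index=pan source="summarized_pan_connections" earliest=-7d@d latest=@d ``` previous days, from the summary ```
      | rename src_ip as src, dest_ip as dest ]
| stats sum(bytes_in) as bytes_in sum(bytes_out) as bytes_out sum(count) as count by src, dest

The split point (@d here) just needs to line up with when the nightly summary search runs.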

Hope this gives you some ideas for a path forward.  There will be plenty of things to consider, particularly how "fresh" the data needs to be.  Is a summary of the logs good enough? Can you have static data in your dashboards that refreshes every day or every hour?
 


captaincool07
Loves-to-Learn

 @LAME-Creations  

For RBAC purposes, you can simply have your summary data reside in the same index it was created from. --> If I do it this way, will it override my original index data? And how can I differentiate between the original logs and the summary logs when they are in the same index? We use the source field to get the application name in the normal index.

In my case, users want to see the raw data as well, and we need all fields to be viewable every time. Does a summary index provide raw data as well?

Our index naming format is waf_app_<appID>, where the app ID is different for each app team. In the dashboard I have just given index=waf_app_* in the base search. What index should I use for the summary index now (the same as my original index)?

I am very confused here...


tej57
Contributor

@captaincool07, it would still be the same situation for data model summaries. Considering your RBAC situation, I would not use data model summaries if you want to restrict access per index. You'll add a hell of a lot of workload to your systems accelerating the summaries and keeping them intact. If you just want to use tstats for faster queries, it would be better to use index-time field extractions for new ingestion.
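For context, tstats can only group by fields that exist at index time (the default metadata fields plus any custom indexed fields), which is why index-time extraction matters here. A minimal sketch, where app_id is a hypothetical field extracted at index time:

| tstats count where index=waf_app_* by _time span=1h, host, app_id ``` app_id is hypothetical and must be an indexed (index-time) field ```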

Thanks,
Tejas.


ITWhisperer
SplunkTrust

Another thing to consider with summary indexes is idempotent updates, that is, how to avoid double counting. For example, if you create summaries for each day in your summary index, what do you do with events that arrive late (for whatever reason)? How do you make sure they are included in the summary without double-counting the events that have already been summarised? I did a presentation on this a couple of years ago for the B-Sides program. https://github.com/bsidessplunk/2022/blob/main/Summary%20Index%20Idempotency/Bsides%20Spl22%20Summar...
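I won't try to reproduce the presentation here, but one common way to avoid double counting late arrivals is to select events by index time rather than event time, so each event is summarized exactly once no matter when it showed up. A minimal sketch with hypothetical index, source and field names (consumers then sum the counts, since one event-time bucket can end up spread across several summary rows):

index=waf_app_123 _index_earliest=-1h@h _index_latest=@h ``` pick events by when they were indexed, not by _time ```
| bin _time span=1h
| stats count by _time, appID, action
| collect index=waf_app_123 source="summary_waf_hourly" ``` hypothetical summary source written back to the same index ```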


captaincool07
Loves-to-Learn

@ITWhisperer what if I use data models instead of a summary index? Will they serve the same purpose? And how about RBAC control for data models?


tej57
Contributor

Also, you can use concepts like base searches and chained (post-process) searches to reduce the number of searches that run, so fewer searches consume resources. And try to optimize the searches themselves to get faster results.
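Conceptually (index and field names here are hypothetical), the base search does the heavy lifting once and each panel only post-processes its results:

Base search (runs once per dashboard load):
index=waf_app_* earliest=-7d@d ``` hypothetical indexes ```
| bin _time span=1h
| stats count by _time, appID, action

Example panel post-process searches (run on the base results, not the raw events):
| stats sum(count) as events by action
| timechart span=1h sum(count) as events

Keeping the base search statistical (a stats or tstats output) is generally recommended so the post-process searches stay cheap.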


livehybrid
Super Champion

Hi @captaincool07

Summary indexing does not natively preserve RBAC at the original index level. If you aggregate data from multiple indexes into a single summary index, users with access to the summary index can see all summarised data, regardless of their original index permissions. This can break your RBAC model if not carefully managed.

Index permissions and App permissions are managed separately, so you do have some granular control over who can access the summary.

There are different ways to summarise data with Splunk - Check out https://help.splunk.com/en/splunk-enterprise/manage-knowledge-objects/knowledge-management-manual/9.... for more info on how to achieve this.

It does sound like this would be a good candidate for summary indexing, assuming you have already looked at improving the performance of the search itself (e.g. can you use tstats to search the data, TERM(someString), etc.).
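For example, wrapping an indexed token in TERM() lets Splunk go straight to the index for it instead of scanning raw events - a quick sketch with a hypothetical value:

index=waf_app_* TERM(10.20.30.40) ``` hypothetical IP; TERM matches whole indexed tokens, so it cannot contain major breakers such as spaces ```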

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


ITWhisperer
SplunkTrust

You can have multiple "summary" indexes - perhaps one for each "primary" index and then apply RBAC to those summary indexes as well.
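In your waf_app_<appID> setup that could look something like the sketch below, run once per app index (the summary index name and fields are hypothetical; the summary index just needs the same role restrictions as its source index):

index=waf_app_123 ``` hypothetical app index ```
| stats count by src_ip, http_status
| collect index=summary_waf_app_123 source="waf_daily_summary" ``` hypothetical summary index with matching RBAC ```

Your dashboard base search could then be index=summary_waf_app_* source="waf_daily_summary", and each user would only see the summaries their roles allow.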


captaincool07
Loves-to-Learn

Can I use data models here? Will they serve the same purpose as a summary index? Which option is the most reliable and, most importantly, aligns with the RBAC I have created?
