Summary index for large data and many group bys

tjago11
Communicator

I'm hoping to build a single summary index search that I can then use to pull the data in different ways. I would prefer to roll the data up daily, but there are about 150 million events in a day. Normally that wouldn't be an issue, but I also want to group the data by a lot of different fields, like this:

index=cif
| fields ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText, _time
| eval dateOnly = strftime(_time, "%x") 
| fields dateOnly, ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText
| fillnull value=""
| stats count as messageCount by dateOnly, ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText

The goal is to count the number of times a particular message occurs. On the retrieval side, once the summary has been built, the user would select the data back like this:

index=summary source=mySource ApplicationName=foo DataCenter=foo Environment=bar ServerType=bar host=*
| stats sum(messageCount) as messageCount by dateOnly

On retrieval the user will know the values of the various filter fields, which narrows things down to a much smaller set of data. So if I group by the filter fields when building the summary index, I can use them to filter later. I like that this gets me a single summary index job, but the query takes about 2.5 hours to complete.

Am I better off running more summary jobs and filtering up front? That will mean more summary index sources and more jobs, which is annoying but maybe necessary. Thanks.

1 Solution

Vijeta
Influencer

What is the frequency of your summary report? If it's daily, you can instead schedule it twice a day over a 12-hour window, or maybe every hour, depending on the data.
Also, based on ApplicationName, you can create separate summary reports that collect data into the same summary index under different source names (the source name will be the name of your scheduled report). Later, when you search, you can use the index and source name in your query for a particular application.

Thanks
Vijeta
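
For illustration, the per-application, shorter-window approach suggested above might look roughly like the search below. This is only a sketch under assumptions: the application value foo is borrowed from the retrieval example in the question, the source name cif_summary_foo is hypothetical, and the hourly window (earliest=-1h@h latest=@h) is just one of the schedules mentioned. It uses an explicit collect command to set the source; a scheduled report with summary indexing enabled would instead write under the report's own name.

```spl
index=cif ApplicationName=foo earliest=-1h@h latest=@h
| fields ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText, _time
| eval dateOnly = strftime(_time, "%x")
| fillnull value=""
| stats count as messageCount by dateOnly, ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText
| collect index=summary source="cif_summary_foo"
```

Because each hourly run writes its own partial counts for the same dateOnly, the retrieval search would need to add those partial counts together rather than count summary rows, for example:

```spl
index=summary source="cif_summary_foo" DataCenter=foo Environment=bar ServerType=bar host=*
| stats sum(messageCount) as messageCount by dateOnly
```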


tjago11
Communicator

I think your suggestion to split up the data by application and run separate jobs and separate sources is a good option. Thanks.
