Solved: Different data acceleration methods

Thuan · 12-12-2017

Hello

There are three methods for data acceleration, and each has specific constraints.

1). Summary indexing - It is similar to report acceleration in that it populates a data summary with the results of a search, but in this case the data summary is a special summary index that is built and stored on the search head. The documentation topic "Use summary indexing for increased reporting efficiency" shows the easy way to set up summary indexes, using scheduled searches with the si- commands. "Configure summary indexes" covers the more difficult manual setup with the addinfo, collect, and overlap commands.

2). Report acceleration - Only searches that utilize transforming commands are eligible. In addition, any commands used in the search before the transforming command must be streaming commands.

3). Data model acceleration - This one matters to us because we also use ES (Enterprise Security).

My question is whether data model acceleration can serve as an equivalent replacement for the two previous methods. Our constraint is that all data items used in the searches must be CIM-compliant. However, TSIDX data only applies to index-time fields and NOT search-time fields. Are there other important considerations to be aware of?

DalJeanis · 12-12-2017

There are a few misconceptions in your description. The differences are covered on this page ...

http://docs.splunk.com/Documentation/Splunk/7.0.1/Knowledge/Acceleratedatamodels

1) Summary indexing chunks up the data so that the individual events are no longer relevant. The purpose of summary indexing is to pre-analyze the data. You CAN use search-time fields in the search that creates the summary index... but the summary index is thereafter a thing by itself, and only the fields that you have provided for will be available. You can think of it as a distant cousin of an OLAP cube... any dimension that isn't part of the design, just isn't there.

2) If one method were an equivalent replacement for another, they wouldn't leave the other method in the tool. Each of the methods is a basket of advantages and disadvantages, requirements and prohibitions.
A) Summary indexing is completely independent of the underlying events. You can summarize any set or subset of the data. The summary index can be retained or deleted independently of the underlying events... for instance, you can keep 60 days of actual events online in one index, but ten years of data online in the summary index.
B) The .tsidx files that make up a high-performance analytics store for a single data model are always distributed across one or more of your indexers. This is because Splunk software creates .tsidx files on the indexer, parallel to the buckets that contain the events referenced in the file and which cover the range of time that the summary spans. Implicitly, you should note that the summary goes away when the events go away, if not before.
C) Report acceleration works at the per-report level and only accelerates the data in that report. Thus, it only takes up the space needed for that report's data. Data model acceleration, by contrast, would implicitly take up space and time for all the data in the data model, assuming that report acceleration was going to give it any benefit.

D) A data model can only be accelerated if it contains at least one root event hierarchy or one root search hierarchy that only includes streaming commands. Note that this conflicts with the requirements for report acceleration.
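
To make the conflict in D) concrete, here is a minimal Python sketch of the two eligibility rules as they are stated in this thread. It is a toy model, not Splunk's implementation: the function names are hypothetical, the command sets are a small hand-picked subset of SPL commands, and the parser naively splits on the pipe character (so it assumes no pipes inside quoted strings).

```python
# Conceptual sketch only: this classification is a small, hand-picked subset
# of SPL commands, not Splunk's full command-type table.
STREAMING = {"eval", "fields", "rename", "rex", "search", "where"}
TRANSFORMING = {"stats", "chart", "timechart", "top", "rare"}


def command_names(spl: str) -> list[str]:
    """Return the command name of each pipe-separated stage after the base search.

    Assumption: no pipe characters appear inside quoted strings.
    """
    stages = [stage.strip() for stage in spl.split("|")]
    return [stage.split()[0] for stage in stages[1:] if stage]


def report_acceleration_eligible(spl: str) -> bool:
    """True if the search has a transforming command and only streaming
    commands come before the first one."""
    names = command_names(spl)
    for i, name in enumerate(names):
        if name in TRANSFORMING:
            return all(n in STREAMING for n in names[:i])
    return False  # no transforming command at all


def root_search_acceleratable(spl: str) -> bool:
    """True if every command in a data model root search is streaming."""
    return all(n in STREAMING for n in command_names(spl))


report_search = "index=web | eval is_error=if(status>=500,1,0) | stats count by is_error"
root_search = "index=web | eval is_error=if(status>=500,1,0)"
```

Under these assumptions, `report_search` passes the report acceleration check but fails the root search check, and `root_search` does the opposite, which is the sense in which the two sets of requirements pull in different directions.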

3) Data model acceleration is not guaranteed to occur. Whether or not the system chunks up the data depends on factors like the number of relevant events in the hot buckets. Data model acceleration also may summarize the same data over different timespans, and may create a large number of files, depending on the spans chosen.
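
The summary indexing behavior described in points 1) and 2A) above can be sketched the same way. This is a conceptual Python model under stated assumptions, not how Splunk stores anything: the events, field names, and the `build_summary` and `expire` helpers are all hypothetical. It shows two things: only the dimensions chosen at summary time survive, and the summary can outlive the raw events because it has its own retention.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical raw events; the field names are illustrative only.
raw_events = [
    {"time": datetime(2017, 12, 1, 10, 5), "status": "200", "user": "ann"},
    {"time": datetime(2017, 12, 1, 10, 20), "status": "200", "user": "bob"},
    {"time": datetime(2017, 12, 1, 10, 40), "status": "500", "user": "bob"},
    {"time": datetime(2017, 12, 1, 11, 15), "status": "200", "user": "ann"},
]


def build_summary(events):
    """Pre-aggregate events into hourly counts by status.

    'status' is the only dimension kept; 'user' is not part of the design,
    so it cannot be recovered from the summary afterwards.
    """
    counts = Counter(
        (e["time"].replace(minute=0, second=0, microsecond=0), e["status"])
        for e in events
    )
    return [
        {"hour": hour, "status": status, "count": count}
        for (hour, status), count in sorted(counts.items())
    ]


def expire(rows, time_key, now, retention):
    """Keep only rows whose timestamp falls inside the retention window."""
    return [row for row in rows if now - row[time_key] <= retention]


summary = build_summary(raw_events)

# Independent retention: 60 days for raw events, roughly ten years for the summary.
now = datetime(2018, 6, 1)
raw_kept = expire(raw_events, "time", now, timedelta(days=60))
summary_kept = expire(summary, "hour", now, timedelta(days=3650))
```

In this sketch the raw events have aged out by `now` while the summary rows remain, and no summary row carries a `user` field. That is the trade-off to weigh: a question about users can no longer be answered from the summary alone.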

CIM compliance is a matter of maintaining standards with regard to identifying data contents and provenance. Any of the above methods could be used in a way that would be non-compliant, or in a way that would be compliant. As such, you should pay primary attention to the contextual benefits of each of the methods for your use case.

Finally, please do not forget the option of "NOT ACCELERATED". This is often the correct way to go.
