After migrating from tscollect to data model acceleration, how should I handle slow performance with high cardinality splits?

David
Splunk Employee

Hello all,

I'm trying to migrate from tscollect to data model acceleration, and I'm running into a challenge. I'm looking at Splunk search logs and want to find sum(total_run_time) grouped by searchtype. searchtype is a calculated field based on one log message, and total_run_time is based on another log message. The two log messages are connected by search_id.

With tscollect, I did this join at tsidx creation time. With data model acceleration, it seems like I need to do sum(total_run_time) values(searchtype) by search_id, which produces a split with about a million results and is very slow.

Is there a better way to do this, or should I consider something like summary indexing if I want to avoid tscollect?

1 Solution

tfletcher_splun
Splunk Employee

The "proper" thing to do is to make a data model and then run the aggregations on the fly for your searches in tstats. Accelerate the data model and hopefully it will not be too bad.

The other solution is actually to use summary indexing, but to make all the fields you are interested in indexed fields with props and transforms. By making them indexed fields you can still use tstats against the index. In your search query, instead of from datamodel=blah you'd do from index=blah. This still comes with all the problems that tscollect has (namely, data lag causing incorrect data), but on the plus side you get the retention policies of indexes, and if summary indexing is configured properly it will store the data on your indexers.

The best you could probably do is to run those aggregations on the accelerated data model at search time and not expect them to already be done. That is, set up your data model to have the fields and run a tstats on the accelerated model by search_id. It is not pretty, but what you want would only be possible in data model acceleration if Splunk allowed base search data models to be accelerated, which it does not.
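As a rough sketch of the "aggregate on the fly" approach, something like the following could work. The data model name Search_Logs and its root dataset Searches are hypothetical placeholders, and the sketch assumes total_run_time, searchtype, and search_id are all fields of that dataset. The first stage still splits by search_id, so it does not remove the high-cardinality cost; it only moves that work onto the accelerated summaries.

```spl
| tstats summariesonly=true
    sum(Searches.total_run_time) AS total_run_time
    values(Searches.searchtype) AS searchtype
    from datamodel=Search_Logs.Searches
    by Searches.search_id
| stats sum(total_run_time) AS total_run_time by searchtype
```

Note that if a single search_id ever carries more than one searchtype, values() returns a multivalue field and the final stats would count that run time under each value, so it may be worth checking that assumption against your data first.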

The other thing you could do is migrate to summary indexing. You can then turn all of the fields you care about in your summary index into indexed fields. The result is equivalent to making your own tsidx namespace, and you can still run tstats against that index: instead of from datamodel=blah, you do from index=blah. The storage technology is the same, and it should work with almost the same performance. However, the same problems you would have with tscollect (such as data lag causing incorrect data) still exist, whereas they will not with aggregations on the fly against the accelerated data model.
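For the summary-index route, a minimal sketch might look like this. The index name search_summary is a hypothetical placeholder, and the query assumes the summary events already carry searchtype and total_run_time together as indexed fields (that is, the join by search_id happened when the summary was populated). As far as I know, the tstats documentation filters on an index in the where clause instead of the from clause, so the sketch uses that form.

```spl
| tstats sum(total_run_time) AS total_run_time
    where index=search_summary
    by searchtype
```

Because the split here is by searchtype and not search_id, the result set stays small, which is where the performance gain over the data model approach would come from.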

tfletcher_splun
Splunk Employee

Sorry about the weirdly duplicated paragraph.
Lesson learned: don't ever click "sign in and post". It has weird behavior that makes you write a new answer and then seems to merge the two. Sign in first and then post.

David
Splunk Employee

Yeah, there are some weird UI quirks with the new answers 😉

piebob
Splunk Employee

thanks for the info about the Answers bug, i will look into this!
