Maintaining _time with tscollect and tstats

David
Splunk Employee

I've tried a number of ways, and I don't seem to be able to use tscollect effectively while maintaining a _time component.

Here is my tscollect:

... | bucket _time span=1d | stats [many different things] by transactionid, _time | fields - transactionid | tscollect keepresults=t namespace=mynamespace

Here is my tstats:

| tstats values(onestat) as onestat sum(anotherstat) as anotherstat from mynamespace groupby _time [span=1d]

This just returned all of the results in one timeslot. I've also tried to mimic one of the examples from the docs:

| tstats prestats=t values(onestat) as onestat sum(anotherstat) as anotherstat from mynamespace by _time [span=1d] | timechart count 

The latter only confirms that tstats returns a single result. The local disk also shows that there's only a single time entry:

[root@splunksearch1 mynamespace]# ls -lh
total 18M
-rw------- 1 root root 18M Aug  3 21:36 1407049200-1407049200-18430497569978505115.tsidx
-rw------- 1 root root  86 Aug  3 21:36 splunk-autogen-params.dat

Can anyone offer any recommendations for how I can get tscollect to store the event time?

David
Splunk Employee

It turns out the root cause was a lookup table I had inline in the search: it had a leftover _time field that I hadn't removed, and that field was overwriting the _time of each event. In essence, the above works perfectly, so long as you're not sabotaging yourself.
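
One way to guard against this kind of overwrite is to name the lookup's output fields explicitly, so that a stray _time column in the lookup table is never copied onto the event. A minimal sketch, assuming a hypothetical lookup named my_lookup keyed on transactionid with output columns field_a and field_b (all placeholder names, not taken from the original search; the leading "..." stands for the base search, as above):

```spl
... | lookup my_lookup transactionid OUTPUT field_a field_b | bucket _time span=1d | stats values(field_a) as onestat sum(field_b) as anotherstat by transactionid, _time | fields - transactionid | tscollect namespace=mynamespace
```

Using OUTPUTNEW instead of OUTPUT is another option, since it only writes fields that do not already exist on the event. Removing the _time column from the lookup table itself would also work.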

gkanapathy
Splunk Employee

You are using tscollect and tstats incorrectly. They are not meant to be used as collect/summaryindex and stats, which is what it appears you are trying to do. The summary indexing backfill scripts will not work with them either (for different reasons).

You could use:

... | fields <fields you are interested in> transactionid _time | tscollect namespace=mynamespace

But then you will be responsible for backfill and missed data yourself. (As mentioned, the summary backfill will not work with tscollect as it does with collect.) So really you should create a data model that contains all the fields you might be interested in working with and accelerate that data model instead. You can then use tstats against the accelerated data model.
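
As a rough sketch of that last step, assuming a hypothetical accelerated data model named Transactions whose root dataset is also named Transactions and which exposes the fields onestat and anotherstat (the data model name and layout are assumptions, not something from the original search), a daily rollup might look like:

```spl
| tstats values(Transactions.onestat) as onestat sum(Transactions.anotherstat) as anotherstat from datamodel=Transactions by _time span=1d
```

Note that fields are referenced with the dataset name as a prefix, and that span=1d is written without brackets after _time in the by clause.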

gkanapathy
Splunk Employee

But the way you're using it, you're defeating one of the main points of tscollect/tstats, which is to keep the data in full fidelity so that you can run any stats over it later without specifying them ahead of time. You can do this, I guess. But then I'd recommend that you do as little aggregation on the fields as possible, so that you can still aggregate afterwards.

gkanapathy
Splunk Employee

tscollect can collect from stats. It just wasn't designed for it, and backfilling is usually a disaster, especially if you have more than one indexer.

If you must, your problem has nothing to do with tscollect. It happens because you're using stats and omitting _time from the "by" clause, so there's no _time being passed to tscollect in the first place. Even if you're running the search over a specific range one day at a time, you must still include _time. (This is one of those things that the backfill scripts and addinfo do for you with collect that tscollect does not handle.)

David
Splunk Employee

It looks like what you're saying is that tscollect cannot receive the output of a stats command. Is that correct? The challenge with this data source (and why I originally failed using data models) is that a handful of the fields are in the starting event and a handful are in the ending event. Without using stats (or transaction, etc.), I had to store the transactionid and two events, so the default count the data model put in was inaccurate. tscollect was an attempt to work around that limitation. What is the right way to leverage acceleration here? (Search DM or Events DM?)
