I've tried a number of ways, and I don't seem to be able to use tscollect effectively while maintaining a _time component.
Here is my tscollect:
... | bucket _time span=1d | stats [many different things] by transactionid, _time | fields - transactionid | tscollect keepresults=t namespace=mynamespace
Here is my tstats:
| tstats values(onestat) as onestat sum(anotherstat) as anotherstat from mynamespace groupby _time [span=1d]
This just returned all of the results in one timeslot. I've also tried to mimic one of the examples from the docs:
| tstats prestats=t values(onestat) as onestat sum(anotherstat) as anotherstat from mynamespace by _time [span=1d] | timechart count
The latter only confirms that tstats returns a single result. The local disk also confirms that there's only a single time entry:
[root@splunksearch1 mynamespace]# ls -lh
total 18M
-rw------- 1 root root 18M Aug 3 21:36 1407049200-1407049200-18430497569978505115.tsidx
-rw------- 1 root root 86 Aug 3 21:36 splunk-autogen-params.dat
Can anyone offer any recommendations for how I can get tscollect to store the event time?
It turns out my root cause was a lookup table I had inline that had a leftover _time field not removed, which was overwriting the _time of the event. In essence, the above works perfectly, so long as you're not sabotaging yourself.
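For anyone hitting the same problem: the fix is either to delete the stray _time column from the lookup file itself, or to list the OUTPUT fields explicitly so the lookup can't clobber the event's _time. A sketch of the latter, using the search from the question (mylookup, key, fieldA, and fieldB are placeholder names, not from the original search):

... | lookup mylookup key OUTPUT fieldA fieldB | bucket _time span=1d | stats [many different things] by transactionid, _time | fields - transactionid | tscollect keepresults=t namespace=mynamespace

With an explicit OUTPUT list, only fieldA and fieldB are written back from the lookup, so the event's original _time survives into tscollect.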
You are using tscollect and tstats incorrectly. They are not meant to be used as collect/summary indexing and stats, which is what it appears you are trying to do. The summary indexing backfill scripts will not work with them either (for different reasons).
You could use:
... | fields <fields you are interested in> transactionid _time | tscollect namespace=mynamespace
But then you will be responsible for backfill and missed data yourself. (As mentioned, the summary backfill will not work with tscollect as it does with collect.) So really you should create a data model that contains all the fields you might be interested in working with and accelerate that data model instead. You can then use tstats against the accelerated data model.
But the way you're using it, you're sort of defeating one of the main points of tscollect/tstats, which is to keep data at full fidelity so that you can run any stats over it afterwards without specifying them ahead of time. You can do this, I guess, but then I'd recommend that you at least do as little aggregation on the fields as possible so that you can still do aggregations afterwards.
tscollect can collect from stats; it just wasn't designed for it, and backfilling is usually a disaster, especially if you have more than one indexer.
If you must, your problem has nothing to do with tscollect itself; it's that you're using stats and omitting _time from the "by" clause, so there's no _time being passed to tscollect in the first place. Even though you're running a specific range a day at a time, you must still include _time. (This is one of those things that the backfill and addinfo scripts do for you with collect that tscollect does not handle.)
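To illustrate the point about the "by" clause against the search in the question: _time has to survive the whole pipeline into tscollect (onestat and anotherstat are stand-ins for the "[many different things]" in the original stats):

... | bucket _time span=1d | stats values(onestat) as onestat sum(anotherstat) as anotherstat by transactionid, _time | fields - transactionid | tscollect keepresults=t namespace=mynamespace

Because _time is in the stats "by" list and is not removed afterwards, each row carries its day bucket into the namespace, and a subsequent | tstats ... from mynamespace groupby _time span=1d can then split results by day.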
It looks like what you're saying is that tscollect cannot receive the output of a stats command. Is that correct? The challenge with this data source (and why I originally failed using data models) is that a handful of the fields are in the starting event, and a handful in the ending event. Without using a stats (or transaction, etc.), I was having to store the transactionid and two events, so the default count the data model put in was inaccurate. tscollect was an attempt to work around that limitation. What is the right way to leverage acceleration here? (Search DM of Events DM?)