I am getting two very different results when I use the stats command and the sistats command.
I want to create a summary index of the total number of unique devices reporting to Splunk each day. I ran this simple search to count how many devices reported yesterday and got a count of 350.
* | stats dc(host)
I then simply added si to the stats command. In the output, the first two fields match the total number of reported events, and the remaining columns do not contain the value 350. The last column appears to be a list of the 350 IP addresses.
* | sistats dc(host)
psrsvd_ct_host  psrsvd_gc  psrsvd_v  psrsvd_vm_host
3947443         3947443    1         (a long list of reporting IP devices)
How do I correctly run this command to populate the summary index with the value of 350?
You don't. That is not how you use sistats.
You use sistats by populating the summary index with whatever it is that sistats outputs. You get the value you want back by querying the summary index over the time range you desire and piping that result to stats. So first
sourcetype=mysourcetype | sistats dc(host)
and send that to the summary index "mysummary". Then
index=mysummary | stats dc(host)
This will work.
The reason for this is so that statistics can be aggregated. Suppose you ran ... | stats dc(host) for Monday, and it returned 350. Suppose you then ran it for Tuesday, and it returned 320. You could store 350 in the summary, and retrieve the value for Monday. But now suppose you want the dc(host) for Monday and Tuesday, combined. You would not be able to get this information from the summary, so you would have to run it again over the raw data. Similarly, if you needed Monday thru Wednesday, Monday thru Thursday, etc., each would need a new run over the raw data.
However, if you run ... | sistats dc(host) (or any other functions with sistats), Splunk will store the right data to allow you to aggregate the results from Monday and Tuesday (or any other combination) without having to re-run over the raw data, as long as you use ... | stats dc(host) to get it back from the summary index.
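To make the Monday/Tuesday argument concrete, here is a small Python sketch. It is only a simplified model of the idea, not Splunk's actual storage format, and the host addresses are made-up illustrative data. It shows why stored daily counts cannot be combined, while stored distinct values can.

```python
# Hypothetical hosts seen on each day (illustrative data only).
monday = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
tuesday = {"10.0.0.2", "10.0.0.3", "10.0.0.4"}

# Approach 1: keep only the daily distinct count,
# like saving the single number that stats dc(host) returns.
stored_counts = {"mon": len(monday), "tue": len(tuesday)}
naive_total = sum(stored_counts.values())  # double-counts hosts seen on both days

# Approach 2: keep the distinct values themselves,
# roughly what the list of hosts in the sistats output preserves.
stored_values = {"mon": monday, "tue": tuesday}
combined = set().union(*stored_values.values())

print(naive_total)    # 6, wrong for "Monday and Tuesday combined"
print(len(combined))  # 4, the true distinct count across both days
```

Adding the two stored counts gives 6, but only 4 distinct hosts reported across both days. That is why the summary has to hold more than the final number, and why the sistats output looks so different from the stats output.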
Well, there are a lot of details there. First of all, each summary job should in general write to a different summary index. If you want to create a summary, then you should do as I said above and send your sistats output to a summary index for that search/job. There is a setting on the jobs page that lets you set the index, though you do need to create the index ahead of time. When you want to view the results, you have to query from that specific summary index.
This is my first serious attempt at populating a summary index.
I created a search that produced the results I wanted (stats) and then simply added si to stats to populate the index. The visible output of the stats and sistats commands is significantly different, and the following search produces zero results over all time.
index=summary | stats dc(host)
I know I am doing something wrong, I just don't know what.
Also, since you raised the point, it would be nice if I could get the output of sistats into a freshly created index I named gc_splunk-statistics.