## Can I use "sistats median(x)" to build a histogram of x?

Super Champion

If I have a summary indexing search like this:

```
.... | sistats median(x)
```

I get a list of values and counts in a field called `psrsvd_rd_x` that contains values like this:

```
0e+00:9;3e+00:1;4e+00:1;6e+00:9;...
```

This seems to be a semicolon-separated list of values and counts, with each value separated from its count by a colon. So the value "0" occurred 9 times, "3" and "4" each occurred once, "6" occurred 9 times, ...
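To make that reading of the format concrete, here is a minimal Python sketch that runs outside Splunk. It assumes the field really is a `value:count;value:count` list as guessed above; `psrsvd_rd_x` is an internal field, so the layout is an observation rather than a documented contract. The function name `parse_digest` is hypothetical, and the sample string is only the visible part of the value shown above.

```python
from collections import Counter


def parse_digest(raw: str) -> Counter:
    """Parse 'value:count;value:count;...' into a Counter keyed by float value.

    Assumption: every pair is '<number>:<integer count>', as observed above.
    """
    hist = Counter()
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        value, _, count = pair.partition(":")
        hist[float(value)] += int(count)
    return hist


# Only the visible portion of the sample value; the elided rest is unknown.
sample = "0e+00:9;3e+00:1;4e+00:1;6e+00:9"
hist = parse_digest(sample)

# Crude text histogram: one '#' per occurrence of each distinct value.
for value in sorted(hist):
    print(f"{value:g}\t{'#' * hist[value]}")
```

If the guess about the format is right, the resulting mapping of value to count is exactly the histogram being asked about, at least for a single summarized event.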

So I'm wondering if I can use this information to build a histogram of the values of x. It seems like this should be possible, since Splunk appears to be storing counts of my distinct values anyway (which is pretty much the definition of a histogram). So this should be possible, in theory anyway.

Has anyone been able to do this? I've tried a few searches but haven't had any success so far. Are there any gotchas with the way the `sistats` command summarizes this information that would cause trouble if I tried to graph this as a histogram? (In this particular case, the number of possible distinct values for x is fairly small; there are probably fewer than 50 distinct values for any given summarized period.)

Yes, I know I could make a second summary-index-generating search that stores counts by value, but I already have a summary index search that calculates `median(x)`, so I was thinking I could leverage the events that are already in my summary index.

Splunk Employee

`sistats` automatically stores the "minimum statistics" required to re-create aggregates of the function you specify. For example, you may store `median(x)` hourly, but `sistats` will store enough to be able to compute `median(x)` daily, weekly, or over any other period. In the case of `median()` and any percentile function, this is the same info (so you could in fact get `perc95()`, `perc5()`, etc., out of data that was generated using only `median()`). This also happens to be almost the same information as `distinct_count(x)`, with the difference that the percentile functions can assume numeric data, and can (and will) thus compress the representation and discard precision to save space, while `dc` won't.

However, I don't believe that the built-in `stats` or other functions will allow you to enumerate out the values and counts that are stored by `sistats` that way.
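The point about storing enough to re-aggregate can be illustrated outside Splunk with a small Python sketch. It is not how `sistats` or `stats` are implemented: the helper names `merge` and `percentile` are hypothetical, the nearest-rank percentile rule is an assumption chosen for simplicity, and the second hour of data is invented for the example. It only shows why per-period value counts are sufficient to answer `median()`, `perc95()`, or `perc5()` over a longer period.

```python
import math
from collections import Counter


def merge(histograms):
    """Add per-period value -> count mappings into one combined mapping."""
    total = Counter()
    for hist in histograms:
        total.update(hist)  # Counter.update adds counts rather than replacing
    return total


def percentile(hist, p):
    """Nearest-rank percentile (an assumed rule) over a value -> count mapping."""
    n = sum(hist.values())
    if n == 0:
        raise ValueError("empty histogram")
    rank = max(1, math.ceil(p / 100 * n))
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= rank:
            return value
    raise AssertionError("unreachable: rank never exceeds total count")


# First hour: the visible counts from the question. Second hour: made-up data.
hour1 = Counter({0.0: 9, 3.0: 1, 4.0: 1, 6.0: 9})
hour2 = Counter({3.0: 4, 6.0: 1})

merged = merge([hour1, hour2])
print("median:", percentile(merged, 50))
print("perc95:", percentile(merged, 95))
print("perc5:", percentile(merged, 5))
```

Because the percentile functions may compress the stored representation and discard precision, as noted above, counts recovered this way should be treated as approximate when x has many distinct values.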