I posted something to the other question, but since this is a different question, I thought I would ask a new one...
I have a question similar to http://answers.splunk.com/questions/2180/license-usage-by-sourcetype
I want to get a breakdown by not only sourcetype, but by server as well.
Ideally, I want a chart that has:
Sourcetype Hostname Events KB
I would take something that just has events or KB if I can't have both.
I tried something like this:
sourcetype!=stash | eval sourceType_Host=sourcetype . "-" . host | chart count by sourceType_Host
but this takes HOURS to run (I am not sure if it has ever completed). We have a LOT of data on multiple indexers.
Lowell is correct. We intend to fix this problem in 4.2 by recording license usage in a more granular manner. Specifically, we'll capture usage by the tuple (_time, source, sourcetype, host, forwarder) so that you can slice and report by any combination of those fields.
As an aside, are you running your search from the "Advanced Charting" view or CLI? These are many times faster than the default "flashtimeline" view for reporting searches.
Did you guys ever implement this or is it still not supported? Thanks!
Your issue is not a reporting problem; it comes down to how Splunk captures metrics in the first place.
Splunk records indexing metrics along four different axes: source, sourcetype, host, and index. Each metrics record names a series and reports indexing volume as kb, eps (events per second), and kbps (KB per second). From those values you can also work out (1) the total number of events and (2) the length of each metrics snapshot (~30 seconds, though it seems to vary slightly).
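To make the "mathematically determine" part concrete, here is a minimal sketch in Python. It assumes the three values in a metrics record are mutually consistent, so kb / kbps gives the snapshot length and eps times that length gives the event count. The function name snapshot_stats is made up for illustration; it is not anything Splunk provides.

```python
from typing import Tuple


def snapshot_stats(kb: float, eps: float, kbps: float) -> Tuple[float, float]:
    """Return (interval_seconds, event_count) for one metrics record.

    Assumption: kb, eps and kbps all describe the same snapshot, so
    interval = kb / kbps and events = eps * interval.
    """
    if kbps <= 0:
        raise ValueError("kbps must be positive to recover the interval")
    interval = kb / kbps      # seconds covered by this snapshot
    events = eps * interval   # events indexed during the snapshot
    return interval, events


# A record with kb=600, eps=50, kbps=20 implies a 30 second snapshot.
interval, events = snapshot_stats(kb=600.0, eps=50.0, kbps=20.0)
print(interval, events)
```

Records with kbps equal to zero carry no interval information, which is why the sketch rejects them instead of guessing.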
So the problem here is that you are trying to capture data across the sourcetype axis and the host axis at the same time, and that's not possible. Each axis is an independent summarization based on that axis alone, so there is no way to combine the information by looking at more than one axis at once. You would have to run a wide-open search over the raw events and summarize the data yourself, but that will be very slow.
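A small illustration of why the per-axis numbers can't be recombined, sketched in Python with made-up hosts, sourcetypes and volumes (none of this is real Splunk data). Two very different host/sourcetype breakdowns collapse to exactly the same per-host and per-sourcetype totals, so the totals alone can't tell you which breakdown you actually have.

```python
from collections import Counter


def per_axis_totals(joint):
    """Collapse {(host, sourcetype): kb} into separate per-host and
    per-sourcetype totals, i.e. one independent summary per axis."""
    by_host, by_sourcetype = Counter(), Counter()
    for (host, sourcetype), kb in joint.items():
        by_host[host] += kb
        by_sourcetype[sourcetype] += kb
    return dict(by_host), dict(by_sourcetype)


# Hypothetical scenario A: each host sends exactly one sourcetype.
scenario_a = {
    ("web01", "access"): 100, ("web01", "syslog"): 0,
    ("web02", "access"): 0,   ("web02", "syslog"): 100,
}
# Hypothetical scenario B: each host sends an even mix of both.
scenario_b = {
    ("web01", "access"): 50, ("web01", "syslog"): 50,
    ("web02", "access"): 50, ("web02", "syslog"): 50,
}

# The breakdowns differ, but the per-axis summaries are identical.
print(per_axis_totals(scenario_a) == per_axis_totals(scenario_b))
```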
You could build your own metrics-like information based on a search like this:
index=* OR index=_* | fields source sourcetype host index _indextime | bucket _indextime as time span=30s | eval bytes=len(_raw) | stats sum(eval(bytes/1024)) as kb, count as events by time, host, sourcetype | eval kbps=kb/30 | eval eps=events/30 | where time>relative_time(now(),"-15m@m") AND time<relative_time(now(),"@m") | rename time as _time
Try running this for a 15 minute window and see what happens. You could schedule a search like this to run on a 15 minute interval as a summary indexing saved search.
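If it helps to see what that search is computing, here is a rough Python equivalent of the bucket/stats/eval steps. This is only a sketch of the arithmetic, not a replacement for the search: the summarize function and the sample events are hypothetical, and len(raw) counts characters, just as len(_raw) does in the search, so multi-byte data will be undercounted.

```python
from collections import defaultdict


def summarize(events, span=30):
    """events: iterable of (indextime_epoch, host, sourcetype, raw_text).

    Returns {(bucket_start, host, sourcetype): {kb, events, kbps, eps}}.
    """
    totals = defaultdict(lambda: {"kb": 0.0, "events": 0})
    for indextime, host, sourcetype, raw in events:
        # Same idea as "bucket _indextime span=30s": snap to the span start.
        bucket = int(indextime) - int(indextime) % span
        row = totals[(bucket, host, sourcetype)]
        row["kb"] += len(raw) / 1024   # characters, used as a stand-in for bytes
        row["events"] += 1
    return {
        key: {**row, "kbps": row["kb"] / span, "eps": row["events"] / span}
        for key, row in totals.items()
    }


# Hypothetical sample: two events land in the same 30s bucket, one in the next.
sample = [
    (1000, "web01", "access", "x" * 1024),
    (1010, "web01", "access", "y" * 2048),
    (1031, "web01", "access", "z" * 512),
]
for key, row in sorted(summarize(sample).items()):
    print(key, row)
```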
Keep in mind that this is a fundamentally flawed approach. The index itself is stored in reverse time order (by event time), but what we really need to search on here is _indextime, and there is no fast way to search against that value. So if you load events that are days or months old, you would have to search a massive time range on a frequent basis just to find recently indexed events. That said, this should get you pretty close to the correct numbers.
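To put a number on how far back such a search would have to reach, here is a small Python sketch. It assumes you already know each event's timestamp and index time, which in practice is exactly the information that is expensive to get; the function name and the sample values are made up.

```python
def required_lookback(events, now, window=900):
    """events: iterable of (event_time, index_time) in epoch seconds.

    Returns how many seconds back in event time a search must reach to
    cover everything indexed during the last `window` seconds.
    """
    recent = [
        event_time
        for event_time, index_time in events
        if now - window <= index_time < now
    ]
    return now - min(recent) if recent else 0


now = 1_000_000
events = [
    (999_900, 999_950),               # indexed about a minute after it happened
    (now - 5 * 86_400, 999_990),      # five-day-old event, indexed just now
    (500, 600),                       # indexed long ago, outside the window
]
# One late-arriving event forces the search to cover five days of event time.
print(required_lookback(events, now))
```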