When I create a summary index for the speed benefit and to filter results, there are two main things I lose:
Each event (after summary indexing) gets a new timestamp of when the summary index entry was created, no longer the original event time.
The sourcetype is now stash instead of the original sourcetype.
Is there any way around this? A way to pass these through per event?
Apologies if this was cryptic.
Yeah, it's a bit cryptic. More details about what you are trying to do would be helpful. It sounds like summary indexing is working the way it was intended to. It could be that summary indexing isn't the best fit for your use case. What level of event reduction are you able to achieve? (What's the ratio of input events to summary events?)
High-level goal: I want to report (dashboards/charts/tables) on a specific bunch of fields extracted (using a nasty regex) from a fairly sizable index. The idea was that a summary index pulling only the fields I need would be smarter to dashboard off of...
To extend a bit on that: since the summary index holds an aggregate (stats values) listing of the distinct values I could select on, I could drill into a list of events with that field=value in them.
The summary indexing process will use _time for the event's timestamp if _time is a field that exists in your results. (As per "How does summary indexing handle time?".) But in the normal case of using some stats-like command, you don't often keep the _time field around, so the summary index process falls back to the time of your search.
If you want to use one of the stats commands and you want a better time breakdown, you could look at using the bucket command and setting span to something less than the interval of your saved search:
... | bucket _time span=5m | stats avg(thruput) by _time host
(You may also find sitimechart helpful here, but I've generally avoided all the si* helper commands and handled the funky statistical corner cases myself rather than let Splunk do it. I've seen some of the si* commands produce more "summary" events than I had input events... which is a step backwards!)
Even with (si)?timechart, you will still not have the exact _time of the original event, but that's rather central to how summary indexing works. I suppose you could do a | stats min(_time) as _time by field, but you will still only keep one timestamp from each group of events... The bottom line is that you can't keep the exact timestamp of every event without duplicating all your events, which then defeats the purpose of summary indexing....
In terms of keeping sourcetype: you can't (or shouldn't) do it. In Splunk 4.x, the summary indexing process does now set source to the name of your saved search. You still have a copy of the saved search name in the event itself, called search_name, but searching against source (since it's one of the primary indexed fields) is really fast. So I would just suggest that you leverage that instead. You still don't have a great drilldown option with this, but it's possible. (You can let the sourcetype field go to your summary index, but it gets renamed orig_sourcetype, which I suppose you could then leverage for drilldown purposes.) I suppose you could make a TRANSFORMS entry on the stash sourcetype that would look for orig_sourcetype in your event and then assign the sourcetype to that value, but that just seems like a bad idea....
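As a rough sketch of the orig_sourcetype route described above (not a tested recipe): the index names main and summary, the sourcetype access_combined, and the saved search name thruput_5m_summary are placeholders made up for illustration, and thruput is just the field from the earlier example. The summary-populating saved search keeps sourcetype as a split-by field so it travels into the summary:

```
index=main sourcetype=access_combined
| bucket _time span=5m
| stats avg(thruput) as avg_thruput by _time host sourcetype
```

A report against the summary can then filter on source (the saved search name) and on orig_sourcetype, assuming host is carried over as orig_host in the same way:

```
index=summary source="thruput_5m_summary" orig_sourcetype=access_combined
| timechart span=5m avg(avg_thruput) by orig_host
```

Because each host has one summary row per 5-minute bucket here, averaging avg_thruput at the same span just passes the stored value through; at a coarser span it becomes an average of averages, which is one of the corner cases you would have to handle yourself.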
Yeah, just use 'orig_sourcetype' if you need it. Similarly, the original 'host' is usually kept as 'orig_host'.
It is often useful to store max(_time) in aggregates (but again, only one of each per aggregate) for purposes of weighting values by time intervals, where events are less regular than bucketed time spans.
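One catch if you combine this with bucket: bucket overwrites _time with the start of the span, so the original timestamp has to be copied to another field first, otherwise max(_time) just returns the bucket start. A minimal sketch reusing the thruput example; event_time, first_event_time and last_event_time are made-up field names:

```
... | eval event_time=_time
| bucket _time span=5m
| stats avg(thruput) as avg_thruput
        min(event_time) as first_event_time
        max(event_time) as last_event_time
        by _time host
```

Each summary row still carries only one first/last pair per group, but that is enough to work out how much of the bucket the events actually covered.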
"I've generally avoided all the si* helper commands and handled the funky statistical corner cases myself" - Is there a writeup anywhere on what these cases are, or even what the si* commands do?
BTW, it may be more helpful to add to your original question (by using the "edit" feature) rather than using comments.