When I create a summary index for the speed benefit and to filter results, there are two main things I lose:
- Each event (after summary indexing) has a new date: the time the summary index entry was created, no longer the original event date.
- The sourcetype is now stash instead of the original sourcetype.
Is there any way around this? A way to pass these through per event?
Apologies if this was cryptic.
The summary indexing process will use _time for the event's timestamp if _time is a field that exists in your results. (As per How does summary indexing handle time?) But in the normal case of using some stats-like command, you don't often keep the _time field around, so the summary index process falls back to the time of your search.
If you want to use one of the stats commands and you want a better time breakdown, you could look at using the bucket command and setting span to something less than the interval of your saved search:
... | bucket _time span=5m | stats avg(thruput) by _time host
(You may also find sitimechart helpful here, but I've generally avoided all the si* helper commands and handled the funky statistical corner cases myself rather than let Splunk do it. I've seen some of the si* commands produce more "summary" events than I had input events... which is a step backwards!)
With bucket or (si)?timechart, you will still not have the exact _time of the original event, but that's rather central to how summary indexing works. I suppose you could do a | stats min(_time) as _time by field, but you will still only keep one timestamp per group of events... The bottom line is that you can't keep the exact timestamp of all your events without duplicating all your events, which then defeats the purpose of summary indexing.
In terms of keeping sourcetype: you can't (or shouldn't) do it. In Splunk 4.x, the summary indexing process does now set source to the name of your saved search. You also have a copy of the saved search name in the event itself, in a field called search_name, but searching against source (since it's one of the primary indexed fields) is really fast, so I would suggest you leverage that instead. You still don't have a great drilldown option with this, but it's possible. (You can let the sourcetype field go to your summary index, but it gets renamed orig_sourcetype, which I suppose you could then leverage for drilldown purposes.) I suppose you could make a TRANSFORMS entry on the stash sourcetype that would look for orig_sourcetype in your event and then assign the sourcetype to that value, but that just seems like a bad idea.
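For example, a drilldown search against the summary could filter on source and orig_sourcetype like this (the saved-search name "Thruput summary" and the sourcetype access_combined are hypothetical stand-ins for your own):
index=summary source="Thruput summary" orig_sourcetype=access_combined
Since source is an indexed field, the first filter is cheap; orig_sourcetype is then applied at search time to narrow results to one original sourcetype.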
BTW, it may be more helpful to add to your original question (by using the "edit" feature) rather than using comments.
"I've generally avoided all the si* helper commands and handled the funky statistical corner cases myself" - Is there a writeup anywhere on what these cases are, or even on what the si* commands do?

Yeah, just use 'orig_sourcetype' if you need it. Similarly, 'host' is usually set to 'orig_host'.
It is often useful to store min(_time) and max(_time) in aggregates (but again, only one of each per aggregate) for purposes of weighting values by time intervals, where events are less regular than bucketed time spans.
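As a sketch of that, assuming the same hypothetical thruput field and host grouping as the earlier example (note the eval is needed because bucket overwrites _time with the bucket floor):
... | eval orig_time=_time | bucket _time span=5m | stats min(orig_time) as earliest_time, max(orig_time) as latest_time, avg(thruput) by _time, host
Each summary event then records the span of original event times it covers, which you can later use to weight averages by actual coverage rather than by the nominal bucket width.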
To extend a bit on that... The idea was that since the summary index had an aggregate (stats values) distinct listing of values I could select from, I could drill into a list of events with that field=value in them.
High-level goal: I want to report (dashboards/charts/tables) on a specific bunch of fields extracted (using a nasty regex) from a fairly sizable index. The idea was that a summary index containing only the fields I need would be smarter to build dashboards off of...
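A summary-populating search for that goal might look something like the following sketch (the field names user and action, the regex, and the sourcetype are all hypothetical placeholders for your own extraction):
index=main sourcetype=access_combined | rex field=_raw "user=(?<user>\w+)\s+action=(?<action>\w+)" | bucket _time span=5m | stats count by _time, user, action
Run as a scheduled saved search with summary indexing enabled, this writes only the extracted fields and counts to the summary, so dashboards can avoid re-running the expensive regex over the full index.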
Yeah, it's a bit cryptic; more details would be helpful. It sounds like summary indexing is working the way it was intended to. If you provide more details about what you are trying to do, it could turn out that summary indexing isn't the best fit for your use case. What level of event reduction are you able to achieve? (What's the ratio of input events to summary events?)
