I have a heavy search on multiple sources that I want to schedule to populate a summary index. I am basically interested in certain events so I want to populate the summary index with only those events. That way I can run searches on the summary index quickly as opposed to the normal index that contains hundreds of millions of events.
I can populate the summary index like this:
index=windows OR index=linux OR index=something my search | addinfo | collect index=mysummaryindex
This works fine; however, the host field is not saved, so I don't know which host generated the event.
Is there a way to add the host field into the summary index as well? The marker option for collect just adds a certain string field which is not useful in this case.
After further testing, this is my favorite solution.
Just add the following after your base search, and orig_host, orig_sourcetype, orig_source and orig_index will all be in your summary index :-)
| rename _raw as orig_raw
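A minimal sketch of how this might fit into the search from the question, assuming the same index names and that mysummaryindex already exists. "my search" is the placeholder from the question, and the orig_* fields are as reported in this thread, not something verified on every Splunk version:

```
index=windows OR index=linux OR index=something my search
| rename _raw as orig_raw
| addinfo
| collect index=mysummaryindex
```

Reading the events back could then look like this (the field list after table is only an illustration):

```
index=mysummaryindex | table _time orig_host orig_sourcetype orig_source orig_index orig_raw
```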
# A much simpler solution that I got from Splunk guru "D" :-)
# It turns out that renaming the _raw field corrects the issue of missing some of the "orig" fields, e.g. orig_sourcetype
# This approach is probably not as relevant to Splunk 6, which has many automatic acceleration features
# Note: the "| collect" command is optional; it is not needed if you are using the summary index checkbox in a saved search
index=other | rename _time as time | rename _raw as raw | stats count by time raw index host sourcetype source | collect index=collect
# I was having trouble recording the raw event and the original host, sourcetype and source fields in a summary index: they were always overwritten with the values from the host that runs the search populating the summary index. Here's one solution.
# step 1 - populate summary index
# Search events from an index named "other", prepend the _time, host, sourcetype and source fields to the _raw field with "|" as a delimiter, and write the result to a summary index named "collect"
index=other | eval _raw=_time+"|"+host+"|"+sourcetype+"|"+source+"|"+_raw | collect index=collect
# step 2 - read from summary index named "collect"
# Extract the time, host, sourcetype and source fields that are stashed in the _raw field in the summary index named "collect"
index=collect | rex "(?s)^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|(?<raw1>.*)"
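Once the stashed fields are extracted, reporting by original host from the summary index could look like the sketch below. It reuses the index name "collect" and the field names time1, host1, sourcetype1 and source1 from step 2; the stats clause is only an illustration, and it assumes none of the four prepended values contains a "|" character:

```
index=collect
| rex "^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|"
| stats count by host1 sourcetype1
```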
When writing to the summary index, Splunk should have created a new field, "orig_host", and populated it with the original host.
Does that not exist?
Oh, yeah. I see what you're saying. I was using the stats command.
I'm trying to do something similar, but I also want to eliminate unwanted fields when I write to the summary index. No answer for me so far:
Nope - I don't see the orig_host field at all. I'm not sure if the collect command adds that, or if it is only added by the sistats/sichart/etc. commands.
Did you try adding:
| fields host
in your search?
Yes, unfortunately that field does not get stored in the summary index.
It appears that the collect command only stores what's in the marker, and that can be only an arbitrary string.