In a summary index, how do I get the original host field?


How do I get the original host field from the data in my summary index?

After further testing, this is my favorite solution.

Just add the following after your base search and orig_host, orig_sourcetype, orig_source and orig_index will all be in your summary index :-)

 | rename _raw as orig_raw

This solves all my "why the hell isn't it adding this field no matter what I do" related problems.
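
For context, here is a minimal sketch of where the rename sits in a complete populating search. The index names `other` and `collect` are borrowed from the examples elsewhere in this thread and stand in for your own. The explicit `collect` is only needed if you are not using the summary index checkbox on a saved search.

```spl
index=other
| rename _raw as orig_raw
| collect index=collect
```

Reading the summary back might then look like the following. The orig_* field names are the ones reported above and orig_raw is the renamed raw event, so it is worth confirming they appear on your Splunk version before building reports on them.

```spl
index=collect
| table _time orig_host orig_sourcetype orig_source orig_index orig_raw
```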

My previous workaround was doing a makeresults and doing an addcols to it, which is of course terrible.

Why does Splunk behave like this by default?

You, sir, deserve to be on the front page. I am using a summary index, really just another index, to power reports and subsearches for specific users who have high resource demands and therefore need specific data isolated. Unfortunately they need the entirety of the raw log with all fields. Short of extensive props and transforms along with huge lookup tables, summary indexing was the best option. You saved me so much time! Thank you.

# a much simpler solution that I got from Splunk guru "D" :-)
# it turns out that renaming the _raw field fixes the problem of some "orig" fields going missing, e.g. orig_sourcetype
# this approach is probably not as relevant to Splunk 6, which has many automatic acceleration features
# note: the "| collect" command is optional; it is not needed if you are using the summary index checkbox in a saved search
index=other | rename _time as time | rename _raw as raw | stats count by time raw index host sourcetype source | collect index=collect
# I was having trouble recording the raw event and the original host, sourcetype and source fields in a summary index: they were always overwritten with the values from the host that runs the search populating the summary index. Here's one solution.

# step 1 - populate summary index
# search events from an index named "other", prepend the _time, host, sourcetype and source fields to the _raw field with "|" as a delimiter, and write the result to a summary index named "collect"
index=other | eval _raw=_time."|".host."|".sourcetype."|".source."|"._raw | collect index=collect

# step 2 - read from summary index named "collect"
# extract time, host, sourcetype and source fields that are stashed in the _raw field in the summary index named "collect"
index=collect | rex "(?s)^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|(?<raw1>.*)"
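
As a possible step 3, the extracted fields can feed an ordinary report. This is only a sketch: it assumes the summary events were written by the step 1 search, so that _raw starts with the epoch time followed by the four pipe-delimited values. It extracts just the prefix and leaves the rest of the event alone. The `orig_time` field name is made up for illustration.

```spl
index=collect
| rex "^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|"
| eval orig_time=strftime(tonumber(time1), "%Y-%m-%d %H:%M:%S")
| stats count by host1 sourcetype1 source1
```

Here `host1` is the host that originally produced the event, while the built-in `host` field still holds the search head that populated the summary.
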
The only way to get the original host field is to save it as part of the "populating search." So, if you were running a search every hour that looked like this:

sourcetype=access_combined | sistats count by productId status

Change it to

sourcetype=access_combined | sistats count by productId status host
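
With host saved by the populating search, the reporting search over the summary can split on it. A minimal sketch, assuming the events land in the default `summary` index and that the saved search is named `hourly_access_by_host` (a hypothetical name; substitute your own):

```spl
index=summary search_name="hourly_access_by_host"
| stats count by productId status host
```

The reporting side uses plain `stats` with the same arguments that were given to `sistats`. A field that was not in the `by` clause of the populating search cannot be recovered at this stage.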

Or did you have a different problem?

Having the same issue. How do we store the original host as a field when we are not doing stats/sistats command? Want to associate the original host with the _raw field in the summary. Likewise my summary index only has a subset of events that I've filtered from a sourcetype.

My saved search is just doing sourcetype=access_combined, for example.

No "si" commands are used. I am just filtering the data so that I can generate reports on a small set of data on request.

