Knowledge Management

In a summary index, how can I preserve/capture the original source / sourcetype / host from the event?

Motivator

I've searched all over and haven't found an answer to this one. My summary index holds a subset of events from another index, which I collect every 5 minutes. I see the _raw events in the summary index, which is great, but how can I store the original host / source / sourcetype fields in the summary? I've tried the eval command to store the host value in a new field, but it doesn't show up in my summary index. What gives? I don't want to go back to shell commands and grep 🙂

Example search populating the summary:

index="other" | head 3 | eval orig_host=host | fields orig_host host _raw

Thanks,

Rob


Motivator

After further testing, this is my favorite solution.

Just add the following after your base search and orig_host, orig_sourcetype, orig_source and orig_index will all be in your summary index :-)

 | rename _raw as orig_raw

Motivator
# a much simpler solution that I got from Splunk guru "D" :-)
# turns out renaming the _raw field corrects the issue of missing some of the "orig" fields, i.e. orig_sourcetype
# this approach is probably not as relevant to Splunk 6, which has many automatic acceleration features
# note: the "| collect" command is optional; it is not needed if you are using the summary index checkbox in a saved search
index=other | rename _time as time | rename _raw as raw | stats count by time raw index host sourcetype source | collect index=collect

Motivator
# I was having trouble recording the raw event, original host, sourcetype and source fields and putting them into a summary index as they were always overridden with the values of the host which runs the search populating the summary index - here's one solution

# step 1 - populate summary index
# search events from an index named "other", prepend the _time, host, sourcetype and source fields to the _raw field with "|" as a delimiter, and put the result into a summary index named "collect"
index=other | eval _raw=_time+"|"+host+"|"+sourcetype+"|"+source+"|"+_raw | collect index=collect

# step 2 - read from summary index named "collect"
# extract time, host, sourcetype and source fields that are stashed in the _raw field in the summary index named "collect"
index=collect | rex "(?s)^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|(?<raw1>.*)"
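
One thing to watch with this kind of delimiter stashing is raw text that itself contains a "|": only the first four pipes should be treated as separators, and everything after them belongs to the raw event. Below is a minimal Python sketch of the same pack/unpack idea, just to make that behavior concrete. It is not Splunk code; the function names `pack` and `unpack` and the sample event values are made up for illustration.

```python
# Hypothetical helpers that mirror the eval (step 1) and rex (step 2) pair.
# This illustrates the idea only; it is not how Splunk works internally.
FIELDS = ("time", "host", "sourcetype", "source")


def pack(event):
    """Prepend the metadata fields to the raw text, pipe-delimited (step 1)."""
    meta = [str(event[name]) for name in FIELDS]
    return "|".join(meta + [event["raw"]])


def unpack(line):
    """Split off only the first four fields so pipes inside raw survive (step 2)."""
    parts = line.split("|", len(FIELDS))
    if len(parts) != len(FIELDS) + 1:
        raise ValueError("expected four metadata fields followed by the raw text")
    record = dict(zip(FIELDS, parts))  # zip stops after the four metadata fields
    record["raw"] = parts[-1]          # the remainder, pipes included
    return record


# Made-up sample event whose raw text contains the delimiter.
event = {
    "time": 1700000000,
    "host": "web01",
    "sourcetype": "access_combined",
    "source": "/var/log/httpd/access_log",
    "raw": "GET /index.html | status=200",
}
restored = unpack(pack(event))
print(restored["host"], restored["raw"])
```

The sketch still assumes that host, sourcetype and source never contain a "|" themselves; if they can, a different delimiter or some escaping would be needed.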

Splunk Employee

The "collect" summary indexing operation should handle host-> orig_host, and index-> orig_index, but may not do so for source. Personally, I would use a different summarizing search, calling out different fields other than _raw, etc. What's happening when you search those summarized events is that the default field extractions are being applied, and the host is where the summary ran, the index field is the summary index itself, and the _raw is the base summarized event.

Try calling out the fields you really want to summarize. Note that collect may not properly remap sourcetype -> orig_sourcetype, and will probably ignore eventtype as well.

But also, why are you just cherry-picking events without actually doing any summarization? The search against the raw indexed events should handle that without issue.


Motivator

Ok. I couldn't get the collect command to preserve orig_host, orig_sourcetype, orig_source, etc. However, the sistats and sitimechart commands seem to preserve orig_host, and you could then send their output to the collect command.


Motivator

Thanks, I'll give that a try. Later on, I want to match text in the raw field with various other searches against the summary. For now, I'm trying to populate the summary with the subset of raw events that I want to search against.
