Splunk Search

Event Field Data Lost When Using Collect to Populate Summary Index

EStallcup
Path Finder

I'm having a bit of trouble trying to backfill a couple of days in my summary index from a query using the collect command. Events are returned from the query and placed in the summary index, but for some reason Splunk isn't recognizing any of the fields that are already applied to that sourcetype (even though when you summary index data, it's saved with a sourcetype of stash). If I populate the summary index from a saved search, everything is fine; it's only when I execute a search manually (from the search app, for instance) and use collect to save the data that the fields are lost.

Here's an example of the same query, trying to backfill data from my _raw index for October 1st-2nd:

index="cdn_download_logs" resource_relative_uri="*.exe" OR resource_relative_uri="*.msi" OR resource_relative_uri="*.dmg" 
earliest=10/01/2012:00:00:00 latest=10/02/2012:00:00:00
| eval lastFileByte=filesize-1 
| eval endByteInt=if(endByte>0, tonumber(endByte,10), lastFileByte) 
| eval startByteInt=if(startByte>0, tonumber(startByte,10), 0) 
| eval leftToSend=((endByteInt-startByteInt)-sc_bytes) 
| eval downloadStatus=if(endByteInt==lastFileByte AND leftToSend<=0, "SUCCESS", "FAILURE") 
| search downloadStatus="SUCCESS"
| collect index="summary_download_success_events"

Is there a subtle nuance I'm missing that is causing my field extractions to not get applied? The weirdest part is that data returned from a saved search and added to a summary index works perfectly. I'm not sure if the eval arguments in my example query above are causing some unwanted behavior (although this is the exact same query I have running in the scheduled search that works).

I also know there's a python script somewhere in the Splunk directory that is written to assist in backfilling summary index data. Is this a better option? If so, why is it a better option?

Any help/feedback is much appreciated.

1 Solution

the_wolverine
Champion

You could have Splunk apply the sourcetype of your choosing to your summarized events by adding sourcetype=foo to the collect command:

  • | collect index=summary sourcetype=foo
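
Applied to the query in the question, the collect line would look like this (the sourcetype name here is illustrative — use whichever sourcetype your field extractions are actually tied to):

  • | collect index="summary_download_success_events" sourcetype="cdn_download_logs"

One caveat worth verifying in the collect documentation for your version: data written with the default stash sourcetype is not counted against your license, but writing with a different sourcetype may be.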


davidwyss
New Member

I was querying the data in fast mode, and the fields did not show.

FIX: Changed to verbose mode and chose list format for the display.

I then discovered they were actually there the whole time.


chanst2
Path Finder

Actually, after adding any sourcetype to the collect command, all fields are recognized now.
Thanks!

EStallcup
Path Finder

It turns out that the fields are lost because my field extractions are applied to the sourcetype given to the raw indexed data (which I pull from to build the summary index). This is weird, because when the scheduled search I described runs to backfill the summary index each day, the fields are not lost. However, if I run fill_summary_index.py, or run a query in the search app using | collect index="summaryIndex", the fields are.

To fix this, I had to recreate my field extraction regexes so that they are also applied to a sourcetype of "stash".
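
For reference, a minimal props.conf sketch of that workaround (the extraction name and regex here are illustrative placeholders, not the actual extractions from this setup):

  # props.conf -- also apply the extraction to summary-indexed (stash) events
  [stash]
  EXTRACT-downloadStatus = downloadStatus=(?<downloadStatus>\w+)

With a stanza like this, the same extraction fires on both the original sourcetype and the stash events in the summary index.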

laserval
Communicator

I'm experiencing the same behaviour in Splunk 6: when using collect, only _raw is included in the summary index.

BradL
Path Finder

+1 - I'm having the same experience, where collect is dropping everything except _raw and the resulting output cannot be queried - including field=value fields that I expect to be extracted by default.

Has a fix been found for this?

BradL
Path Finder

Discovered that the fields are extracted if they are comma-delimited:

f1=v1,f2=v2,etc...

Whitespace-delimited pairs didn't work.

EStallcup
Path Finder

That is also what I would assume; however, Splunk does not seem to be behaving this way.

sowings
Splunk Employee

Wait, what? The stash sourcetype is specifically for the summary-indexed data itself. Typically, when data is summary indexed, the raw events are written as key=value pairs, so Splunk should be extracting them automatically.
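
For illustration, the _raw of a summary-indexed stash event typically looks something like this (all values here are invented for the example):

  10/02/2012 00:00:01 -0700, search_name="SummaryIndex_DownloadSuccessEvents", downloadStatus=SUCCESS, sc_bytes=1048576

Since the pairs are written as key=value, automatic extraction should normally pick them up.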

EStallcup
Path Finder

The problem I've been having with the backfill script is that I cannot get it to parse earliest/latest time parameters that are actual UTC dates, as opposed to dynamic dates like '5d@d'. I get an error when I try to run a command like:

.\splunk cmd python fill_summary_index.py -app search -name "SummaryIndex_DownloadSuccessEvents" -owner admin -et "10/01/2012:00:00:00" -lt "10/02/2012:00:00:00" -dedup true -auth admin:password

because it claims: "Failed to get list of scheduled times for saved search". I think this is because the scheduled search uses -et -day@day -lt @day
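
One thing that may be worth trying (an assumption on my part, not something verified against every version of the script) is giving fill_summary_index.py a relative window in the same style the scheduled search uses, and letting the script derive the scheduled run times itself:

  .\splunk cmd python fill_summary_index.py -app search -name "SummaryIndex_DownloadSuccessEvents" -owner admin -et -3d@d -lt @d -dedup true -auth admin:password

Here -3d@d and @d are illustrative; adjust the window so it covers October 1st-2nd relative to the day you run the script.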

EStallcup
Path Finder

Yes, they are being applied by sourcetype. The weird aspect of this is that the sourcetype applied to summary-indexed data (stash) is also applied by the scheduled search, which adds indexed data to the summary index with the original fields in place. I originally thought the sourcetype was the issue.

sowings
Splunk Employee

The Python script is 'fill_summary_index.py' and can be used to backfill summary data. It iterates through the scheduled run times of the searches you name and runs them as though they were being run at those historical times. If you've added a new summary indexing search and want to have data available historically, you can use this script.

sowings
Splunk Employee

Consider adding a fields command to select exactly which fields you wish to carry into the summary data.
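
In the query from the question, that could look like the following just before collect (the field list here is illustrative):

  ... | fields _time, resource_relative_uri, downloadStatus, sc_bytes
  | collect index="summary_download_success_events"

That way only the fields you actually need end up in the summary events.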

Ayn
Legend

Are your field extractions tied to sourcetype? If so, did you check which sourcetype you're getting for the events when you've collected them to a summary index?
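
A quick way to check is a search like this over the summary index (index name taken from the question):

  index="summary_download_success_events" | stats count by sourcetype

If everything shows up as stash, extractions tied only to the original sourcetype won't apply.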
