We have a number of scheduled searches that run every few minutes to search for recently indexed events that match certain criteria (e.g. events submitted by security devices). These events are enriched with data from threat intel feeds and then passed to a macro that uses the collect command to aggregate the events in a summary index called alert_events. Most of the events that pass through this process come out fine, but we've noticed recently that very large events are causing issues. For example, some of the events that a particular scheduled search is alerting on start out with 150 fields extracted at search time, but the event that arrives in the alert_events index has only 100 fields, and the rest of the fields from the original event are just missing. If I run the scheduled search without the macro calling collect, I see all 150 fields, but if I apply the macro at the end of the search, the event indexed in alert_events has only 100 fields.
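For context, the overall pattern looks roughly like this (the index, lookup, and macro names here are placeholders, not our actual configuration):

index=security_devices earliest=-5m
| lookup threat_intel_feed src_ip OUTPUT threat_category threat_score
`summarize_to_alert_events`

where the macro ends in a collect command that writes to the summary index.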
Is there a maximum size (or a maximum number of extracted fields) for events being passed to collect? I can't find any such limit documented on Splunk Docs. I am also open to other explanations for why the results of a given search show 150 fields, while applying | collect index=alert_events sourcetype=ouralerts source=ouralerts results in indexed events with only 100 fields. Thanks!
It turns out that the issue we were experiencing was not related to a limit at all. Rather, the source event is JSON formatted, and I was attempting to add a few fields to the event and pass it to collect. This resulted in an event of mixed format (some JSON, some not), and as a result, many of the nested JSON fields were not parsing correctly after passing through collect. We have changed our approach: we create a "whitelist" of fields that are important to the receiving function, and we manually reformat the event to contain only those fields before passing it to collect.
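A minimal sketch of the whitelist approach (the field names here are illustrative, not our actual schema):

... original search with enrichment ...
| table _time src_ip dest_ip action threat_category
| collect index=alert_events sourcetype=ouralerts source=ouralerts

Restricting the result to a fixed set of flat fields before collect avoids the mixed JSON/non-JSON format that was breaking the nested field extraction.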
@elliotproebstel, on a different note, I hope you're aware that changing the sourcetype will count against your license, even when you are summarizing data.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#arg-options
Thanks for the warning. We are aware, and the events are infrequent enough that the total volume of data is pretty insignificant compared to our license size.
In case you're changing the sourcetype to enable props/transforms, you can do the same with source stanzas in props.conf: [source::<source name>]. From props.conf.spec:

[<spec>]
* This stanza enables properties for a given <spec>.
* A props.conf file can contain multiple stanzas for any number of different <spec>.
* Follow this stanza name with any number of the following attribute/value pairs, as appropriate for what you want to do.
* If you do not set an attribute for a given <spec>, the default is used.

<spec> can be:
1. <sourcetype>, the source type of an event.
2. host::<host>, where <host> is the host, or host-matching pattern, for an event.
3. source::<source>, where <source> is the source, or source-matching pattern, for an event.
4. rule::<rulename>, where <rulename> is a unique name of a source type classification rule.
5. delayedrule::<rulename>, where <rulename> is a unique name of a delayed source type classification rule. These are only considered as a last resort before generating a new source type based on the source seen.
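For example, a source-based stanza for this use case might look something like this (the stanza name and settings are illustrative, not a recommendation for your environment):

[source::ouralerts]
KV_MODE = json
TRUNCATE = 0

KV_MODE = json turns on search-time JSON field extraction for events with that source, so you could keep the default sourcetype and avoid the license impact of changing it.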
Check maxcols in your limits.conf file. If it's 100, that may be the issue. Also check @micahkemp's suggestion, which is a good one, although I'd probably try table rather than fields.
Would this be in the limits.conf of the search head running the search? I ran this on my search head:

/opt/splunk/bin/splunk btool limits list | grep maxcols

and I see maxcols = 512. I also tried | fields * and | table *, but I got the same result.
If you can get by with creating dummy data on this instance, try this to see what makes it to your summary index:
| makeresults
| eval
[| makeresults count=150
| streamstats count
| eval field="field".count, value="value".count
| eval set_it=field."=\"".value."\""
| table set_it
| mvcombine set_it
| eval search=" ".mvjoin(set_it, ", ")]
| collect index=<your summary index>
Have you tried adding | fields * in the search prior to | collect?