We have a number of scheduled searches that run every few minutes to search for recently indexed events that match certain criteria (e.g. events submitted by security devices). These events are enriched with data from threat intel feeds and then passed to a macro that uses the collect command to aggregate the events in a summary index called alert_events. Most of the events that pass through this process come out fine, but we've noticed recently that very large events are causing issues. For example, some of the events that a particular scheduled search is alerting on start out with 150 fields extracted at search time, but the event that arrives in the alert_events index has only 100 fields, and the rest of the fields from the original event are just missing. If I run the scheduled search without the macro calling collect, I see all 150 fields, but if I apply the macro at the end of the search, the event indexed in alert_events has only 100 fields.
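For context, the overall pattern looks roughly like this (the index, lookup, and macro names here are placeholders, not our actual configuration):

index=security_devices earliest=-5m
| lookup threat_intel_feed src_ip OUTPUT threat_category threat_score
`summarize_to_alert_events`

where the macro ends in a collect command that writes to the summary index.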
Is there a maximum size (or a maximum number of extracted fields) for events being passed to collect? I can't find any such limit documented on Splunk Docs. I am also open to other explanations for why the results of a given search show 150 fields, while applying | collect index=alert_events sourcetype=ouralerts source=ouralerts results in indexed events with only 100 fields. Thanks!
It turns out that the issue we were experiencing was not related to a limit at all. Rather, the source event is JSON formatted, and I was attempting to add a few fields to the event and pass it to collect. This resulted in an event of mixed format (some JSON, some not), and as a result, many of the nested JSON fields were not parsing correctly after passing through collect. We have changed our approach: we create a "whitelist" of fields that are important to the receiving function, and we manually reformat the event to contain only those fields before passing it to collect.
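A minimal sketch of the whitelist approach (the field names here are illustrative, not our actual schema):

... original search with enrichment ...
| table _time src_ip dest_ip action threat_category
| collect index=alert_events sourcetype=ouralerts source=ouralerts

Restricting the result to a fixed set of flat fields before collect avoids the mixed JSON/non-JSON format that was breaking the nested field extraction.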
@elliotproebstel, on a different note, I hope you're aware that changing the sourcetype will count against your license, even when you are summarizing data.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#arg-options
Thanks for the warning. We are aware, and the events are infrequent enough that the total volume of data is pretty insignificant compared to our license size.
In case you're changing the sourcetype to enable props/transforms, you can do the same with source stanzas in props.conf: [source::<source name>]. From props.conf.spec:

[<spec>]
* This stanza enables properties for a given <spec>.
* A props.conf file can contain multiple stanzas for any number of different <spec>.
* Follow this stanza name with any number of the following attribute/value pairs, as appropriate for what you want to do.
* If you do not set an attribute for a given <spec>, the default is used.

<spec> can be:
1. <sourcetype>, the source type of an event.
2. host::<host>, where <host> is the host, or host-matching pattern, for an event.
3. source::<source>, where <source> is the source, or source-matching pattern, for an event.
4. rule::<rulename>, where <rulename> is a unique name of a source type classification rule.
5. delayedrule::<rulename>, where <rulename> is a unique name of a delayed source type classification rule. These are only considered as a last resort before generating a new source type based on the source seen.
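For example, a source-based stanza for this use case might look something like this (the stanza name and settings are illustrative, not a recommendation for your environment):

[source::ouralerts]
KV_MODE = json
TRUNCATE = 0

KV_MODE = json turns on search-time JSON field extraction for events with that source, so you could keep the default sourcetype and avoid the license impact of changing it.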
Check maxcols in your limits.conf file. If it's 100, that may be the issue. Also check @micahkemp's suggestion, which is a good one, although I'd probably try table rather than fields.
Would this be in the limits.conf of the search head running the search? I ran this on my search head:

/opt/splunk/bin/splunk btool limits list | grep maxcols

and I see maxcols = 512. I also tried | fields * and | table *, but I got the same result.
If you can get by with creating dummy data on this instance, try this to see what makes it to your summary index:
| makeresults
| eval
[| makeresults count=150
| streamstats count
| eval field="field".count, value="value".count
| eval set_it=field."=\"".value."\""
| table set_it
| mvcombine set_it
| eval search=" ".mvjoin(set_it, ", ")]
| collect index=<your summary index>
Have you tried adding | fields * in the search prior to | collect?