We have a number of scheduled searches that run every few minutes and look for recently indexed events that match certain criteria (e.g. events submitted by security devices). These events are enriched with data from threat intel feeds and then passed to a macro that uses the collect command to aggregate the events into a summary index called alert_events. Most events come through this process fine, but we've recently noticed that very large events cause problems. For example, some of the events that one particular scheduled search alerts on start out with 150 fields extracted at search time, but the event that arrives in the alert_events index has only 100 fields; the remaining fields from the original event are simply missing. If I run the scheduled search without the macro that calls collect, I see all 150 fields, but as soon as I apply the macro at the end of the search, the event indexed in alert_events has only 100.
Is there a maximum size (or a maximum number of extracted fields) for events being passed to collect? I can't find any such limit documented on Splunk Docs.
I am also open to other explanations for why the results of a given search show 150 fields, and applying |collect index=alert_events sourcetype=ouralerts source=ouralerts results in indexed events with only 100 fields. Thanks!
It turns out that the issue we were experiencing was not related to a limit at all. Rather, the source event is JSON formatted, and I was attempting to add a few fields to the event before passing it to collect. This resulted in an event of mixed format (some JSON, some not), and as a result, many of the nested JSON fields were not parsed properly after going through collect. We have changed our approach: we now keep a "whitelist" of the fields that matter to the receiving function, and we manually reformat the event so that it has only those fields before passing it to collect.
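For anyone who wants to try the same approach, here is a minimal sketch of the whitelist idea. The field names (src_ip, dest_ip, signature, threat_score) are made up for illustration and are not our real field list. My understanding from the collect documentation is that when the results no longer carry a _raw field, collect builds the event from the remaining fields as key=value pairs, so using table to keep only the whitelisted fields avoids the mixed JSON/non-JSON event:

```
<your scheduled search and enrichment>
| table _time, src_ip, dest_ip, signature, threat_score
| collect index=alert_events sourcetype=ouralerts source=ouralerts
```

The trade-off is that any field not listed in the table command is dropped from the summary event, so the list has to be maintained as the receiving side's needs change.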
@elliotproebstel, on a different note, I hope you are aware that changing the sourcetype will cost against your licence, even if you are summarizing data.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#arg-options
Thanks for the warning. We are aware, and the events are infrequent enough that the total volume of data is pretty insignificant compared to our license size.
In case you're changing the sourcetype to enable props/transforms, you can do the same with source stanzas in props: [source::<source name>]
[<spec>]
* This stanza enables properties for a given <spec>.
* A props.conf file can contain multiple stanzas for any number of
  different <spec>.
* Follow this stanza name with any number of the following attribute/value
  pairs, as appropriate for what you want to do.
* If you do not set an attribute for a given <spec>, the default is used.
<spec> can be:
1. <sourcetype>, the source type of an event.
2. host::<host>, where <host> is the host, or host-matching pattern, for an
                 event.
3. source::<source>, where <source> is the source, or source-matching
                     pattern, for an event.
4. rule::<rulename>, where <rulename> is a unique name of a source type
                     classification rule.
5. delayedrule::<rulename>, where <rulename> is a unique name of a delayed
                            source type classification rule.
                            These are only considered as a last resort
                            before generating a new source type based on the
                            source seen.
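As a rough illustration only, a source-based stanza for events collected with source=ouralerts might look like the following. The FIELDALIAS class name and the field names are hypothetical; substitute whatever search-time settings you were planning to attach to the custom sourcetype:

```
# props.conf (sketch; assumes the events are collected with source=ouralerts)
[source::ouralerts]
# Hypothetical search-time alias, shown only to demonstrate the stanza form
FIELDALIAS-alert_src = src_ip AS src
```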
Check maxcols in your limits.conf file. If it's 100, that may be the issue.
Also check @micahkemp's suggestion, which is a good one, although I'd probably try table rather than fields.
Would this be in the limits.conf of the search head running the search? I ran this on my search head: /opt/splunk/bin/splunk btool limits list | grep maxcols and I see maxcols = 512.
I also tried | fields * and | table *, but I got the same result.
If you can get by with creating dummy data on this instance, try this to see what makes it to your summary index:
| makeresults 
| eval 
    [| makeresults count=150 
    | streamstats count 
    | eval field="field".count, value="value".count 
    | eval set_it=field."=\"".value."\"" 
    | table set_it 
    | mvcombine set_it 
    | eval search=" ".mvjoin(set_it, ", ")] 
| collect index=<your summary index>
Have you tried adding | fields * in the search prior to | collect?
