The Architecting Splunk 7.1 Enterprise Deployments class emphasizes that setting ANNOTATE_PUNCT = false in props.conf at the indexer level can significantly improve indexing time.
I wonder why this setting improves indexing time, and in which cases we should keep the punct field.
Our sales engineer said:
PUNCT is exactly what it sounds like: it’s an index-time field containing an ordered list of the punctuation characters in an event. This is extremely useful for finding “patterns” of events, like a Windows event where the service name and IP address would change but the event structure would remain the same.
It’s used in the background by Splunk sometimes. Very useful for eventtype, tagging, etc.
ANNOTATE_PUNCT is the switch that toggles this field. It’s on by default, but if you have:
1. Extremely long events
2. Extremely frequent events
3. Events all of the same PUNCT pattern
4. Events of all different PUNCT patterns
Then turning it off will reduce indexer CPU load on the parsing queue in the indexing pipeline.
It seems to me that most log files would fall under category 3, events all of the same PUNCT pattern.
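
To make the "same PUNCT pattern" idea concrete, here is a rough Python sketch of how a punct-style pattern could be derived from an event. This is only an approximation for illustration, not Splunk's actual implementation: the function name `punct_pattern`, the 30-character cap, and the sample events are my own assumptions, and it only looks at the first line of the event.

```python
import re

def punct_pattern(event: str, max_len: int = 30) -> str:
    """Approximate a punct-style pattern for one event (illustrative only)."""
    # Assumption: only the first line of the event contributes to the pattern.
    first_line = event.split("\n", 1)[0]
    # Drop letters and digits so that only the event "shape" remains.
    shape = re.sub(r"[A-Za-z0-9]", "", first_line)
    # Represent each whitespace character as an underscore.
    shape = re.sub(r"\s", "_", shape)
    # Assumption: the pattern is capped at max_len characters.
    return shape[:max_len]

# Two hypothetical access-log events with different values but the same structure.
event_a = '10.1.2.3 - admin [12/Mar/2018:10:15:32] "GET /index.html" 200'
event_b = '192.168.0.7 - bob [13/Mar/2018:22:01:09] "GET /about.html" 404'
# A hypothetical event with a different structure.
event_c = 'ERROR: disk full on host=web01'

print(punct_pattern(event_a))
print(punct_pattern(event_b))
print(punct_pattern(event_c))
```

The first two events produce the same pattern even though the IP address, user, and URL differ, while the third produces a different one. A sourcetype where every event has one fixed format would yield the same value over and over, which is the situation described in category 3.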