Some of the relevant documentation, and the rationale for what I've tried:
https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Setthesegmentationforeventdata
Index-time segmentation
The SEGMENTATION attribute determines the segmentation type used at index time. Here's the syntax:
[<spec>]
SEGMENTATION = <seg_rule>
[<spec>] can be:
<sourcetype>: A source type in your event data.
host::<host>: A host value in your event data.
source::<source>: A source of your event data.
SEGMENTATION = <seg_rule>
This specifies the type of segmentation to use at index time for [<spec>] events.
<seg_rule>
A segmentation type, or "rule", defined in segmenters.conf
Common settings are inner, outer, none, and full, but the default file contains other predefined segmentation rules as well.
Create your own custom rule by editing $SPLUNK_HOME/etc/system/local/segmenters.conf, as described in "Configure segmentation types".
https://docs.splunk.com/Documentation/Splunk/9.0.3/Admin/Segmentersconf
[<SegmenterName>]
* Name your stanza.
* Follow this stanza name with any number of the following setting/value
pairs.
* If you don't specify a setting/value pair, Splunk will use the default.
MAJOR = <space separated list of breaking characters>
* Set major breakers.
* Major breakers are words, phrases, or terms in your data that are surrounded
by set breaking characters.
* By default, major breakers are set to most characters and blank spaces.
* Typically, major breakers are single characters.
* Note: \s represents a space; \n, a newline; \r, a carriage return; and
\t, a tab.
* Default is [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520 %5D %5B %3A %0A %2C %28 %29
MINOR = <space separated list of strings>
* Specifies minor breakers.
* In addition to the segments specified by the major breakers, for each minor
breaker found, Splunk indexes the token from the last major breaker to the
current minor breaker and from the last minor breaker to the current minor
breaker.
* Default: / : = @ . - $ # % \\ _

I wrote a custom segmenters.conf stanza that inherits the default values for everything except the MINOR attribute. For MINOR, I simply appended ~ and its percent-encoded form, %7E, to the end of the default list. However, this did not segment my data properly at index time.
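
For concreteness, here is a minimal sketch of the kind of configuration described above. The stanza name tilde_minor and the source type my_sourcetype are placeholders chosen for illustration, not names taken from the actual deployment. The MINOR list is the documented default with ~ and %7E appended.

```ini
# $SPLUNK_HOME/etc/system/local/segmenters.conf
# "tilde_minor" is a hypothetical stanza name.
# Settings not listed here (MAJOR, etc.) fall back to their defaults.
[tilde_minor]
MINOR = / : = @ . - $ # % \\ _ ~ %7E

# $SPLUNK_HOME/etc/system/local/props.conf
# "my_sourcetype" is a hypothetical source type.
# SEGMENTATION points at the stanza name defined in segmenters.conf.
[my_sourcetype]
SEGMENTATION = tilde_minor
```

Both pieces are needed: the segmenters.conf stanza only defines the rule, and the SEGMENTATION line in props.conf is what applies it to a given [<spec>]. Because this is index-time segmentation, I would expect it to affect only events indexed after the change, not data that is already in the index.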