About j_genac

j_genac · ‎02-10-2023

Some of the relevant documentation and rationale for what I've tried:

https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Setthesegmentationforeventdata

Index-time segmentation

The SEGMENTATION attribute determines the segmentation type used at index time. Here's the syntax:

    [<spec>]
    SEGMENTATION = <seg_rule>

[<spec>] can be:
    <sourcetype>: A source type in your event data.
    host::<host>: A host value in your event data.
    source::<source>: A source of your event data.

SEGMENTATION = <seg_rule>
    This specifies the type of segmentation to use at index time for [<spec>] events.

<seg_rule>
    A segmentation type, or "rule", defined in segmenters.conf.
    Common settings are inner, outer, none, and full, but the default file contains other predefined segmentation rules as well.
    Create your own custom rule by editing $SPLUNK_HOME/etc/system/local/segmenters.conf, as described in "Configure segmentation types".

https://docs.splunk.com/Documentation/Splunk/9.0.3/Admin/Segmentersconf

    [<SegmenterName>]
    * Name your stanza.
    * Follow this stanza name with any number of the following setting/value pairs.
    * If you don't specify a setting/value pair, Splunk will use the default.

    MAJOR = <space separated list of breaking characters>
    * Set major breakers.
    * Major breakers are words, phrases, or terms in your data that are surrounded by set breaking characters.
    * By default, major breakers are set to most characters and blank spaces.
    * Typically, major breakers are single characters.
    * Note: \s represents a space; \n, a newline; \r, a carriage return; and \t, a tab.
    * Default is [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520 %5D %5B %3A %0A %2C %28 %29

    MINOR = <space separated list of strings>
    * Specifies minor breakers.
    * In addition to the segments specified by the major breakers, for each minor breaker found, Splunk indexes the token from the last major breaker to the current minor breaker and from the last minor breaker to the current minor breaker.
    * Default: / : = @ . - $ # % \\ _

I wrote the custom segmenters.conf stanza to inherit the default values of everything but the MINOR attribute, and simply appended ~ and its ASCII code %7E at the end. However, this did not segment my data properly at index time.
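To make the quoted MINOR rule concrete, here is a rough Python sketch of how major and minor breakers could produce index tokens. It is a toy model only, not Splunk's actual segmenter: the `segment` function, the shortened `MAJOR` list, and the lowercasing of tokens are all assumptions made for illustration.

```python
import re

# Shortened breaker lists (assumption: a subset of the defaults quoted above).
MAJOR = [",", " ", "\n", "\t", '"', "'", ";", "|"]
DEFAULT_MINOR = ["/", ":", "=", "@", ".", "-", "$", "#", "%", "\\", "_"]


def segment(event, minor):
    """Return the set of lowercase tokens this toy major/minor segmenter emits."""
    major_re = "|".join(re.escape(c) for c in MAJOR)
    minor_re = "|".join(re.escape(c) for c in minor)
    tokens = set()
    for major_seg in re.split(major_re, event):
        if not major_seg:
            continue
        # The whole segment between two major breakers.
        tokens.add(major_seg.lower())
        # Token from the last major breaker up to each minor breaker.
        for m in re.finditer(minor_re, major_seg):
            if m.start() > 0:
                tokens.add(major_seg[: m.start()].lower())
        # Token between consecutive minor breakers.
        for piece in re.split(minor_re, major_seg):
            if piece:
                tokens.add(piece.lower())
    return tokens


sample = "456~LG~Login~v1.0.0"
print("lg" in segment(sample, DEFAULT_MINOR))          # False
print("lg" in segment(sample, DEFAULT_MINOR + ["~"]))  # True
```

Under this model, a bare term such as LG only becomes its own token once ~ is treated as a minor breaker, which is the effect the custom stanza is meant to achieve.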

j_genac · ‎02-10-2023

I have a dataset that uses a character that is not a default segment breaker (~) to separate meaningful and commonly used search terms.

Sample events:

123,SVCA,ABC123,DEF~AP~SOME_SVC123~1.0,10.0.1.2 ,67e15429-e44c-4c27-bc9a-f3462ae67125,,2023-02-10-12:00:28.578,14,ER40011,"Unauthorized"
123,SVCB,DEF456,DEF~LG~Login~1.0,10.0.1.2,cd63b821-a96c-11ed-8a7c-00000a070dc2,cd63b820-a96c-11ed-8a7c-00000a070dc2,2023-02-10-12:00:28.578,10,0,"OK"
123,SVCC,ZHY789,123~XD-ABC~OtherSvc~2.0,10.0.1.2 ,67e15429-e44c-4c27-bc9a-f3462ae67125,,2023-02-10-12:00:28.566,321,ER00000,"Success"
456,ABC1,,DEFAULT~ENTL~ASvc~1.0,10.0.1.2 ,b70a2c11-286f-44da-9013-854acb1599cd,,2023-02-10-11:59:44.830,14,ER00000,"Success"
456,DEF2,,456~LG~Login~v1.0.0,10.0.0.1,27bee310-a843-11ed-a629-db0c7ca6c807,,2023-02-10-11:59:44.666,300,1,"FAIL"
456,ZHY3,ZHY45678,DEF~AB~ANOTHER_SVC121~1.0,10.0.0.1 ,19b79e9b-e2e2-4ba2-a7cf-e65ba8da5e7b,,2023-02-10-11:58:58.813,,27,ER40011,"Unauthorized"

Users will often search for individual items separated by the ~ character, e.g.:

index=myindex sourcetype=the_above_sourcetype *LG*

This is a high-volume dataset, so my goal is to reduce the need for leading wildcards in most searches by adding '~' as a minor segmentation character at index time.

I've tried these props.conf and segmenters.conf settings without success. Could anyone provide any insight?

<indexer> SPLUNK_HOME/etc/apps/myapp/local/props.conf

    [the_above_sourcetype]
    SHOULD_LINEMERGE=false
    LINE_BREAKER=([\r\n]+)
    TIME_PREFIX = ^([^,]*,){7}
    TIME_FORMAT = %Y-%m-%d-%H:%M:%S.%3Q
    TRUNCATE = 10000
    MAX_TIMESTAMP_LOOKAHEAD=50
    SEGMENTATION = my-custom-segmenter

SPLUNK_HOME/etc/apps/myapp/local/segmenters.conf

    [my-custom-segmenter]
    MINOR = / : = @ . - $ # % \\ _ ~ %7E

I added those and bounced my test instance, but searching for index=myindex sourcetype=the_above_sourcetype LG still does not return events such as the one below, whereas the term *LG* does return it.

456,DEF2,,456~LG~Login~v1.0.0,10.0.0.1,27bee310-a843-11ed-a629-db0c7ca6c807,,2023-02-10-11:59:44.666,300,1,"FAIL"
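As a side check on the props.conf stanza above, the TIME_PREFIX regex and TIME_FORMAT can be tried against a sample event outside Splunk. The sketch below is only an approximation: `extract_timestamp` is a hypothetical helper, and Python's `%f` directive stands in for Splunk's `%3Q`, which Python's `strptime` does not support.

```python
import re
from datetime import datetime

# Same regex as TIME_PREFIX in the props.conf stanza: skip the first seven comma-separated fields.
TIME_PREFIX = re.compile(r"^([^,]*,){7}")


def extract_timestamp(event):
    """Return the timestamp that follows TIME_PREFIX, or None if the prefix does not match."""
    m = TIME_PREFIX.match(event)
    if m is None:
        return None
    # Take the text up to the next comma as the raw timestamp.
    raw = event[m.end():].split(",", 1)[0]
    # Assumption: %f (fractional seconds) approximates Splunk's %3Q (milliseconds).
    return datetime.strptime(raw, "%Y-%m-%d-%H:%M:%S.%f")


event = ('456,DEF2,,456~LG~Login~v1.0.0,10.0.0.1,'
         '27bee310-a843-11ed-a629-db0c7ca6c807,,2023-02-10-11:59:44.666,300,1,"FAIL"')
print(extract_timestamp(event))
```

If the helper returns a datetime for every sample event, the timestamp settings are at least consistent with the data layout, which helps narrow the problem down to the segmentation settings.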

Posts	2
Solutions	0
Karma Given	0
Karma Received	0
Member Since	‎02-10-2023

Online Status	Offline
Date Last Visited	‎04-11-2023 04:24 PM

How to custom index time segments for a unique sou...

Re: Custom index time segments for a unique source...
