Splunk Enterprise

How to customize index-time segmentation for a specific sourcetype?

j_genac
Observer

I have a dataset that uses a character that is not a segmentation breaker (~) to separate meaningful, commonly used search terms.

Sample events

 

123,SVCA,ABC123,DEF~AP~SOME_SVC123~1.0,10.0.1.2    ,67e15429-e44c-4c27-bc9a-f3462ae67125,,2023-02-10-12:00:28.578,14,ER40011,"Unauthorized"
123,SVCB,DEF456,DEF~LG~Login~1.0,10.0.1.2,cd63b821-a96c-11ed-8a7c-00000a070dc2,cd63b820-a96c-11ed-8a7c-00000a070dc2,2023-02-10-12:00:28.578,10,0,"OK"
123,SVCC,ZHY789,123~XD-ABC~OtherSvc~2.0,10.0.1.2  ,67e15429-e44c-4c27-bc9a-f3462ae67125,,2023-02-10-12:00:28.566,321,ER00000,"Success"
456,ABC1,,DEFAULT~ENTL~ASvc~1.0,10.0.1.2  ,b70a2c11-286f-44da-9013-854acb1599cd,,2023-02-10-11:59:44.830,14,ER00000,"Success"
456,DEF2,,456~LG~Login~v1.0.0,10.0.0.1,27bee310-a843-11ed-a629-db0c7ca6c807,,2023-02-10-11:59:44.666,300,1,"FAIL"
456,ZHY3,ZHY45678,DEF~AB~ANOTHER_SVC121~1.0,10.0.0.1    ,19b79e9b-e2e2-4ba2-a7cf-e65ba8da5e7b,,2023-02-10-11:58:58.813,,27,ER40011,"Unauthorized"

 

 
Users will often search for individual items separated by the ~ character. E.g.,
index=myindex sourcetype=the_above_sourcetype *LG*


Because this is a high-volume dataset, my goal is to reduce the need for leading wildcards in most searches by adding '~' as a minor breaker at index time.

I've tried the following props.conf and segmenters.conf settings without success. Could anyone offer any insight?

<indexer>
$SPLUNK_HOME/etc/apps/myapp/local/props.conf

 

[the_above_sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
TIME_PREFIX = ^([^,]*,){7}
TIME_FORMAT = %Y-%m-%d-%H:%M:%S.%3Q
TRUNCATE = 10000
MAX_TIMESTAMP_LOOKAHEAD=50
SEGMENTATION = my-custom-segmenter
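
As a sanity check on the timestamp settings in this stanza, here is a minimal Python sketch that applies the same TIME_PREFIX regex to one of the sample events and parses what follows. It only approximates Splunk's behavior: the helper name extract_timestamp is hypothetical, and Python's %f is used as a stand-in for Splunk's %3Q.

```python
import re
from datetime import datetime

# Same regex as TIME_PREFIX in the stanza: skip seven comma-terminated fields.
TIME_PREFIX = re.compile(r"^([^,]*,){7}")

# Assumption: Python's %f (1-6 fractional digits) stands in for Splunk's %3Q.
PY_TIME_FORMAT = "%Y-%m-%d-%H:%M:%S.%f"


def extract_timestamp(event):
    """Return the timestamp found right after the prefix, or None if no match."""
    match = TIME_PREFIX.match(event)
    if match is None:
        return None
    # The timestamp is the next comma-delimited field after the prefix.
    raw = event[match.end():].split(",", 1)[0]
    return datetime.strptime(raw, PY_TIME_FORMAT)


event = (
    "123,SVCB,DEF456,DEF~LG~Login~1.0,10.0.1.2,"
    "cd63b821-a96c-11ed-8a7c-00000a070dc2,"
    "cd63b820-a96c-11ed-8a7c-00000a070dc2,"
    '2023-02-10-12:00:28.578,10,0,"OK"'
)

print(extract_timestamp(event))
```

For this event the prefix ends right before 2023-02-10-12:00:28.578, so the timestamp settings do not look like the source of the problem.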

 

$SPLUNK_HOME/etc/apps/myapp/local/segmenters.conf

 

[my-custom-segmenter]
MINOR = / : = @ . - $ # % \\ _ ~ %7E

 

 

After adding these settings and restarting my test instance, the following search still does not match the events I expect:

index=myindex sourcetype=the_above_sourcetype LG

It does not return events such as the one below, although searching for *LG* does return it.
456,DEF2,,456~LG~Login~v1.0.0,10.0.0.1,27bee310-a843-11ed-a629-db0c7ca6c807,,2023-02-10-11:59:44.666,300,1,"FAIL"


j_genac
Observer

Some of the relevant documentation and rationale for what I've tried:

https://docs.splunk.com/Documentation/Splunk/9.0.3/Data/Setthesegmentationforeventdata

 

Index-time segmentation
The SEGMENTATION attribute determines the segmentation type used at index time. Here's the syntax:

[<spec>]
SEGMENTATION = <seg_rule>
[<spec>] can be:

  <sourcetype>: A source type in your event data.
  host::<host>: A host value in your event data.
  source::<source>: A source of your event data.

SEGMENTATION = <seg_rule>

  This specifies the type of segmentation to use at index time for [<spec>] events.

<seg_rule>

  A segmentation type, or "rule", defined in segmenters.conf.
  Common settings are inner, outer, none, and full, but the default file contains other predefined segmentation rules as well.
  Create your own custom rule by editing $SPLUNK_HOME/etc/system/local/segmenters.conf, as described in "Configure segmentation types".

 

 


https://docs.splunk.com/Documentation/Splunk/9.0.3/Admin/Segmentersconf

[<SegmenterName>]

* Name your stanza.
* Follow this stanza name with any number of the following setting/value
  pairs.
* If you don't specify a setting/value pair, Splunk will use the default.

MAJOR = <space separated list of breaking characters>
* Set major breakers.
* Major breakers are words, phrases, or terms in your data that are surrounded
  by set breaking characters.
* By default, major breakers are set to most characters and blank spaces.
* Typically, major breakers are single characters.
* Note: \s represents a space; \n, a newline; \r, a carriage return; and
  \t, a tab.
* Default is [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520 %5D %5B %3A %0A %2C %28 %29


MINOR = <space separated list of strings>
* Specifies minor breakers.
* In addition to the segments specified by the major breakers, for each minor
  breaker found, Splunk indexes the token from the last major breaker to the
  current minor breaker and from the last minor breaker to the current minor
  breaker.
* Default: / : = @ . - $ # % \\ _
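
To illustrate the MINOR rule quoted above, here is a rough Python model of how I read it. It is a simplified sketch, not Splunk's actual segmenter: the segment function is hypothetical, the MAJOR list is cut down to the breakers that occur in my sample events, and case handling is ignored. It only shows which tokens I would expect with and without ~ as a minor breaker; it does not explain why my configuration has no effect.

```python
import re

# Assumption: a reduced subset of the major breakers, enough for the sample data.
MAJOR = [",", " ", '"']
# Default minor breakers as listed in the segmenters.conf spec.
DEFAULT_MINOR = ["/", ":", "=", "@", ".", "-", "$", "#", "%", "\\", "_"]


def segment(event, minor):
    """Return the set of tokens this simplified model would index."""
    major_re = re.compile("|".join(re.escape(c) for c in MAJOR))
    minor_re = re.compile("|".join(re.escape(c) for c in minor))
    tokens = set()
    for major_seg in filter(None, major_re.split(event)):
        # The whole segment between two major breakers is always a token.
        tokens.add(major_seg)
        # Token from the last major breaker up to each minor breaker.
        for hit in minor_re.finditer(major_seg):
            prefix = major_seg[: hit.start()]
            if prefix:
                tokens.add(prefix)
        # Token between consecutive minor breakers.
        tokens.update(piece for piece in minor_re.split(major_seg) if piece)
    return tokens


event = (
    "456,DEF2,,456~LG~Login~v1.0.0,10.0.0.1,"
    "27bee310-a843-11ed-a629-db0c7ca6c807,,"
    '2023-02-10-11:59:44.666,300,1,"FAIL"'
)

default_tokens = segment(event, DEFAULT_MINOR)
custom_tokens = segment(event, DEFAULT_MINOR + ["~"])

print("LG" in default_tokens)  # False: LG is buried inside 456~LG~Login~v1
print("LG" in custom_tokens)   # True: ~ now splits the field into pieces
```

Under this model, LG only becomes a standalone token once ~ is in the MINOR list, which is the behavior I am trying to get at index time.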


I wrote the custom segmenters.conf stanza so that it inherits the default values for every attribute except MINOR, to which I simply appended ~ and %7E (the URL-encoded form of ~).

However, this did not segment my data properly at index time.
