Hi @PavelP,

This isn't an issue with TERM or PREFIX but with how Splunk indexes abc--xyz. We can use walklex to list terms in our index:

| walklex index=main type=term
| table term

We'll find the following:

abc
abc##xyz
abc$$xyz
abc%%xyz
abc..xyz
abc//xyz
abc==xyz
abc@@xyz
abc\\xyz
abc__xyz
xyz

Note that abc--xyz is missing. Let's look at segmenters.conf. The default segmenter stanza is [indexing]:

[indexing]
INTERMEDIATE_MAJORS = false
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520 %5D %5B %3A %0A %2C %28 %29
MINOR = / : = @ . - $ # % \\ _

Note that -- is a major breaker, so abc--xyz is split into abc and xyz and is never stored as a single term. If we index abc-xyz with a single hyphen, we should find abc-xyz in the list of terms:

abc
abc##xyz
abc$$xyz
abc%%xyz
abc-xyz
abc..xyz
abc//xyz
abc==xyz
abc@@xyz
abc\\xyz
abc__xyz
xyz

If walklex returns a missing merged_lexicon.lex message, we can force optimization of the bucket(s) to generate the data, e.g.:

$SPLUNK_HOME/bin/splunk-optimize-lex -d $SPLUNK_HOME/var/lib/splunk/main/db/hot_v1_0

We can override major breakers in a custom segmenters.conf stanza and reference the stanza in props.conf. Ensure the segmenter name is unique and remove -- from the MAJOR setting:

# segmenters.conf
[tmp_test_txt]
INTERMEDIATE_MAJORS = false
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D %2520 %5D %5B %3A %0A %2C %28 %29
MINOR = / : = @ . - $ # % \\ _
# props.conf
[source::///tmp/test.txt]
SEGMENTATION = tmp_test_txt

Deploy props.conf and segmenters.conf to both search heads and search peers (indexers). With the new configuration in place, walklex should return abc--xyz in the list of terms:

abc
abc##xyz
abc$$xyz
abc%%xyz
abc--xyz
abc..xyz
abc//xyz
abc==xyz
abc@@xyz
abc\\xyz
abc__xyz
xyz

We can now use TERM and PREFIX as expected:

| tstats values(PREFIX(abc--)) as vals where index=main TERM(abc--*) by PREFIX(abc--)

abc--    vals
xyz      xyz

As always, we should ask ourselves if changing the default behavior is both required and desired. Isolating the segmentation settings by source or sourcetype will help mitigate risk.
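
If it helps to reason about the breakers without touching an indexer, here is a rough Python model of the behavior described above. It is only a sketch under simplifying assumptions, not Splunk's actual segmenter: it uses just the literal breakers from the [indexing] stanza (the URL-encoded ones such as %20 are left out), it emits each major segment plus its minor pieces and nothing else, and the names split_on and terms are made up for this illustration.

```python
import re

# Literal breakers copied from the [indexing] stanza quoted in this post.
# Assumption: URL-encoded breakers (%21, %20, ...) are omitted for brevity.
DEFAULT_MAJOR = ["[", "]", "<", ">", "(", ")", "{", "}", "|", "!", ";", ",",
                 "'", '"', "*", "\n", "\r", " ", "\t", "&", "?", "+", "--"]
MINOR = ["/", ":", "=", "@", ".", "-", "$", "#", "%", "\\", "_"]


def split_on(text, breakers):
    """Split text on any breaker, trying longer breakers first so '--' beats '-'."""
    ordered = sorted(breakers, key=len, reverse=True)
    pattern = "|".join(re.escape(b) for b in ordered)
    return [piece for piece in re.split(pattern, text) if piece]


def terms(raw, major=DEFAULT_MAJOR, minor=MINOR):
    """Hypothetical helper: each major segment plus its minor pieces, sorted."""
    found = set()
    for segment in split_on(raw, major):
        found.add(segment)                       # the major segment itself
        found.update(split_on(segment, minor))   # its minor pieces
    return sorted(found)


# Custom breaker list with '--' removed, mirroring the [tmp_test_txt] stanza.
CUSTOM_MAJOR = [b for b in DEFAULT_MAJOR if b != "--"]

print(terms("abc--xyz"))                      # default breakers
print(terms("abc-xyz"))                       # single hyphen, default breakers
print(terms("abc--xyz", major=CUSTOM_MAJOR))  # '--' no longer a major breaker
```

With the default list, abc--xyz comes out as just abc and xyz; with -- removed from the major breakers, abc--xyz survives as a term of its own, which is the term TERM and PREFIX need. Treat walklex on a real bucket as the source of truth, since the real segmenter has more rules than this model.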