Splunk search performance using TERM() function

munang
Path Finder

Hello, I'm a Splunk newbie.

While looking for ways to improve Splunk search performance, I found the two posts below. Parts of them confused me, so I have a couple of questions.

https://splunk.illinois.edu/splunk-at-illinois/using-splunk/searching-splunk/how-to-optimize-your-se...

https://idelta.co.uk/3-easy-ways-to-speed-up-your-splunk-searches-and-why-they-help/


Question 1)
- index=firewall_data 127.0.0.1
or
- index=firewall_data "127.0.0.1"
If I run either of these searches, is it correct that, because of the internal segmentation process, Splunk splits the search into three steps like this?

127
127 1
127 0 1

If so, is it correct that index=firewall_data TERM(127.1.1.24) does not apply the breakers and therefore performs better?

Question 2)

If the assumptions in question 1 are correct, index=firewall_data "127.0.0.1" should use more resources and index=firewall_data TERM(127.1.1.24) should perform better. But when I tested them, they performed the same.

Both searches report the same events searched and the same resource usage (time). Why?

 


tscroggins
Influencer

Hi @munang,

Depending on your segmentation configuration, 127.0.0.1 will be indexed as:

0
1
127
127.0.0.1

or

0
1
127
127.0
127.0.0
127.0.0.1

You can verify this (relatively) easily with an empty index and the walklex command. See https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Walklex.

With segmenters.conf INTERMEDIATE_MAJORS = false (the default):

| makeresults
| eval _raw="127.0.0.1"
| collect index=walklex_test

$ /opt/splunk/bin/splunk _internal call /data/indexes/walklex_test/roll-hot-buckets

| walklex type=term index=walklex_test
| table term

term
0
1
127
127.0.0.1

Using a source type and segmentation with INTERMEDIATE_MAJORS = true:

term
0
1
127
127.0
127.0.0
127.0.0.1
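
As a rough illustration of the two term listings above, here is a toy Python model of segmentation. It is only a sketch under stated assumptions: the breaker sets are simplified stand-ins for the real segmenters.conf defaults, and split_on and index_terms are hypothetical helpers, not Splunk APIs.

```python
# Assumed, simplified breaker sets for illustration only; the real defaults
# are defined in segmenters.conf and contain more characters than these.
MAJOR_BREAKERS = " \t\r\n,;|&?!\"'()[]{}<>"
MINOR_BREAKERS = "/:=@.-$#%\\_"


def split_on(text, breakers):
    """Split text on any character in breakers, dropping empty pieces."""
    pieces, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


def index_terms(raw, intermediate_majors=False):
    """Return the sorted terms this toy model would write to the lexicon."""
    terms = set()
    for token in split_on(raw, MAJOR_BREAKERS):
        terms.add(token)                             # whole major-delimited token
        terms.update(split_on(token, MINOR_BREAKERS))  # pieces between minor breakers
        if intermediate_majors:
            # also keep every prefix that ends just before a minor breaker
            for i, ch in enumerate(token):
                if ch in MINOR_BREAKERS and i > 0:
                    terms.add(token[:i])
    return sorted(terms)


print(index_terms("127.0.0.1"))
print(index_terms("127.0.0.1", intermediate_majors=True))
```

In this model the first call yields the four terms from the first listing and the second call yields the six terms from the second listing. walklex on a real index remains the authoritative check.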

Both 127.0.0.1 and "127.0.0.1" will use the following base lispy at the indexing tier:

[ AND 0 1 127 ]

You can judge the efficiency of your search using the method you observed in your second question. In the search job inspector, you'll see: "This search has completed and has returned X results by scanning Y events in Z seconds." If X != Y, your search is scanning more events than needed, and introducing TERM() or otherwise modifying your search may improve efficiency.

TERM(127.0.0.1) will use the following base lispy at the indexing tier, also as you observed:

[ AND 127.0.0.1 ]

With INTERMEDIATE_MAJORS = false, TERM(127.0) will return no results.

With INTERMEDIATE_MAJORS = true, TERM(127.0) will return events with 127.0.0.1, 127.0.a$b, 127.0-foo, etc.

If searches with and without TERM() return X results by scanning Y events and X == Y, then the same number of events contain the segmented terms as contain the complete term. In this case, there is no direct efficiency to be gained by using TERM(). If your observations contradict this, i.e. you have events that match 127.1.1.24 but do not match TERM(127.1.1.24), then the answer may have something to do with inconsistent time ranges across searches, misconfigured search peers, or misconfigured indexer clustering.
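
The X versus Y comparison can be sketched with a small Python model. This is a hedged illustration, not how Splunk is implemented: terms and run are hypothetical helpers, the sample events are made up, and only whitespace (major) and "." (minor) are treated as breakers.

```python
def terms(raw):
    """Toy lexicon for one event: whitespace-delimited tokens plus their dot-separated parts."""
    found = set()
    for token in raw.split():
        found.add(token)
        found.update(part for part in token.split(".") if part)
    return found


def run(events, lispy_terms, literal):
    """Return (X, Y): results returned and events scanned."""
    # Step 1: the lispy filter selects events whose lexicon holds every term.
    scanned = [e for e in events if set(lispy_terms) <= terms(e)]
    # Step 2: scanned events are checked against the literal search string.
    returned = [e for e in scanned if literal in e]
    return len(returned), len(scanned)


# Made-up sample events, for illustration only.
events = [
    "src 127.0.0.1 action allowed",
    "127 packets 0 dropped 1 retried",
    "src 10.0.0.5 action blocked",
]

# 127.0.0.1 or "127.0.0.1" -> [ AND 0 1 127 ]
print(run(events, ["0", "1", "127"], "127.0.0.1"))   # (1, 2)
# TERM(127.0.0.1) -> [ AND 127.0.0.1 ]
print(run(events, ["127.0.0.1"], "127.0.0.1"))       # (1, 1)
```

The second sample event contains 127, 0, and 1 as separate terms, so the non-TERM() search scans it and then discards it. If your data has no such events, X == Y either way and TERM() has nothing to save.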

munang
Path Finder

@tscroggins 

Thank you!
Your explanation was clear and I understood it completely.

PickleRick
SplunkTrust

Adding to that answer: if you just search for "1.2.3.4", your search term might not be a whole major-breaker-delimited term. It could sit somewhere in the middle of a "word" delimited by minor breakers, like "version.1.2.3.4". So Splunk searches for 1, 2, 3 and 4 separately and then checks whether the events matching all of those partial terms also match the literal search term.

If you explicitly tell it to find TERM(1.2.3.4), it will find only those events in which 1.2.3.4 is indexed as a complete term.
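
A minimal Python sketch of that difference, assuming a toy model where only whitespace is a major breaker and "." is the only minor breaker. lexicon, phrase_search, and term_search are hypothetical names, and the two events are invented for the example.

```python
def lexicon(raw):
    """Toy set of indexed terms: whole tokens plus their dot-separated parts."""
    found = set()
    for token in raw.split():
        found.add(token)
        found.update(part for part in token.split(".") if part)
    return found


def phrase_search(raw, phrase):
    """Model of searching "1.2.3.4": all partial terms present, then a literal check."""
    parts = [p for p in phrase.split(".") if p]
    return all(p in lexicon(raw) for p in parts) and phrase in raw


def term_search(raw, term):
    """Model of TERM(1.2.3.4): the complete term must be in the lexicon."""
    return term in lexicon(raw)


event_a = "agent version.1.2.3.4 started"   # 1.2.3.4 sits inside a longer token
event_b = "peer 1.2.3.4 connected"          # 1.2.3.4 is a whole token

for event in (event_a, event_b):
    print(event, phrase_search(event, "1.2.3.4"), term_search(event, "1.2.3.4"))
```

In this model the plain search matches both events, while the TERM() search matches only the second one, because "version.1.2.3.4" never puts a standalone 1.2.3.4 into the lexicon.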
