Hello, I'm new to Splunk.
While looking for ways to improve Splunk search performance, I found some posts that left me a little confused, so I have a question.
I referred to the two posts below.
https://idelta.co.uk/3-easy-ways-to-speed-up-your-splunk-searches-and-why-they-help/
Question 1)
- index=firewall_data 127.0.0.1
Or
- index=firewall_data "127.0.0.1"
If I search for that, then because of the internal segmentation process, is it split up like this:
127
127 1
127 0 1
Is it right that the search is divided into these three parts?
Because of this, if I use index=firewall_data TERM(127.1.1.24), is it correct that the breakers are not used and the search performs better?
Question 2)
If the assumptions in question 1 are correct, index=firewall_data "127.0.0.1" should use more resources
and index=firewall_data TERM(127.1.1.24) should perform better, but when I tested them, they performed the same.
The job reports the same data searched and the same resources (time) for both. Why?
Hi @munang,
Depending on your segmentation configuration, 127.0.0.1 will be indexed as:
0
1
127
127.0.0.1
or
0
1
127
127.0
127.0.0
127.0.0.1
You can verify this (relatively) easily with an empty index and the walklex command. See https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Walklex.
With segmenters.conf INTERMEDIATE_MAJORS = false (the default):
| makeresults
| eval _raw="127.0.0.1"
| collect index=walklex_test
$ /opt/splunk/bin/splunk _internal call /data/indexes/walklex_test/roll-hot-buckets
| walklex type=term index=walklex_test
| table term
term
0
1
127
127.0.0.1
Using a source type and segmentation with INTERMEDIATE_MAJORS = true:
term
0
1
127
127.0
127.0.0
127.0.0.1
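As a rough illustration of the two term lists above, here is a minimal Python sketch of the segmentation. It is only a toy model, not Splunk's implementation: the breaker sets are an assumed subset of the segmenters.conf defaults, and index_terms is a hypothetical helper name.

```python
import re

# Assumed, simplified breaker sets; the real defaults in segmenters.conf are longer.
MAJOR = r"[\s\[\]<>(){}|!;,'\"*&?+]+"
MINOR = r"[/:=@.\-$#%\\_]"

def index_terms(raw, intermediate_majors=False):
    """Return the sorted terms this toy model would write to the lexicon."""
    terms = set()
    for token in filter(None, re.split(MAJOR, raw)):
        terms.add(token)                                    # whole major-delimited token
        terms.update(p for p in re.split(MINOR, token) if p)  # minor segments
        if intermediate_majors:
            # Also keep every prefix that ends at a minor breaker.
            for m in re.finditer(MINOR, token):
                if token[:m.start()]:
                    terms.add(token[:m.start()])
    return sorted(terms)

print(index_terms("127.0.0.1"))
# ['0', '1', '127', '127.0.0.1']
print(index_terms("127.0.0.1", intermediate_majors=True))
# ['0', '1', '127', '127.0', '127.0.0', '127.0.0.1']
```

The walklex output remains the authoritative check; the sketch only mirrors the pattern it shows.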
Both 127.0.0.1 and "127.0.0.1" will use the following base lispy at the indexing tier:
[ AND 0 1 127 ]
You can judge the efficiency of your search using the method you observed in your second question. In the search job inspector, you'll see: "This search has completed and has returned X results by scanning Y events in Z seconds." If X != Y, your search is scanning more events than needed, and introducing TERM() or otherwise modifying your search may improve efficiency.
TERM(127.0.0.1) will use the following base lispy at the indexing tier, also as you observed:
[ AND 127.0.0.1 ]
With INTERMEDIATE_MAJORS = false, TERM(127.0) will return no results.
With INTERMEDIATE_MAJORS = true, TERM(127.0) will return events with 127.0.0.1, 127.0.a$b, 127.0-foo, etc.
If searches with and without TERM() return X results by scanning Y events and X == Y, then the same number of events contain the segmented terms as contain the complete term. In this case, there is no direct efficiency to be gained by using TERM(). If your observations contradict this, i.e. you have events that match 127.1.1.24 but do not match TERM(127.1.1.24), then the answer may have something to do with inconsistent time ranges across searches, misconfigured search peers, or misconfigured indexer clustering.
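To make the X versus Y comparison concrete, here is a small Python sketch under stated assumptions: whitespace is the only major breaker, the minor breaker set is a simplified subset, the sample events are invented, and the final filter is a plain substring check standing in for whatever Splunk does after the lexicon lookup. The names index_terms and run are hypothetical.

```python
import re

MINOR = r"[/:=@.\-$#%\\_]"  # assumed subset of the default minor breakers

def index_terms(event):
    """Toy lexicon for one event: whole words plus their minor segments."""
    terms = set()
    for word in event.split():  # whitespace as the only major breaker
        terms.add(word)
        terms.update(p for p in re.split(MINOR, word) if p)
    return terms

def run(events, lispy_terms, literal):
    """Return (X, Y): events returned and events scanned."""
    scanned = [e for e in events if all(t in index_terms(e) for t in lispy_terms)]
    returned = [e for e in scanned if literal in e]
    return len(returned), len(scanned)

# Invented sample data; the second event holds 0, 1 and 127 but not the address.
events = ["127.0.0.1 allow", "10.0.1.127 deny", "127.0.0.1 deny"]

print(run(events, ["0", "1", "127"], "127.0.0.1"))  # (2, 3): [ AND 0 1 127 ]
print(run(events, ["127.0.0.1"], "127.0.0.1"))      # (2, 2): [ AND 127.0.0.1 ]
```

If the second event were absent, both searches would report the same X and Y, which is the situation described in the question.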
Thank you!
The explanation was clear and I understand it completely now.
Adding to that answer - your search term if you just search for "1.2.3.4" might not encompass a whole major-breaker-delimited search term but be somewhere in the middle of a "word" delimited by minor breakers - like "version.1.2.3.4". So Splunk searches for 1, 2, 3 and 4 separately and checks if the events matching all of those partial terms match the literal search term.
If you explicitly tell it to find TERM(1.2.3.4), it will find only those events for which the term 1.2.3.4 was indexed as a whole.
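The difference can be sketched in Python with two invented events. This is a toy model that assumes whitespace is the only major breaker and uses a simplified minor breaker set; terms_of and matches are hypothetical names.

```python
import re

MINOR = r"[/:=@.\-$#%\\_]"  # assumed subset of the default minor breakers

def terms_of(event):
    """Toy lexicon: each whitespace-delimited word plus its minor segments."""
    out = set()
    for word in event.split():
        out.add(word)
        out.update(p for p in re.split(MINOR, word) if p)
    return out

def matches(event):
    """Return (plain, term): does "1.2.3.4" match, and does TERM(1.2.3.4) match?"""
    terms = terms_of(event)
    plain = {"1", "2", "3", "4"} <= terms and "1.2.3.4" in event
    term = "1.2.3.4" in terms
    return plain, term

print(matches("agent version.1.2.3.4 started"))  # (True, False)
print(matches("peer 1.2.3.4 connected"))         # (True, True)
```

In the first event the whole indexed word is version.1.2.3.4, so the plain search finds it through the partial terms while TERM(1.2.3.4) does not.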