Performance of using wildcard in query

imosquera
Explorer

I was wondering what the performance impact is of using a wildcard in a search. Specifically, for a source value like the following:

source="/mnt/logs/*/debug.log"

or for a search on a custom field:

uri_path="/v1/*"

sideview
SplunkTrust

For the uri_path field, the server basically has to get every event that the rest of the search matches off disk, run the field extraction for uri_path, and then check whether the value begins with "/v1/".

source, on the other hand, along with sourcetype and host, behaves a little differently. These are not just indexed fields but special fields backed by special metadata stores, so the server checks the metadata for all the values of source and ends up requesting only the events whose source value matches the wildcard expression.
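To make the metadata idea concrete, here is a rough Python sketch. It is a toy model of the behavior described above, not Splunk's actual implementation: the `source_metadata` dictionary, its paths, and its event counts are all made up for illustration, and Python's `fnmatchcase` stands in for Splunk's wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical metadata store: one entry per distinct source value,
# mapped to the number of events indexed for it (invented numbers).
source_metadata = {
    "/mnt/logs/app1/debug.log": 1200,
    "/mnt/logs/app2/debug.log": 800,
    "/mnt/logs/app1/error.log": 50,
    "/var/log/syslog": 3000,
}

def matching_sources(pattern, metadata):
    """Return the distinct source values that match the wildcard pattern.

    Only the small list of distinct values is examined; no events are read.
    """
    return sorted(s for s in metadata if fnmatchcase(s, pattern))

matched = matching_sources("/mnt/logs/*/debug.log", source_metadata)

# Only events belonging to the matched sources would need to be fetched.
events_to_read = sum(source_metadata[s] for s in matched)

print(matched)         # ['/mnt/logs/app1/debug.log', '/mnt/logs/app2/debug.log']
print(events_to_read)  # 2000
```

The point of the sketch is that the wildcard is resolved against four distinct values rather than against 5,050 events, so nothing that fails the match ever has to come off disk.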

You can verify both of these with the following technique. Here, "scanCount" is the number of events that splunkd retrieved from disk, and "eventCount" is the number of those retrieved events that ended up matching the search terms.

1) Find a search that uses only host or sourcetype values, with no wildcards, and tack | head 100000 on the end.

2) Click the little "i" icon to open the Job Inspector and scroll down until you see "scanCount" and "eventCount". Confirm that they are both 100,000 (scanCount will usually be a couple thousand more; this is normal).

3) Now add your search term with the wildcard to the initial search clause, keeping the | head 100000 on the end, and look in the Job Inspector again. You'll see that for the wildcarded source term, scanCount and eventCount are the same. For the wildcarded uri_path field, scanCount will be significantly greater, because the server had to keep getting all the possible matches off disk until it reached 100,000 rows that matched "/v1/".
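If you want to see why the two counters diverge before trying it on a real index, the comparison in steps 1 to 3 can be simulated in a few lines of Python. This is only a sketch under stated assumptions: the event layout, the `SOURCES` and `PATHS` lists, and both search functions are hypothetical, and the "disk read" is just a counter. It is not how splunkd is implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical sample data: sources and request paths cycle deterministically.
SOURCES = ["/mnt/logs/a/debug.log", "/mnt/logs/b/debug.log", "/var/log/other.log"]
PATHS = ["/v1/users", "/v2/users", "/v1/items", "/health"]

def make_events(n):
    """Build n fake events; uri_path exists only inside the raw text."""
    return [
        {"source": SOURCES[i % len(SOURCES)],
         "_raw": f"GET {PATHS[i % len(PATHS)]} 200"}
        for i in range(n)
    ]

def search_by_source(events, pattern, limit):
    """Wildcard on source: resolve the pattern against distinct values first."""
    matched = {s for s in {e["source"] for e in events} if fnmatchcase(s, pattern)}
    scan_count = event_count = 0
    for e in events:
        if e["source"] not in matched:
            continue  # ruled out by metadata, never read from "disk"
        scan_count += 1   # event read from "disk"
        event_count += 1  # and it is guaranteed to match
        if event_count == limit:
            break
    return scan_count, event_count

def search_by_uri_path(events, pattern, limit):
    """Wildcard on an extracted field: read each event, extract, then test."""
    scan_count = event_count = 0
    for e in events:
        scan_count += 1                 # event read from "disk"
        uri_path = e["_raw"].split()[1]  # search-time field extraction
        if fnmatchcase(uri_path, pattern):
            event_count += 1
            if event_count == limit:    # same role as "| head"
                break
    return scan_count, event_count

events = make_events(1200)
print(search_by_source(events, "/mnt/logs/*/debug.log", 100))  # (100, 100)
print(search_by_uri_path(events, "/v1/*", 100))                # (199, 100)
```

In this toy data half of the events have a "/v1/" path, so the extracted-field search reads roughly twice as many events as it returns. On real data the gap depends on how rare the matching values are.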

imosquera
Explorer

Thanks for your quick and thorough response!
