Performance of using wildcard in query

imosquera
Explorer

I was wondering how using a wildcard in a query affects performance. Specifically, for the following:

source="/mnt/logs/*/debug.log"

Or a query containing a custom field:

uri_path="/v1/*"

1 Solution

sideview
SplunkTrust

For the uri_path field, the server basically has to get every event off disk, run the field extraction for uri_path, and then check whether the value begins with "/v1/".

source, on the other hand, along with sourcetype and host, behaves a little differently. These are not just indexed fields but special fields backed by special metadata stores, so the server actually checks the metadata for all the values of source and ends up requesting only the source values that match the wildcard expression.
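To make the metadata idea concrete, here is a toy Python sketch. It is not Splunk's implementation; the `source_metadata` dictionary, its paths, and its counts are all made up for illustration. It only shows that a wildcard on source can be resolved against the list of known source values before any event is read.

```python
from fnmatch import fnmatchcase

# Hypothetical metadata store: each distinct source value and its event count.
# (Illustrative values only; this is not Splunk's real on-disk format.)
source_metadata = {
    "/mnt/logs/app1/debug.log": 1200,
    "/mnt/logs/app2/debug.log": 800,
    "/mnt/logs/app1/error.log": 50,
    "/var/log/syslog": 5000,
}

def matching_sources(pattern, metadata):
    """Return the source values that match the wildcard, without reading any events."""
    return sorted(s for s in metadata if fnmatchcase(s, pattern))

matched = matching_sources("/mnt/logs/*/debug.log", source_metadata)
# Only events belonging to the matched sources would need to be requested.
events_to_read = sum(source_metadata[s] for s in matched)

print(matched)
print(events_to_read)
```

In this toy model only the two debug.log sources survive the wildcard, so the events from the other sources are never touched.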

You can verify both of these with the following technique. Here, "scanCount" is the number of events that splunkd retrieved from disk, and "eventCount" is the number of those retrieved events that ended up matching the search terms.

1) Find a search that uses just host or sourcetype values with no wildcards, and tack | head 100000 on the end.

2) Click the little "i" icon to open the Job Inspector and scroll down until you see "scanCount" and "eventCount". Confirm that they are both 100,000 (scanCount will usually be a couple thousand more; this is normal).

3) Now add your search term with the wildcard to the initial search clause, keeping the | head 100000 on the end, and look at the Job Inspector again. For the wildcarded source term, scanCount and eventCount will be the same. For the wildcarded uri_path field, scanCount will be significantly greater, because the server had to keep getting possible matches off disk until it reached 100,000 rows that matched "/v1/".
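The difference you should see in step 3 can be mimicked with a small Python simulation. This is a toy model under stated assumptions, not how splunkd works internally: the events, the `run_search` helper, and the 1-in-4 match rate are all invented, and it ignores the small scanCount overshoot mentioned in step 2.

```python
def run_search(events, predicate, head):
    """Read events in order until `head` of them match.

    Returns (scan_count, event_count), loosely modeled on the
    scanCount and eventCount fields in the Job Inspector.
    """
    scan_count = 0
    event_count = 0
    for event in events:
        scan_count += 1              # every event read counts as scanned
        if predicate(event):
            event_count += 1         # only matching events count here
            if event_count == head:
                break
    return scan_count, event_count

# Made-up data: every other event comes from a debug.log source,
# and one event in four has a uri_path under /v1/.
paths = ["/v1/users", "/v2/users", "/v2/orders", "/health"]
sources = ["/mnt/logs/app1/debug.log", "/var/log/syslog"]
events = [
    {"source": sources[i % 2], "uri_path": paths[i % 4]}
    for i in range(4000)
]

# Case A: the source wildcard is assumed to be resolved from metadata first,
# so only events from matching sources are ever handed to the scan.
candidates = [e for e in events if e["source"].startswith("/mnt/logs/")]
source_counts = run_search(candidates, lambda e: True, head=500)

# Case B: uri_path is only known after reading each event,
# so non-matching events are scanned and then thrown away.
uri_counts = run_search(
    events, lambda e: e["uri_path"].startswith("/v1/"), head=500
)

print("source wildcard  :", source_counts)
print("uri_path wildcard:", uri_counts)
```

With this data the source case scans exactly as many events as it returns, while the uri_path case has to scan roughly four times as many to collect the same 500 matches.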

imosquera
Explorer

Thanks for your quick and thorough response!
