I was wondering what the performance was of using a wildcard in a query. Specifically for the following:
source="/mnt/logs/*/debug.log"
or a query containing a custom field:
uri_path="/v1/*"
For the uri_path field, the server basically has to get every event off disk, run the field extraction for uri_path, and then check whether the value begins with "/v1/".
source, on the other hand, along with sourcetype and host, behaves a little differently. Since these are not just indexed fields but special fields backed by special metadata stores, the server will actually check the metadata for all the values of source and end up requesting only the source values that match the wildcard expression.
You can verify both of these with the following technique. In the following, "scanCount" is the number of events that Splunkd retrieved from disk, and "eventCount" is the number of events that were retrieved that ended up matching the search terms.
1) Find a search that uses only host or sourcetype values, with no wildcards, and tack | head 100000 on the end.
2) Click the little "i" icon to open the Job Inspector and scroll down until you see "scanCount" and "eventCount". Confirm that they are both 100,000 (scanCount will usually be a couple thousand more; this is normal).
3) Now add your search term with the wildcard to the initial search clause, keeping the | head 100000 on the end, and look at the Job Inspector again. For the wildcarded source term, scanCount and eventCount will be about the same. For the wildcarded uri_path field, scanCount will be significantly greater, because the server had to keep getting possible matches off disk until it reached 100,000 rows that matched "/v1/".
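If it helps to see why the two counts diverge, here is a rough toy model in Python. It is only a sketch of the reasoning above, not how splunkd is implemented: the EVENTS data, both function names, and the use of fnmatch-style wildcards are assumptions made up for illustration, and the limit of 50 stands in for | head 100000.

```python
from fnmatch import fnmatchcase

# Hypothetical toy data: (source, uri_path) pairs standing in for events on disk.
EVENTS = [
    (
        f"/mnt/logs/app{i % 4}/{'debug' if i % 2 == 0 else 'error'}.log",
        "/v1/items" if i % 5 == 0 else "/v2/items",
    )
    for i in range(1000)
]


def search_by_extracted_field(events, pattern, limit):
    """Read events one at a time, check uri_path, stop after `limit` matches.

    Returns (scan_count, event_count).
    """
    scan_count = 0
    event_count = 0
    for _source, uri_path in events:
        scan_count += 1  # every event read from "disk" counts as scanned
        if fnmatchcase(uri_path, pattern):
            event_count += 1
            if event_count == limit:
                break
    return scan_count, event_count


def search_by_source_metadata(events, pattern, limit):
    """Resolve the wildcard against the distinct source values first,
    then read only events that belong to a matching source.

    Returns (scan_count, event_count).
    """
    # Stand-in for an index keyed by source (assumption, not Splunk's format).
    by_source = {}
    for event in events:
        by_source.setdefault(event[0], []).append(event)

    # The wildcard is checked against source values, not against events.
    wanted = sorted(s for s in by_source if fnmatchcase(s, pattern))

    scan_count = 0
    event_count = 0
    for source in wanted:
        for _event in by_source[source]:
            scan_count += 1  # only events from matching sources are read
            event_count += 1
            if event_count == limit:
                return scan_count, event_count
    return scan_count, event_count


if __name__ == "__main__":
    print("uri_path wildcard:", search_by_extracted_field(EVENTS, "/v1/*", 50))
    print("source wildcard:  ", search_by_source_metadata(EVENTS, "/mnt/logs/*/debug.log", 50))
```

In this toy data only one event in five has a uri_path under /v1/, so the first function has to scan far more events than it returns, while the second scans exactly as many as it returns. That is the same pattern you should see between scanCount and eventCount in the Job Inspector.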
Thanks for your quick and thorough response!