I'm running a search against about 1.2 million log records. Each record contains some geo tags and numeric values representing performance metrics. There are a total of about 45 key/values per record including the following:
- id: the service id
- type: the service type
- testId: the type of test (e.g. latency, throughput)
- region: the user's geographical region
- median: the median performance metric value
- ip: the user's IP address
The search query I'm running calculates a 90th percentile median performance value grouped by service id within a specific geographical region, service type and test ID. Here is an example query:
type="CDN" (testId="tl" OR testId="l") region="us" | eventstats perc90(median) as median90 | where median <= median90 | stats mean(median) as mean median(median) as median stdev(median) as stdev avg(stdDev) as avg_stdev count(median) as num_tests dc(ip) as num_ips by id | eval rel_stdev=100*(stdev/median) | table id, mean, median, avg_stdev, stdev, rel_stdev, num_tests, num_ips | sort median
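For readers less familiar with the search language, the logic of the query above can be sketched outside Splunk. This is only an illustration under stated assumptions: events are plain dicts with `id`, `median` and `ip` keys, the helper names `nearest_rank_percentile` and `summarize` are made up for this example, the percentile uses the nearest-rank method (Splunk's `perc90` may interpolate or approximate differently), and the `avg(stdDev)` column is left out for brevity.

```python
import math
import statistics
from collections import defaultdict


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumption; Splunk's perc90 may differ)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(events):
    """Mimic: eventstats perc90 | where | stats ... by id | sort median."""
    # eventstats: a single threshold computed over ALL matching events,
    # which means every event has to be seen before any can be filtered.
    median90 = nearest_rank_percentile([e["median"] for e in events], 90)

    # where median <= median90
    kept = [e for e in events if e["median"] <= median90]

    # stats ... by id
    by_id = defaultdict(list)
    for e in kept:
        by_id[e["id"]].append(e)

    rows = []
    for service_id, group in by_id.items():
        vals = [e["median"] for e in group]
        med = statistics.median(vals)
        # Sample standard deviation; 0.0 for a single value is an assumption.
        stdev = statistics.stdev(vals) if len(vals) > 1 else 0.0
        rows.append({
            "id": service_id,
            "mean": statistics.fmean(vals),
            "median": med,
            "stdev": stdev,
            "rel_stdev": 100 * stdev / med if med else None,
            "num_tests": len(vals),
            "num_ips": len({e["ip"] for e in group}),
        })

    # sort median
    return sorted(rows, key=lambda r: r["median"])


if __name__ == "__main__":
    demo = [
        {"id": "a" if i % 2 else "b", "median": float(i), "ip": f"10.0.0.{i}"}
        for i in range(1, 11)
    ]
    for row in summarize(demo):
        print(row)
```

The point of the sketch is the two-pass shape: the threshold depends on the whole result set, so the filter cannot start until the first pass has finished.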
To my disappointment, this query takes about 5 minutes to run to completion on a fairly high-end dedicated server (quad core X5570 2.93 GHz, 128GB memory, RAID 0 15K SAS + SSD cache) and much longer on the new hosted Splunk Storm service. My question is whether this level of performance should be expected for this amount of data and this type of search query. Are there any optimizations that could be made at index or search time to improve performance? Is there a significant performance hit when applying | stats or | eventstats to a search? I've been using Splunk for 5 days now... any help would be greatly appreciated.

A search like that across that amount of data on that hardware should take something closer to 30 seconds on a single-instance Splunk system, even assuming your testId and region terms account for basically all 1.2 million records.
If you're way off from that, I would try a couple of things first:
- If you're running in the web UI/timeline, turn off "Field discovery"
- Better yet, try running in the "Advanced Charting" view (under the "Views" menu)
- Run in the "Advanced Charting" view, hiding the chart and turning off the "Preview" checkbox.
- Even better, try running on the command line, and try adding the "-preview false" option.
Just for reference, when I do a slightly smaller (just over 1,000,000 events) and somewhat simpler search than yours on a three-plus year old laptop, I go from taking about 130 seconds to about 70 (doubling the speed) when I turn off field discovery, and then down to 30 seconds (another doubling) in the "Advanced Charting" view (with preview still on), and down to 25 seconds on the command line without preview.
I don't actually see any obvious improvements that can be made to your query while keeping the same results. However, I would be curious as to how it runs if you try each of the following:
- Omit the eventstats and where clauses near the beginning
- Omit the sort at the end
- Omit the eventstats, where, and sort clauses
- And just for kicks, run it with only the base search plus the eventstats and where clauses.
It would also be helpful to know the final number of results returned as well as the scan count. The information in the "Inspect Search Job" page (under the "Actions" menu on the timeline search view) would be useful too, though maybe a bit obscure.
One thing to note is that a single-instance Splunk is not able to take advantage of your hardware when running a single search. I would say that if you're trying to get this to run faster, you could probably run three, four, or even more Splunk instances in a distributed config on that same machine to better utilize it, but that setup takes a bit of work and knowledge to get right.
Finally, seeing a few lines of your data might indicate something, though if it's CDN access logs, that's unlikely.
UPDATE:
Try this and see how it compares. Be sure to turn preview off:
type="CDN" (testId="tl" OR testId="l") region="us" | eval median = if( median <= [ search type="CDN" (testId="tl" OR testId="l") region="us" | stats perc90(median) as search ], median,null()) | stats mean(median) as mean median(median) as median stdev(median) as stdev avg(stdDev) as avg_stdev count(median) as num_tests dc(ip) as num_ips by id | eval rel_stdev=100*(stdev/median) | table id, mean, median, avg_stdev, stdev, rel_stdev, num_tests, num_ips | sort median
Also, what I suspect is happening is that the eventstats is taking a long time to finalize, i.e., the actual computation is getting done pretty quickly, but marking up the set of intermediate results is taking a long time. If you are okay with having only an approximate 90th percentile, rather than exact, try:
type="CDN" (testId="tl" OR testId="l") region="us" | eval median = if( median <= [ search type="CDN" (testId="tl" OR testId="l") region="us" | head 9999 | stats perc90(median) as search ], median,null()) | stats mean(median) as mean median(median) as median stdev(median) as stdev avg(stdDev) as avg_stdev count(median) as num_tests dc(ip) as num_ips by id | eval rel_stdev=100*(stdev/median) | table id, mean, median, avg_stdev, stdev, rel_stdev, num_tests, num_ips | sort median
which should only look at the most recent 9999 events to compute the 90th percentile, rather than scanning all 1.2 million events. This should be a lot faster than the previous query, though the results will differ because the threshold is only an approximation.
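The trade-off of estimating the threshold from a limited slice of events can be sketched as follows. This is a rough illustration, not a model of how Splunk computes percentiles: the function names are hypothetical, the nearest-rank method is an assumption, and the synthetic log-normal values merely stand in for real metrics.

```python
import math
import random


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumption; Splunk's perc90 may differ)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_thresholds(values, pct=90, sample_size=9999):
    """Return (exact, approximate) thresholds.

    The approximate value only looks at the first `sample_size` items,
    analogous to putting "| head 9999" in front of the percentile.
    """
    exact = nearest_rank_percentile(values, pct)
    approx = nearest_rank_percentile(values[:sample_size], pct)
    return exact, approx


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    values = [rng.lognormvariate(3.0, 0.5) for _ in range(200_000)]
    exact, approx = percentile_thresholds(values)
    print(f"exact p90={exact:.2f}  approx p90={approx:.2f}")
```

When the leading events are representative of the whole set, the two thresholds tend to be close. When they are not (for example, if the most recent events come from an unusually fast or slow period), the approximate threshold can be far off, so it is worth comparing both on real data before relying on the shortcut.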

Just updated again. Made a mistake. Basically, I forgot to remove eventstats from the subsearch and replace it with stats. I believe the suggested changes should run that query in about 2 minutes in the GUI, and about 40 seconds on CLI. (Basically, double the time of the version using stats without eventstats.)

I will update my answer with a suggestion on something to try to improve performance, but I do not know if it will help. (I believe it will help if you have a distributed/multi-indexer Splunk system, but I don't know about a single node.) As for pre-indexing specific fields, retrieval is not really the problem here, and there isn't currently anything that will help with that. If you need to do this over time, using new and larger data sets, however, you can and should use summary indexing to pre-compute results over subsets of the data, so that you can get the full results faster.
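The idea behind pre-computing results over subsets can be sketched independently of Splunk. The example below is an assumption-laden illustration, not Splunk's summary indexing mechanism: the function names and the dict-based event shape are hypothetical. It shows that counts, means and standard deviations can be rebuilt from small per-chunk aggregates, whereas exact medians and percentiles cannot be merged this way and still need the raw values or an approximation.

```python
import math
from collections import defaultdict


def summarize_chunk(events):
    """Partial aggregates per id for one chunk (e.g. one scheduled interval)."""
    partial = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum, sum of squares
    for e in events:
        acc = partial[e["id"]]
        acc[0] += 1
        acc[1] += e["median"]
        acc[2] += e["median"] ** 2
    return dict(partial)


def merge_summaries(summaries):
    """Combine chunk summaries into a final per-id report."""
    total = defaultdict(lambda: [0, 0.0, 0.0])
    for partial in summaries:
        for service_id, (n, s, ss) in partial.items():
            acc = total[service_id]
            acc[0] += n
            acc[1] += s
            acc[2] += ss

    report = {}
    for service_id, (n, s, ss) in total.items():
        mean = s / n
        # Sample variance from running sums. Simple, but it can lose
        # precision on large values; max() guards against tiny negatives.
        var = (ss - n * mean * mean) / (n - 1) if n > 1 else 0.0
        report[service_id] = {
            "num_tests": n,
            "mean": mean,
            "stdev": math.sqrt(max(var, 0.0)),
        }
    return report


if __name__ == "__main__":
    chunk1 = [{"id": "a", "median": 1.0}, {"id": "a", "median": 3.0}]
    chunk2 = [{"id": "a", "median": 5.0}, {"id": "b", "median": 2.0}]
    print(merge_summaries([summarize_chunk(chunk1), summarize_chunk(chunk2)]))
```

Each chunk is summarized once when it arrives, and the final report only touches the small summaries rather than every raw event.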
Thanks. Here are the stats for the searches you recommended:
- GUI with preview/field discovery on: ~5 minutes
- GUI with preview/field discovery off, without eventstats and where clauses: 54 seconds
- Without eventstats, where or sort clauses: 47 seconds
- CLI as-is: 1:54
- CLI without eventstats or where: 19 seconds
- CLI without eventstats, where or sort: 19 seconds
- CLI with only eventstats and where: 1:42
So, the big hit is for eventstats. Is there a better way to do a 90th percentile filter? Is there some way to pre-index some of these fields we will commonly search on?

Oh, I forgot something important. The GUI and various settings in it can make a huge difference. Editing this above.
Cloudharmony,
There are a few things to consider that might perk up your result times:
- Start your search using indexed fields (e.g. sourcetype, source, host, and/or index) to prevent Splunk from having to waste time looking at irrelevant data
- If this is a query you will perform often, create a summary search to run at some set interval (e.g. 10 min) then report across the summary data.
Beyond the above, yes, you do take performance hits with various Splunk analytic commands, and there is some guidance in the Splunk docs on how to reduce them.
Sean
