I'm running a search against about 1.2 million log records. Each record contains some geo tags and numeric values representing performance metrics. There are about 45 key/value pairs per record, including the following (a hypothetical example record is shown after the list):
id: the service id
type: the service type
testId: the type of test (e.g. latency, throughput)
region: the user's geographical region
median: the median performance metric value
ip: the user's IP address
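For illustration only, a record might look roughly like this (hypothetical field values, not real data; stdDev is another of the record's fields, referenced in the query below):

id=svc-123 type=CDN testId=l region=us median=182.4 stdDev=21.7 ip=203.0.113.45 ... (plus roughly 38 other key/value pairs)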
The search query I'm running keeps only the events at or below the 90th percentile of the median performance value, then calculates statistics grouped by service id, within a specific geographical region, service type, and test ID. Here is an example query:
type="CDN" (testId="tl" OR testId="l") region="us" | eventstats perc90(median) as median90 | where median <= median90 | stats mean(median) as mean median(median) as median stdev(median) as stdev avg(stdDev) as avg_stdev count(median) as num_tests dc(ip) as num_ips by id | eval rel_stdev=100*(stdev/median) | table id, mean, median, avg_stdev, stdev, rel_stdev, num_tests, num_ips | sort median
To my disappointment, this query takes about 5 minutes to run to completion on a fairly high-end dedicated server (quad-core X5570 @ 2.93 GHz, 128 GB memory, RAID 0 15K SAS + SSD cache), and much longer on the new hosted splunkstorm service. My question is whether this level of performance should be expected for this amount of data and this type of search query. Are there any optimizations that could be made at index time or search time to improve performance? Is there a significant performance hit when applying | stats or | eventstats to a search? I've been using Splunk for 5 days now... any help would be greatly appreciated.
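For reference, one variation I was planning to try is pruning unused fields immediately after the base search, on the assumption that carrying fewer fields through | eventstats and | stats reduces the work per event. This is just a sketch (the fields list keeps only the columns the later stages actually use), and I don't know yet whether it helps:

type="CDN" (testId="tl" OR testId="l") region="us"
| fields id, median, stdDev, ip
| eventstats perc90(median) as median90
| where median <= median90
| stats mean(median) as mean median(median) as median stdev(median) as stdev avg(stdDev) as avg_stdev count(median) as num_tests dc(ip) as num_ips by id
| eval rel_stdev=100*(stdev/median)
| table id, mean, median, avg_stdev, stdev, rel_stdev, num_tests, num_ips
| sort median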