Does anyone have ballpark performance figures for what a KV store should deliver with 5 million+ entries, given the specs below?
I have a KV store with around 5.5 million entries, and a simple lookup search takes over 5 minutes to return.
For example: | localop | inputlookup session_kv | stats count
Runtime: 326 seconds.
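For reference, the kind of filtered lookup I'd ultimately need to be fast looks like this (session_id and the value are just stand-ins for one of our fields):
| localop | inputlookup session_kv where session_id="abc123"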
This is running on a standalone hardware search head, Splunk v6.4.1, Linux x64.
Basic system specs:
100 GB RAM
16 x Intel(R) Xeon(R) CPU X6550 @ 2.00GHz
DMC info:
Collection size: 7901 MB
Acceleration size: 577 MB
Resident memory: 8307 MB
Virtual memory: 21000 MB
I've tried tweaking some of the available limits.conf settings, but lookup speed has barely changed.
limits.conf
[default]
max_mem_usage_mb = 10000
[kvstore]
max_queries_per_batch = 20000
max_rows_per_query = 1000000
max_size_per_result_mb = 1000
max_accelerations_per_collection = 0
max_fields_per_acceleration = 0
max_threads_per_outputlookup = 0
I don't know what else to try to improve the KV store's performance.
Ideas?
In summary, on a hardware search head:
100 GB RAM
16 x Intel(R) Xeon(R) CPU X6550 @ 2.00GHz
Baseline query: 500k to 1 million returned rows per minute
Upsert (insert/update) query: 1,400 to 1,800 records per minute (combined across all concurrent writes to the same collection), using an accelerated key
With this knowledge I can now try to develop a job schedule and see whether I can get this KV store to function within the time frames I require, by splitting the updates so that writes are spread across as much time as possible.
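As a rough sketch of the batching idea (the lookup names and the batch_id field below are hypothetical), each scheduled run would upsert only one slice of the pending changes:
| inputlookup pending_updates.csv where batch_id=1
| outputlookup append=true session_kv
With append=true, records whose _key already exists should be updated in place and new records inserted, so staggering the batches across the schedule spreads the write load.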
Try adding the field you want to look up to accelerated_fields in collections.conf. The _key field is accelerated by default, but the field you want to look up is not. http://docs.splunk.com/Documentation/Splunk/6.4.2/Admin/Collectionsconf#collections.conf.example
[mycollection]
field.foo = number
field.bar = string
# 1 = ascending index on the field, -1 = descending
accelerated_fields.myacceleration = {"foo": 1, "bar": -1}
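For completeness, the collection also needs a KV store lookup definition in transforms.conf before inputlookup/outputlookup can use it; the stanza below mirrors the collection above (names are illustrative):
[mycollection_lookup]
external_type = kvstore
collection = mycollection
fields_list = _key, foo, bar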
I was able to load 6M+ rows of CSV data (Korean postal codes) into a collection on my MacBook Pro, and the lookup returned almost immediately when accelerated. Without acceleration it took more than a few minutes and I had to force-stop it.
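In case it helps, the load itself was just a one-shot search along these lines (the CSV file and lookup definition names are placeholders):
| inputlookup postal_codes.csv | outputlookup postal_code_kv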
Thanks. All the fields we are using are already accelerated.
I've tried turning on acceleration for a number of fields, and for ~500k KV store rows it still takes 30+ seconds to return 😞
This search has completed and has returned 511,836 results by scanning 0 events in 38.708 seconds.
Edit: OK, I've just tried it locally on my PC: Splunk 6.4.1, Core i7-4600U, 16 GB RAM, 512 GB SSD.
Collection size: 1801 MB. Acceleration size: 525 MB.
This search has completed and has returned 1,431,835 results by scanning 0 events in 70.794 seconds.
It seems "very unlikely" to be able to recall 6M records in close to what could be called "instantly" using an out of the box install.
Hmm, so 1 minute for 500k lines is acceptable (according to that post)?
That would make my lookup twice as fast as theirs for the same number of records.
Still looking for information on improving performance.
July 26th edit: so compared to that post, I'm already running at double the expected speed. I'd guess this is the best I can hope for.