Solved: How do I improve a lookup-stats by output search?

landen99 · ‎10-13-2018

Let's say I have a search that immediately goes into a lookup with a filtered kvstore of 1 million events followed by a stats by a lookup output field:

index=my_index | lookup ioc_sha256 Indicator_Value AS sha256 OUTPUT Type Malware | stats first(_time) AS _time first(field1) AS field1 values(Malware) AS Malware etc.. by Type sha256

How can the performance of that search be improved? The goal is to show every event whose sha256 matches the lookup, together with other event-level information from those matching events.

martin_mueller · ‎10-14-2018

There are 2.5 efficient ways to run large lookups:

Option 0.5, all-default-config CSV files. Your 1M rows will automatically get a TSIDX file built on disk the first time you use the CSV file after a change; this may take a minute or five. After that, lookups use the TSIDX, giving you about 500µs per event looked up. It works okay, but it's not really an option because any change will trigger a TSIDX rebuild... yet it's the only fast option you have without CLI access.

Option 1, bigger-limits CSV files. If you have the memory, increase this setting in limits.conf to accommodate your CSV file (the value shown is the 10MB default):

[lookup]
max_memtable_bytes=10000000

That will make Splunk build the index structure in memory, giving you enormously fast speeds of about 8µs per event looked up. It comes with a penalty of about 4s of index-building overhead per search though, so it only pays off when looking up lots of events in one go. My 82MB lookup file produced about 600MB of additional search process memory footprint.
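As a sketch for a file of about 82MB like the one mentioned above, the stanza might look like the following. The value 100000000 is an illustrative assumption, chosen only so that it exceeds the file size; pick a value that fits your own file and the memory you can spare.

```ini
# limits.conf
[lookup]
# Illustrative value (about 100MB), assumed here so that an 82MB CSV
# stays under the limit and is indexed in memory.
max_memtable_bytes = 100000000
```

Keep the memory cost in mind: going by the figures above, each search process using the lookup may grow by several hundred MB.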

Option 2, accelerated-field KV Store. Set this in collections.conf for your collection: accelerated_fields.ioc_accel = {"sha256": 1}, substituting the name of the field in the collection that you match on. That'll tell MongoDB to do magic, giving me about 15µs per event looked up with no penalty for single-event searches and no enormous memory footprint. For comparison, an unaccelerated KV Store lookup gave me about 6000µs per event looked up, so acceleration is roughly a 400x speedup.
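A minimal sketch of the full stanza, under two assumptions: the collection name ioc_sha256_collection is hypothetical (use the collection behind your ioc_sha256 lookup definition), and the matched field is taken to be Indicator_Value because that is the lookup-side field in the search from the question.

```ini
# collections.conf
# Hypothetical collection name; replace with your own.
[ioc_sha256_collection]
# "ioc_accel" is just a label for the acceleration.
# Indicator_Value is assumed to be the field the lookup matches on;
# 1 means ascending order.
accelerated_fields.ioc_accel = {"Indicator_Value": 1}
```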

Performance numbers are based on my home Splunk, 7.1.2 running on Windows, and 1M randomly generated SHA256 values with just a count as lookup output fields.
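To put the per-event figures side by side, they can be turned into rough totals. The sketch below assumes a search over 1,000,000 events (an illustrative number, not part of the measurements) and uses made-up field names; it only does arithmetic with makeresults and eval, converting microseconds to seconds.

```spl
| makeresults
| eval events=1000000
| eval csv_tsidx_s = events * 500 / 1000000
| eval csv_memory_s = 4 + events * 8 / 1000000
| eval kv_accel_s = events * 15 / 1000000
| eval kv_plain_s = events * 6000 / 1000000
| table events csv_tsidx_s csv_memory_s kv_accel_s kv_plain_s
```

For 1M events that works out to about 500s for the on-disk TSIDX CSV, 12s for the in-memory CSV (4s overhead plus 8s of lookups), 15s for the accelerated KV Store, and 6000s for the unaccelerated KV Store. With these figures the in-memory CSV only overtakes the accelerated KV Store beyond roughly 570,000 events per search, since the 4s overhead has to be recovered at 7µs per event.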

I'd go with option 2, accelerated-field KV Store. You get most of the speedup for the least penalties.
