Splunk Search

CSV vs KV store lookup. How large is large?

MonkeyK
Builder

Documentation comparing CSV and KV store notes that for large lookups, KV Store is preferred over CSV.
http://dev.splunk.com/view/SP-CAAAEY7#kvsvscsv

What is the definition of large? Is it measured in total bytes? Number of records? And in either case how much?
I have also read that, up to a point, CSV is preferred because it gets loaded into memory. What is that point?

0 Karma

hunters_splunk
Splunk Employee

Hi MonkeyK,

CSV lookups are preferred for small tables that change infrequently. Most CSV lookups contain no more than 100 rows of data.
KV Store is designed for large key-value data collections that frequently change, for example:

- Tracking workflow state changes (an incident-review system)
- Keeping a list of environment assets assigned to users and their metadata
- Controlling a job queue or application state as the user interacts with the app

KV Store can:
- Enable per-record CRUD operations using the lookup commands and the REST API
- Access key-value data seamlessly across a search head cluster
- Back up and restore KV Store data

Optionally, KV Store can also:
- Allow data type enforcement on write operations
- Perform field accelerations and automatic lookups
- Work with distributed searches on the search peers (indexers)

Hope this helps. Thanks!
Hunter

0 Karma

MonkeyK
Builder

Doesn't really help. I was looking for a more quantitative definition of "large". But now you have added another undefined term: "frequently". How do I evaluate "large" and "frequently"?

Most of what I am seeing are qualitative criteria that have little to do with anything that I know about.

I am interested in understanding the quantitative measures in general, but right now I am evaluating them against a specific use case of creating and maintaining Indicator of Compromise lists. These would be on the order of 100-1000 IP addresses or URLs (separate lists) that would be used once in a local search and then accumulated to a central list that would be used as part of a nightly search/alert. Each day, any indicators older than a set amount of time (say two weeks) would be removed.
I know that I can implement my use case with CSV-based lookups, but I do not know how important it is, or will become, to consider KV Store.
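For illustration only, the daily expiry step in this use case can be sketched outside Splunk. This is a minimal Python sketch, not a Splunk feature: the column names `indicator` and `first_seen` (epoch seconds), the function name `prune_indicators`, and the exact 14-day window are all assumptions made for the example.

```python
import csv
import io
import time

# Assumed retention window: two weeks, expressed in seconds.
MAX_AGE_SECONDS = 14 * 24 * 60 * 60


def prune_indicators(csv_text, now=None, max_age=MAX_AGE_SECONDS):
    """Return CSV text with rows older than max_age removed.

    Assumes hypothetical columns: indicator, first_seen (epoch seconds).
    """
    now = time.time() if now is None else now
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only rows whose age is within the retention window.
    kept = [row for row in reader if now - float(row["first_seen"]) <= max_age]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()
```

Called with the contents of the central list, the function returns the same CSV minus the expired rows, so a list of a few thousand indicators stays bounded by the retention window rather than growing without limit.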

0 Karma

MonkeyK
Builder

As a follow-up, I have built out some CSV-based lookup tables with several thousand records. No problems in the searches.

0 Karma

myu_splunk
Splunk Employee

Hi MonkeyK, if you have a lookup table file that is 100 MB or larger, I'd consider that a large lookup table file. As for the CSV vs. KV Store question: KV Store collections live on the search head and are not passed down to indexers. CSV lookups are replicated to the indexers, so if your lookup table changes frequently, this could lead to performance problems.
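To compare a lookup file against that rule of thumb, both measures raised in the question (total bytes and number of records) can be read directly from the file. The following is a minimal Python sketch, assuming a plain CSV file with a header row; the function name `lookup_stats` and the choice of 100 * 1024 * 1024 bytes for "100 MB" are assumptions for the example.

```python
import csv
import os

# The 100 MB rule of thumb from the reply above, assumed here to mean binary megabytes.
LARGE_BYTES = 100 * 1024 * 1024


def lookup_stats(path):
    """Return (size_in_bytes, data_row_count, is_large) for a CSV lookup file."""
    size = os.path.getsize(path)
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row if present
        rows = sum(1 for _ in reader)
    return size, rows, size >= LARGE_BYTES
```

A lookup of several thousand short rows typically comes to well under a megabyte, which puts the use case in this thread far below the 100 MB mark.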
