CSV vs KV store lookup. How large is large?

Builder

Documentation comparing CSV and KV store notes that for large lookups, KV Store is preferred over CSV.
http://dev.splunk.com/view/SP-CAAAEY7#kvsvscsv

What is the definition of large? Is it measured in total bytes? Number of records? And in either case how much?
I have also read that, up to a point, CSV is preferred because it gets loaded into memory. What is that point?

Splunk Employee

Hi MonkeyK,

CSV lookups are preferred for small tables that change infrequently. Most CSV lookups contain no more than 100 rows of data.
KV Store is designed for large key-value data collections that change frequently, for example:

- Tracking workflow state changes (an incident-review system)
- Keeping a list of environment assets assigned to users and their metadata
- Controlling a job queue or application state as the user interacts with the app

KV Store can:
- Enable per-record CRUD operations using the lookup commands and the REST API
- Access key-value data seamlessly across a search head cluster
- Back up and restore KV Store data

Optionally, KV Store can also:
- Allow data type enforcement on write operations
- Perform field accelerations and automatic lookups
- Work with distributed searches on the search peers (indexers)

Hope this helps. Thanks!
Hunter

Builder

Doesn't really help. I was looking for a more quantitative definition of "large", and now you have added another undefined term, "frequently". How do I evaluate "large" and "frequently"?

Most of what I am seeing are qualitative criteria that have little to do with anything that I know about.

I am interested in understanding the quantitative measures in general, but right now I am evaluating them against a specific use case of creating and maintaining Indicator of Compromise lists. These would be on the order of 100-1000 IP addresses or URLs (separate lists) that would be used once in a local search and then accumulated to a central list that would be used as part of a nightly search/alert. Each day, any indicators older than a set amount of time (say two weeks) would be removed.
I know that I can handle my use case with CSV-based lookups, but I do not know how important it is (or will be) to consider KV Store.
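
To make the daily expiry step concrete, here is a minimal sketch in Python of pruning a CSV indicator list by age. It is only an illustration of the logic, not how Splunk itself manages lookups. The column names `indicator` and `first_seen`, the ISO 8601 timestamp format, and the function name `prune_indicators` are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# "Say two weeks" from the use case above.
MAX_AGE = timedelta(days=14)


def prune_indicators(csv_text, now, max_age=MAX_AGE):
    """Return CSV text with rows older than max_age removed.

    Assumes hypothetical columns `indicator` and `first_seen`, where
    `first_seen` is an ISO 8601 timestamp with a UTC offset.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cutoff = now - max_age
    # Keep only indicators first seen on or after the cutoff.
    kept = [
        row for row in reader
        if datetime.fromisoformat(row["first_seen"]) >= cutoff
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "indicator,first_seen\n"
        "198.51.100.7,2024-01-01T00:00:00+00:00\n"
        "203.0.113.9,2024-01-15T00:00:00+00:00\n"
    )
    print(prune_indicators(sample, datetime(2024, 1, 20, tzinfo=timezone.utc)))
```

With 100-1000 indicators per list, each daily pass rewrites the whole file, which is the kind of full-table update that is cheap at this size.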

Builder

As a follow-up, I have built some CSV-based lookup tables with several thousand records, and the searches run without problems.

Splunk Employee

Hi MonkeyK, if you have a lookup table file that is 100MB or larger, I'd consider that a large lookup table file. As for the CSV vs KV Store question: by default, KV Store collections live on the search head and are not passed down to the indexers. CSV lookups are replicated to the indexers, so if your lookup table changes frequently, this replication could lead to performance problems.
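
If it helps to check where a given lookup file sits relative to that 100MB figure, here is a rough sketch in Python that reports both the size in bytes and the number of data rows. The helper names `lookup_stats` and `is_large` are made up for this example, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption, since the exact unit is not stated.

```python
import csv
import os

# "100MB" from the answer above; binary megabytes are an assumption here.
LARGE_BYTES = 100 * 1024 * 1024


def lookup_stats(path):
    """Return (size_in_bytes, data_row_count) for a CSV lookup file."""
    size = os.path.getsize(path)
    with open(path, newline="") as f:
        # Count parsed CSV records, then subtract the header row.
        rows = sum(1 for _ in csv.reader(f)) - 1
    return size, max(rows, 0)


def is_large(size_bytes, threshold=LARGE_BYTES):
    """True when the file meets or exceeds the size threshold."""
    return size_bytes >= threshold
```

For scale, a list of a few thousand IP addresses or URLs is typically well under a megabyte, so it falls far below this threshold.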