Splunk Search

Lookup table greater than 2GB - possible solutions?

charltones
Explorer

I have a CSV lookup file that contains more than 27 million rows and is 2 GB in size (500 MB zipped).

I need to run lookups against search results to add fields from the lookup table, but Splunk complains that the lookup file is too big (an error appears in the Splunk logs).

What I'd like to know is what is the best option to work around this. Things I can think of:

1) Index the lookup data and do a join or subsearch
2) Put the lookup data in a database and query it using Splunk DB connect
3) Put the lookup data in a database and query it using REST or Python (perhaps using Redis to accelerate the DB queries)
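For context, option 1 might look something like the sketch below (the index name `lookup_data` and join field `id` are placeholders, not from the original post):

```
your_base_search
| join type=left id
    [ search index=lookup_data
      | fields id, country, extra_field ]
```

Note that `join` subsearches are subject to row and runtime limits, which is part of why this approach struggles at this scale.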

Can you advise what the best route is?

0 Karma
1 Solution

jmallorquin
Builder

Hi,

Have you tried converting the CSV into a KV store lookup? (Assuming your Splunk version supports the KV store.)

Hope this helps.
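For anyone following along, migrating a CSV lookup to a KV store roughly involves defining a collection, defining a lookup over it, and populating it once from the existing CSV. All names below are placeholders, not from the original post.

collections.conf:

```
[my_collection]
```

transforms.conf:

```
[my_kv_lookup]
external_type = kvstore
collection    = my_collection
fields_list   = _key, id, country, extra_field
```

Populate it from the existing CSV:

```
| inputlookup big_lookup.csv
| outputlookup my_kv_lookup
```

After that, searches can use `| lookup my_kv_lookup id OUTPUT extra_field` as usual; KV store lookups are not subject to the CSV lookup file size limit.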

charltones
Explorer

Looks like the KV store is the thing to use! Many thanks, I wasn't aware of it until now. I'd already started setting up DB Connect, but this looks like a simpler route. I'll give it a go.

0 Karma

woodcock
Esteemed Legend

Now why didn't I think of that?!

0 Karma

woodcock
Esteemed Legend

You could also split the lookup into multiple lookup files. That is probably what I would do, if the data set is mostly upper-bounded at this point (i.e., it will not grow much).
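The splitting could be done in Splunk itself, one `outputlookup` per partition (the `country` field and file names below are illustrative):

```
| inputlookup big_lookup.csv
| where country="US"
| outputlookup big_lookup_US.csv
```

Each resulting partition would of course need to stay under the lookup file size limit on its own.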

0 Karma

charltones
Explorer

Yes, this is the route I started to go down, but I will very likely need to search across multiple partitions of the data set. The most sensible partition is by country, yet I will probably need to search across multiple countries.

0 Karma

woodcock
Esteemed Legend

I change my answer to "Use KV Store".

0 Karma