I have a CSV lookup file that contains more than 27 million rows and is 2 GB in size (500 MB when zipped).
I need to enrich search results with fields from this lookup table, but Splunk complains that the lookup file is too big (there is an error in the Splunk logs).
What is the best way to work around this? Options I can think of:
1) Index the lookup data and do a join or subsearch
2) Put the lookup data in a database and query it using Splunk DB connect
3) Put the lookup data in a database and query it using REST or Python (perhaps using Redis to accelerate the DB queries)
Can you advise what the best route is?
Looks like the KV store is the thing to use! Many thanks, I wasn't aware of it until now. I've tried setting up DB Connect anyway, but the KV store looks like a simpler route. I will give it a go.
You could also split the lookup into multiple lookup files. That is probably what I would do if its size is mostly upper-bounded at this point (i.e. it will not grow very much).
Yes, this is the route I started to go down, but I will very likely need to search across multiple partitions of the data set. The most sensible partition is by country, but I will probably need to search across multiple countries.
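For anyone wanting to try the split-by-country route, here is a minimal Python sketch of the partitioning step, done outside Splunk. It assumes the CSV has a header row with a column named `country`; that column name, the function name `split_lookup_by_column`, and the `lookup_<value>.csv` output naming are all made up for illustration and are not taken from the original file.

```python
import csv
import os
import re


def split_lookup_by_column(src_path, out_dir, column="country"):
    """Stream a large CSV and write one smaller CSV per distinct value of `column`.

    Returns a dict mapping each (sanitized) value to the number of rows written.
    The column name "country" is an assumption about the lookup's header.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # sanitized value -> open file handle
    writers = {}  # sanitized value -> csv.DictWriter
    counts = {}   # sanitized value -> rows written

    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in CSV header")
        try:
            # Rows are processed one at a time, so the 2 GB file is never
            # loaded into memory as a whole.
            for row in reader:
                # Make the value safe to use in a file name. Values that
                # sanitize to the same string share one output file.
                key = re.sub(r"[^A-Za-z0-9_-]", "_", row[column]) or "unknown"
                if key not in writers:
                    path = os.path.join(out_dir, f"lookup_{key}.csv")
                    fh = open(path, "w", newline="", encoding="utf-8")
                    handles[key] = fh
                    writers[key] = csv.DictWriter(fh, fieldnames=reader.fieldnames)
                    writers[key].writeheader()
                    counts[key] = 0
                writers[key].writerow(row)
                counts[key] += 1
        finally:
            for fh in handles.values():
                fh.close()
    return counts
```

Calling `split_lookup_by_column("big_lookup.csv", "split_lookups")` (both paths hypothetical) keeps one output file open per distinct country, which should be fine for a few hundred countries. Each resulting file still has to fit under the lookup size limit on its own, and a search spanning several countries would need to consult several of these files, which is the drawback mentioned above.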