Developing for Splunk Enterprise

Retrieving Large Data Sets From Lookup

milanparmar541
Explorer

Hi everyone,

I have more than 8 million records in a KV store lookup. I read from the lookup to populate 12 dashboard panels using the "| inputlookup <lookup_name>" command. I found that "| inputlookup <lookup_name>" by itself already takes around 1 minute 30 seconds on a 12-core CPU configuration. So my question is: is there any other way to fetch all the lookup data faster?

The data in my instance grows every day, so "| inputlookup <lookup_name>" will take longer and longer to load it. If the lookup eventually holds 25 million records, I assume the command may take 10-15 minutes to return results. To maintain performance, should I add CPU cores to my machine as the data grows? Or is there a faster way to retrieve the data other than a KV store lookup?

richgalloway
SplunkTrust

Moving a lot of data takes time, and moving more of it takes more time. There's not much one can do about that. If you're running Splunk 8.1, consider switching your KV store to the new WiredTiger storage engine, which is supposed to be faster.

---
If this reply helps you, an upvote would be appreciated.

milanparmar541
Explorer

Thanks for the suggestion, @richgalloway, but I need my application to run on Splunk 8.0.x as well. I've come across some approaches that would let me run my "| inputlookup <lookup_name>" command in the background (through a saved search) and have the dashboard simply display the output of the last run. That way, when a user visits the dashboard, it would be fairly fast even if the lookup holds 15 million records. There are several ways to do this, such as an accelerated saved search, a saved search reference, a data model, summary indexing, etc. Which would be the better approach if I want to run "| inputlookup <lookup_name>" in the background and, once it finishes, display the stats from that search on the dashboard? Then I wouldn't need to worry about how long the lookup command takes to populate all the dashboard panels.

Thanks in Advance!!

richgalloway
SplunkTrust

If you just want to speed up the display of results rather than the search itself, then you have several options. I'd go with a scheduled search and have your dashboard use the last results of that search.
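
As a rough sketch of that approach, not a tested configuration: the expensive lookup read could be saved as a scheduled report, and each dashboard panel could then load the results of the report's most recent scheduled run with the loadjob command. The report name lookup_panel_stats, the field category, and the <owner> and <app> placeholders are assumptions made up for illustration; the real aggregation depends on what each panel shows.

Search to save as a scheduled report (hypothetical name lookup_panel_stats):

```
| inputlookup <lookup_name>
| stats count BY category
```

Panel search that reads the last scheduled results instead of re-reading the lookup:

```
| loadjob savedsearch="<owner>:<app>:lookup_panel_stats"
```

With this split, the panel load time depends on the size of the aggregated result rather than on the size of the lookup, at the cost of the data being only as fresh as the last scheduled run.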

richgalloway
SplunkTrust

Can you use "| inputlookup <lookup_name> where <expr>" to reduce the amount of data loaded from the lookup?
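
For illustration only, a filtered read might look like the sketch below. The field names status and last_seen and their values are hypothetical; the where clause of inputlookup accepts a limited set of operators (=, !=, <, >, <=, >=, AND, OR, NOT).

```
| inputlookup <lookup_name> where status="open" AND last_seen>=1600000000
```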

milanparmar541
Explorer

Thanks for the response, @richgalloway. I've already used the where expression wherever it's needed, but even with it I still receive a large amount of data (6M+ records) from the lookup. So the "| inputlookup <lookup_name> where <expr>" query still takes a long time to finish. Is there any other way to retrieve all the lookup data faster? I've confirmed that adding CPU cores improves retrieval performance, but it would be better if I could retrieve the data faster without adding cores.
