
How to optimize a search with a huge amount of data?

nilaksh92
Path Finder

Hi Everyone

I have an index into which records arrive every 30 seconds.

I also have a lookup that contains some fields.

The index and the lookup share one common field, on the basis of which I want to get all matching rows.

After getting the data, I need to perform aggregations on a per-day basis.

| inputlookup lookupname | join type=inner lookup_field [search index="abc" | rename index_field as "lookup_field"]

This gives 500 records and does not produce the proper results.

index="abc" | join type=inner index_field[|inputlookup lookupname | rename "lookup_field" as "index_field"]

This gives almost 200,000 (2 lakh) records, but whenever I use it in my dashboard, it takes a long time to display.

Which one is the correct way of joining? And if the second one is correct, how can I optimize it?

The dashboard refreshes at a 30-second interval.

Please guide me on this.

Thanks
Nikks


DalJeanis
SplunkTrust

Okay, here's a set of standard efficiency suggestions ...

1) ALWAYS get rid of all the rows you can at the very front. The subsearch here tells Splunk not to return any events that are not in the lookup table. For more information, see the manual on the format command: the square brackets around a subsearch cause its return values to be implicitly formatted as if piped to the format command.

index="abc" [|inputlookup lookupname | rename "lookup_field" as "index_field" | table "index_field"]

2) ALWAYS get rid of all the fields you can at the very front. The following, as the first line after the initial search, tells Splunk that it does not have to extract or calculate any fields other than the ones listed.

| fields index_field myfield1 myfield2 
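
Putting 1) and 2) together, the front of your search would look something like this (myfield1 and myfield2 are placeholders for whatever fields you actually need):

index="abc" [| inputlookup lookupname | rename "lookup_field" as "index_field" | table "index_field"]
| fields index_field myfield1 myfield2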

3) Where not subject to the above, PREFERABLY do any matching, lookup, or joining at the latest point you can... after aggregation if possible. That way, instead of matching 1000 times for the same value, Splunk aggregates those thousand rows once and then matches once. (If you are just using the lookup as a filter and not adding any data from it, then you do not need this step.)

| stats count as mycount, sum(myfield1) as myfield1sum, max(myfield2) as myfield2max by index_field
| join index_field [| inputlookup lookupname | rename "lookup_field" as "index_field" | table index_field lookupvaluefield]
| table index_field mycount myfield1sum myfield2max lookupvaluefield
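
Since you mentioned aggregating on a day basis, a minimal sketch of the same pattern with day bucketing (same placeholder field names as above) would be:

| bin _time span=1d
| stats count as mycount, sum(myfield1) as myfield1sum, max(myfield2) as myfield2max by _time, index_field
| join index_field [| inputlookup lookupname | rename "lookup_field" as "index_field" | table index_field lookupvaluefield]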

If you want any more specific advice, then you will need to post the rest of your query. (You didn't tell us WHAT you were aggregating, so we can't tell you how best to do that.)

DalJeanis
SplunkTrust

Come to think of it, there is a more important rule of efficiency, one that saves 100% of the CPU cycles for certain operations. That rule is:

DON'T DO ANYTHING YOU DON'T ACTUALLY NEED TO DO.

You said you are aggregating on a "day" basis and refreshing every 30 seconds...

WHY?

Look at the role of whoever is viewing this dashboard, and ask yourself whether that exact person will need to make a decision in the next 30 seconds based on what just happened. If not, then you are overengineering the dashboard. Back it off to a 5-minute refresh, and you have saved 90% of the CPU cycles.

Even if they do, consider having one panel with the full day's data, perhaps on a 10-minute refresh, and another panel showing ONLY the last fifteen minutes on a 30-second refresh. Your dashboard will run better and be more useful, and you will reserve those CPU cycles for something that will actually benefit the business.
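
As a rough sketch of that split, using only the time modifiers (the rest of each search would be whatever filtering and aggregation you settle on), the full-day panel, refreshed every 10 minutes, could search:

index="abc" earliest=@d latest=now

while the fast panel, refreshed every 30 seconds, could search:

index="abc" earliest=-15m latest=now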

adonio
Ultra Champion

If I understand correctly, you need the results from the lookup based on the values under the field XYZ from the search. Try to narrow down your search first: index=abc | fields XYZ (or use other filtering options such as sourcetype, host, etc.) and then complete your search.
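
As a sketch of that suggestion (the sourcetype value here is a placeholder for whatever filters apply to your data):

index=abc sourcetype=your_sourcetype
| fields XYZ

and then continue with the join or aggregation from the answers above.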
Hope it helps.
