Splunk Search

How to merge data for multiple Splunk queries and reduce the data retrieval time?

Taruchit
Contributor

Hi All,

I have 4 indexes: -

  • index1
  • index2
  • index3
  • index4

Each index has its own search criteria, there are some common field names and some distinct field names.

index1: - _time, field1, field2

index2: - _time, field3, field4

index3: - _time, field4, field2

index4: - field5, field2, field6, field7, field8, field9, field10

I tried to use multisearch command to merge the results for indexes.

|multisearch
[search index=index1 TERM(str1) TERM(str2) NOT TERM(str3) NOT TERM(str4)]
[search index=index2 sourcetype="xxx" fieldName=TERM(str5) TERM(str6)]
[search index=index3 sourcetype="yyy" TERM(fieldName1=str7)]
[search index=index4 TERM(fieldName2=123) TERM(fieldName3=str7) OR TERM(fieldName4=str7) TERM(fieldName=str5)]
|head 6000
|sort 0 -_time
|stats first(_time) AS _time, first(field1) AS field1, first(field2) AS field2, first(field3) AS field3, first(field4) AS field4, first(field5) AS field5, first(field6) AS field6, first(field7) AS field7, first(field8) AS field8, first(field9) AS field9, first(field10) AS field10 BY index

 

I do get results for all indexes, but following issues are faced: -

  1. The SPL runs too slowly to fetch the data.
  2. I do not get the values for all expected fields of the index. For example, in index4, I do not get data for field8 and field9. But, if I increase the number of events limit in head from 6000 to 10000, I get the data for those two fields as well. 

Thus, I need your help with following: -

  1. To understand and implement a better way of merging Splunk search results.
  2. To improve the performance of SPL while merging the results.

Thank you

Taruchit

Labels (5)
0 Karma

Taruchit
Contributor

I tried to solve the problem by using loadjob and base search approach in a dashboard.

<form>
<label>Dashboard</label>
<search id="index1_search">
<query>
index=index1 TERM(str1) TERM(str2) NOT TERM(str3) NOT TERM(str4) |head 1 |fields _time, field1, field2
</query>
<earliest>$time_Tok.earliest$</earliest>
<latest>$time_Tok.latest</latest>
<done>
<condition>
<set token="index1_events">$job.sid$</set>
</condition>
</done>
</search>
<search id="index2_search">
<query>
index=index2 sourcetype="xxx" fieldName=TERM(str5) TERM(str6) |head 1 |fields   _time, field3, field4
</query>
<earliest>$time_Tok.earliest$</earliest>
<latest>$time_Tok.latest</latest>
<done>
<condition>
<set token="index2_events">$job.sid$</set>
</condition>
</done>
</search>
<search id="index3_search">
<query>
index=index3 sourcetype="yyy" TERM(fieldName1=str7) |head 1 |fields _time, field4, field2
</query>
<earliest>$time_Tok.earliest$</earliest>
<latest>$time_Tok.latest</latest>
<done>
<condition>
<set token="index3_events">$job.sid$</set>
</condition>
</done>
</search>
<search id="index4_search">
<query>
index=index=index4 TERM(fieldName2=123) TERM(fieldName3=str7) OR TERM(fieldName4=str7) TERM(fieldName=str5) |head 1 |fields   field5, field2, field6, field7, field8, field9, field10
</query>
<earliest>$time_Tok.earliest$</earliest>
<latest>$time_Tok.latest</latest>
<done>
<condition>
<set token="index4_events">$job.sid$</set>
</condition>
</done>
</search>
<search id="combined_events">
<query>
|makeresults |append [|loadjob "$index1_events$"] |append [|loadjob "$index2_events$"] |append [|loadjob "$index3_events$"] |append [|loadjob "$index4_events$"]
</query>
</search>
<fieldset submitButton="true" autorun="false">
<input type="time" token="time_Tok" searchWhenChanged="false">
<label>Time Range</label>
<default>
<earliest>-7d@w0</earliest>
<latest>@w0</latest>
</default>
</input>
</fieldset>
<row>
<panel>
<table>
<title>Merged result</title>
<search base="combined_events">
<query>|table _time, field1, field2, field3, field4, field5, field6, field7, field8, field9, field10
</query>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
</form>
0 Karma

yuanliu
SplunkTrust
SplunkTrust

This is not to suggest a general approach to merging results.  Nor is this a "get fast quick" scheme.  The following offers a semantic reinterpretation of my reading of your intention as illustrated in your sample code: find the latest values of various fields by index.

| multisearch
    [| tstats latest(field1) as field1 latest(field2) as field2 max(_time) as _time where index=index1 TERM(str1) TERM(str2) NOT TERM(str3) NOT TERM(str4) by index]
    [| tstats max(_time) as _time latest(field3) as field3, latest(field4) as field4 where index=index2 sourcetype="xxx" fieldName=TERM(str5) TERM(str6) by index]
    [| tstats max(_time) as _time, latest(field4) as field4, latest(field2) as field2 where index=index3 sourcetype="yyy" TERM(fieldName1=str7) by index]
    [| tstats max(_time) as _time, latest(field5) as field5, latest(field2)  as field2, latest(field6) as field6, latest(field7) as field7, latest(field8) as field8, latest(field9) as field9, latest(field10) as field10 where index=index4 TERM(fieldName2=123) TERM(fieldName3=str7) OR TERM(fieldName4=str7) TERM(fieldName=str5) by index]

In each of the subsearches, I preserved the _time value that your original code does not output.  This can make manual verification easier.

Tags (1)

Taruchit
Contributor

Hi @yuanliu,

Thank you for sharing your inputs.

Somehow I am unable to get data for fields other than index when I tried with the approach shared by you. It may be due to some issues my end, I will try to check again. 

Taruchit

0 Karma

yuanliu
SplunkTrust
SplunkTrust

One possibility that tstats cannot recover fields available in normal index search could be that those fields come from an automatic transform, automatic lookup, calculated fields, etc.  In such cases, you will need to use some tricks to "extract" them in the tstats command itself. (Another possibility is if your search period falls in the lag before summary index is produced.)  Depending on the actual format of raw events, this can become tricky because there are limited tools you can use in tstats. (PREF is perhaps the main one, which can work if all those fields appear as key-value pairs with easily identifiable separators such as = and :.  Even when this is applicable, you will still need to clean up output in order to apply subsequent logic.)

For the sole purpose of speeding up search (provided that summary index is produced in time for your search), you can do something like

 

| multisearch
    [| tstats latest(field1) as field1 latest(field2) as field2 max(_time) as _time where index=index1 TERM(str1) TERM(str2) NOT TERM(str3) NOT TERM(str4) by index]
    [| tstats max(_time) as _time latest(_raw) as _raw where index=index2 sourcetype="xxx" fieldName=TERM(str5) TERM(str6) by index]
    [| tstats max(_time) as _time, latest(_raw) as _raw where index=index3 sourcetype="yyy" TERM(fieldName1=str7) by index]
    [| tstats max(_time) as _time, latest(_raw) as _raw where index=index4 TERM(fieldName2=123) TERM(fieldName3=str7) OR TERM(fieldName4=str7) TERM(fieldName=str5) by index]

 

After getting the newest _raw from each index with respective filters, you can extract/calculate necessary fields from three _raw fields (as opposed to asking Splunk to do extraction on many events).

I forgot to comment on your observation that | head 6000 sometimes leaves out some fields in your original strategy.  I believe that is because those fields are not present in all events; in other words, those expected fields are rare.  If that is the case, you can add, for example, field8=* field9=* to the search in index4.

0 Karma
Get Updates on the Splunk Community!

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

The University of Nevada, Las Vegas (UNLV) is another premier research institution helping to shape the next ...

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...