Splunk Enterprise

Combining 2 queries using append and grouping the data using stats for large data set

meghashet21
Loves-to-Learn

I have 2 queries where each query retrieve the fields from different source using regex and combining it using append sand grouping the data using stats by common id and then evaluating the result, but what is happening is before it loads the data from query 2 it's evaluating and giving wrong result with large data set
Sample query looks like this

index=a component=serviceA "incoming data"
| eventstats values(name) as name ,values(age) as age by id1,id2
|append [search index=a component=serviceB "data from" | eventstats values(parentName) as parentName ,values(parentAge) as parentAge by id1,id2]
| stats values(name) as name ,values(age) as age, values(parentName) as parentName ,values(parentAge) as parentAge by id1,id2
| eval mismatch= case(isnull(name) AND isnull(age) ," data doesn't exist in serviceA",
isnull(parentName) AND isnull(parentAge) ," data doesn't exist in serviceB",
true, "No mismatch")
| table name,age,parentAge,parentName,mismatch,id1,id2


so in my case with large data before the dat get's loaded from query2 it's giving as data doesn't exist in serviceB, even though there is no mismatch.

Please suggest how we can tackle this situation, I tried using join , but it's same

Labels (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

You don't need a subsearch and using join/append are rarely necessary and should be avoided where possible. Subsearches have limitations.

You just need to search both datasets at the start with an (A) or (B) search then collect them together with stats. I am not sure why you are using eventstats - you don't need that and it will not perform well anyway.

Try this.

(index=a component=serviceA "incoming data")
OR (search index=a component=serviceB "data from")
| stats values(name) as name ,values(age) as age, values(parentName) as parentName ,values(parentAge) as parentAge by id1,id2 
| eval mismatch=case(isnull(name) AND isnull(age) ," data doesn't exist in serviceA",
    isnull(parentName) AND isnull(parentAge) ," data doesn't exist in serviceB",
    true, "No mismatch") 
| table name,age,parentAge,parentName,mismatch,id1,id2

 

bowesmana
SplunkTrust
SplunkTrust

Are you able to show that query? Using subsearches will not work for datasets where the subsearch has more than 50k results - not sure what your meaning of a 'large' dataset is.

If you can show an anonymised version of each of the data sets and show the relationships it will help us provide a solution.

 

0 Karma

meghashet21
Loves-to-Learn

@bowesmana , Query won't be able to share it, but I tried few different ways,

1)Created data model and tried combining using append and union as well, but it's not working when running for large data set which contains nearly 70k records for 15 mins time period, so when I run same query for that individual id it shows no mismatch but in large dataset the data won't be loaded from query 1

2) Created lookup files for each query , and in each file it has the data, but when it's combined using append or union the data is showing as  data doesn't exist in query 1.

So suggest how we can proceed further.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Any subsearch will have a limit - there is a way to combine two datasets in lookups without using append, e.g.

| inputlookup file1.csv
| inputlookup append=t file2.csv

using append=t on the second inputlookup does NOT have a subsearch limitation.

Without knowing what you are doing in more detail, it's impossible to suggest a solution, however, even though you are using commands such as mvexpand, it is generally possible to do a single search (index=A OR index=B) method and then manipulate the result set to get what you want.

0 Karma

meghashet21
Loves-to-Learn

In my case from each query I'm retrieving few fields from that log using regex and makemv,mvexpand command, so not sure with that changes how I can do it 

0 Karma
Get Updates on the Splunk Community!

AppDynamics Summer Webinars

This summer, our mighty AppDynamics team is cooking up some delicious content on YouTube Live to satiate your ...

SOCin’ it to you at Splunk University

Splunk University is expanding its instructor-led learning portfolio with dedicated Security tracks at .conf25 ...

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor

Organizations handling credit card transactions know that PCI DSS compliance is both critical and complex. The ...