I have 2 queries where each query retrieve the fields from different source using regex and combining it using append sand grouping the data using stats by common id and then evaluating the result, but what is happening is before it loads the data from query 2 it's evaluating and giving wrong result with large data set
Sample query looks like this
index=a component=serviceA "incoming data"
| eventstats values(name) as name ,values(age) as age by id1,id2
|append [search index=a component=serviceB "data from" | eventstats values(parentName) as parentName ,values(parentAge) as parentAge by id1,id2]
| stats values(name) as name ,values(age) as age, values(parentName) as parentName ,values(parentAge) as parentAge by id1,id2
| eval mismatch= case(isnull(name) AND isnull(age) ," data doesn't exist in serviceA",
isnull(parentName) AND isnull(parentAge) ," data doesn't exist in serviceB",
true, "No mismatch")
| table name,age,parentAge,parentName,mismatch,id1,id2
so in my case with large data before the dat get's loaded from query2 it's giving as data doesn't exist in serviceB, even though there is no mismatch.
Please suggest how we can tackle this situation, I tried using join , but it's same
You don't need a subsearch and using join/append are rarely necessary and should be avoided where possible. Subsearches have limitations.
You just need to search both datasets at the start with an (A) or (B) search then collect them together with stats. I am not sure why you are using eventstats - you don't need that and it will not perform well anyway.
Try this.
(index=a component=serviceA "incoming data")
OR (search index=a component=serviceB "data from")
| stats values(name) as name ,values(age) as age, values(parentName) as parentName ,values(parentAge) as parentAge by id1,id2
| eval mismatch=case(isnull(name) AND isnull(age) ," data doesn't exist in serviceA",
isnull(parentName) AND isnull(parentAge) ," data doesn't exist in serviceB",
true, "No mismatch")
| table name,age,parentAge,parentName,mismatch,id1,id2
Are you able to show that query? Using subsearches will not work for datasets where the subsearch has more than 50k results - not sure what your meaning of a 'large' dataset is.
If you can show an anonymised version of each of the data sets and show the relationships it will help us provide a solution.
@bowesmana , Query won't be able to share it, but I tried few different ways,
1)Created data model and tried combining using append and union as well, but it's not working when running for large data set which contains nearly 70k records for 15 mins time period, so when I run same query for that individual id it shows no mismatch but in large dataset the data won't be loaded from query 1
2) Created lookup files for each query , and in each file it has the data, but when it's combined using append or union the data is showing as data doesn't exist in query 1.
So suggest how we can proceed further.
Any subsearch will have a limit - there is a way to combine two datasets in lookups without using append, e.g.
| inputlookup file1.csv
| inputlookup append=t file2.csv
using append=t on the second inputlookup does NOT have a subsearch limitation.
Without knowing what you are doing in more detail, it's impossible to suggest a solution, however, even though you are using commands such as mvexpand, it is generally possible to do a single search (index=A OR index=B) method and then manipulate the result set to get what you want.
In my case from each query I'm retrieving few fields from that log using regex and makemv,mvexpand command, so not sure with that changes how I can do it