Re: Fields match between multiple sources and loop...

M28 · ‎11-30-2022

Hi All,

Below is the sample data looks like.

sourcetype_1

s1_field1: 123

s1_field2: {

{

ID: 2

Name: ABC

}, {

ID: 1

Name: XYZ

}

s1_field3 : Completed

sourcetype_2

s2_field1: 123

s2_field2: {

{

CID: 3

Info: XXX

},

{

CID: 2

Info: YYY

},

}

s2_field3: N

Here first i need to match s1_field1 of source1 to s2_field1 of source2.If its matching then I need to match the s2_field1 's CID of source2 with s1_field2's ID of source 1.If matches then need to fetch the all other fields of both source 1 and source 1.

Expecting data like below:

ID:2

Name:ABC

Info: YYY

Please suggest.

yuanliu · ‎12-01-2022

First, thank you for clearly illustrating the data, explain the requirement, and illustrate expected results. This is much clearer than the other thread on the same subject.

If its matching then I need to match the s2_field1 's CID of source2 with s1_field2's ID of source 1

I suspect that you meant "s2_field2 's CID of source2" (and s1_field2's ID of source 1). Still, you left a critical piece of the puzzle unsettled: Under what common field would you want to consider these matches. Is it a time period? Or do you just want values of the rest of fields?

If you just want the values, you can do

sourcetype IN (sourcetype_1, sourcetype_2)
| rename s1_field1 as field1, s2_field1 as field1, s2_field2.CID as CID, s1_field1.ID as CID
| eventstats dc(sourcetype) by field1 CID
| where 'dc(sourcetype)' > 1
| stats values(*) as * by field1 CID

But this is nearly meaningless if you are searching over any extended period of time.

If the match should be considered within a specific time intervals, you need to specify such interval, e.g., 5m.

sourcetype IN (sourcetype_1, sourcetype_2)
| rename s1_field1 as field1, s2_field1 as field1, s2_field2.CID as CID, s1_field1.ID as CID
| bin span=5m _time
| eventstats dc(sourcetype) by field1 CID _time
| where 'dc(sourcetype)' > 1
| stats values(*) as * by field1 CID ``` alternatively, timechart ```

Alternatively,

sourcetype IN (sourcetype_1, sourcetype_2)
| bin span=5m _time
| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID
| fields - s2_field1 ID
| stats values(*) as * by s1_field1 CID

Hope this helps.

M28 · ‎12-01-2022

Hi Below query is exactly giving the expected output ,But I am using without the time range.(bin span=5m _time)

what I am observing is its giving me duplicate results.Meaning I am expecting only one entry but after while result set is increasing with same duplicate records.

any suggestions ?

sourcetype IN (sourcetype_1, sourcetype_2)
| bin span=5m _time
| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID
| fields - s2_field1 ID
| stats values(*) as * by s1_field1 CID

@yuanliu

yuanliu · ‎12-01-2022

I meant to account for _time buckets in this one but forgot.

sourcetype IN (sourcetype_1, sourcetype_2)
| bin span=5m _time
| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID
| fields - s2_field1 ID
| stats values(*) as * by s1_field1 CID _time

In every _time bucket, results are unique. Is that what you wanted? As I mentioned, you have to decide what output is required.

Whenever you bucket time, you need to use it one way or another. But other people do not know what your use case is, so they cannot decide for you. Some other possible uses could be earlies, latest, etc. Take latest for example:

sourcetype IN (sourcetype_1, sourcetype_2)
| bin span=5m _time
| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID
| fields - s2_field1 ID
| stats latest(*) as * by s1_field1 CID _time

M28 · ‎12-15-2022

Hi @yuanliu

The query you suggested below working fine in our non prod.But in non prod its taking lot of time to fetch the query or though records are present its giving no result.

Is there performance issue with this query when we are searching across 2 sources types having more traffic in prod .

Any suggestions would be really helpful.

sourcetype IN (sourcetype_1, sourcetype_2)
| bin span=5m _time
| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID
| fields - s2_field1 ID
| stats latest(*) as * by s1_field1 CID _time

yuanliu · ‎12-15-2022

Performance bottleneck mainly comes from eventstats, not because of multiple sources. With intimate knowledge of actual data and fields needed, you can construct search strategy using stats instead of eventstats. But this is something outsiders cannot help too much. Another performance hindrance is latest(*) as * if there are too many fields. Again, with knowledge about what you really need, you can spell them out so this is a smaller problem.

M28 · ‎12-02-2022

Thank you for your response. @yuanliu

what I meant to say was I am using below query by giving time range option in splunk option

sourcetype IN (sourcetype_1, sourcetype_2)
| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID
| fields - s2_field1 ID
| stats values(*) as * by s1_field1 CID

I am seeing strange behaviour initially query seems giving expected output.after a while when I again when submit again and its not giving any output at all.It says no results found though there are records (events) available.Query sometimes fetches the results and sometimes it won't.

any suggestions on what I might be missing ?

yuanliu · ‎12-03-2022

I am seeing strange behaviour initially query seems giving expected output.after a while when I again when submit again and its not giving any output at all.It says no results found though there are records (events) available.Query sometimes fetches the results and sometimes it won't.

The problem is by _time in eventstats.

| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1
 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time
| where s1_field1 == s2_field1 AND CID == ID

If you do NOT bucket _time, you only get results when there are matching events (from two independent sources, indexed separately) that accidentally happened during the resolution limit of _time. Depending on resolution of the sources, that could mean an interval of 1s, 1ms, or even 1µs. Such occurrences are extremely rare.

The moral of this story is: even if you do not explicitly bucket _time, any stats by _time is still bucketed; it's just that the bucket is not under your control. As I mentioned in the previous comment, whenever you bucket _time, you must use it in some way. The last command cannot be just | stats values(*) as * by s1_field1 CID.

M28 · ‎12-05-2022

Thank you for clarifying.It was helpful and it worked for me.

Just one clarification ,with this query have created dashboard and added drop down filtering options for few fields from both the source types.its working for the common fields name ,but field which is present in any of the source is not having ,though record is present its saying no result found.

filter searching is not working for s1_field3 and s2_field3 .Is it something filtering works only for common fields across both the sources ?

yuanliu · ‎12-05-2022

Can you explain what do you mean by filter not working with s1_field3 and s2_field3? The prior strategy was to match s1_field1 with s2_field1, s2_field2.CID with s1_field1.ID:

| eventstats values(s1_field1) as s1_field1 values(s2_field1) as s2_field1
 values(s2_field2.CID) as CID values(s1_field1.ID) as ID by _time

There is no effort to match field3 at all. Am I missing something?

M28 · ‎12-06-2022

Hi ..

Query and initial ask is same ,No change to it .

what i meant by filtering is that adding dropdown in dashboard as below for field3.

so in order to have this working value should be present in both the sources ,otherwise it wont work ? is the understanding right ?

yuanliu · ‎12-06-2022

Let me step back a bit and establish a common understanding. By "filtering", you are referring to the where command following eventstats, is this correct? Whether there is any return after where really depends on how field3 is being used. Can you explain? I assume that your dropdown sets a token with certain values. What is the token name, how do you populate those values, and how is the token used in the where command, for example?

M28 · ‎12-06-2022

Hi ..

No I am not referring where command and eventstats conditions from the query.

After the query output i am getting all the fields of both the sources(as giving | stats values(*) as ..) .Including field3.

I am giving the token name as field3 and also in the query passing as $field3$.but its not working as its present only in any one of the sources.

what I am assuming is to have this dropdown option in dashboard value should be present in both the sources and then only it filters from the dashboard .

is the understanding correct ?

yuanliu · ‎12-06-2022

I'm afraid not. If the following

| stats values(*) as * by s1_field1 CID

includes field3 in the output (equivalent to values(field3) as field3), you can use where or search command to filter according to its values. | stats values(fieldname) as fieldname is often used to produce a field called "fieldname" when its presence is not ubiquitous.

Can you illustrate the actual filter? What is the output? Why the output is not what you wanted? (What is the expected result?) In other words, describe the problem without using "not working".

If the problem is that when you do

| where field3 == "some value"

only return some combination of {s1_field1, CID}, that is the filter "working" as the command asks.

M28 · ‎12-07-2022

Thank you for the response.

your all inputs were really helpful.

M28 · ‎12-01-2022

Thank you for the detailed information and multiple options.

I will be creating dashboard with this use-case and having filter to select the timerange in dropdown.

so in the query would be fetching based on values matches against 2 sources.

I will try these queries and keep you posted.

Thank you again

How to format when fields match between multiple sources and loop through items?

fields

join

stats

subsearch

transaction

Brains, Bytes, and Boston: Learn from the Best at .conf25

Splunk AppDynamics Agents Webinar Series

SplunkTrust Application Period is Officially OPEN!

Are you a member of the Splunk Community?

How to format when fields match between multiple sources and loop through items?