Solved: How to combine two searches with common value into...

gballanti · ‎12-05-2019

I need help regarding a join from events based on different sourcetype (same index) that are related by the same value in different fields.
The logical flow starts from a bar char that group/count similar fields. With drill down I pass the 'description' by a token to the search that has to combine the search into a table.
Below a simple example:

sourcetype_A
s1_field1 = Purchase OK
s1_field2 = 9
s1_field3  = tax value
s1_field4 = Completed

sourcetype_B
s2_field1 = 9
s2_field2 = Rome
s2_field3 = Fontana di Trevi

I need to obtain a table with fields "s1_field2, s1_field1,s2_field2,s2_field3"

I'm working around this kind of search without obtaining my outcome.

sourcetype="Sourcetype_A" s1_field4="Completed" | search s1_field1=$from_token$
| join s1_field2
    [ search sourcetype="Sourcetype_B" 
    | rename s2_field1 as s1_field2 ] 
| table s1_field2, s1_field1,s2_field2,s2_field3

Thank in advance for your help.

dmarling · ‎12-05-2019

You'll see this answer a lot on here. You should avoid joins except for some very specific use cases. Joins are inefficient and can cause truncation of your data results, since it has a default limitation of only running for 30 seconds and then auto finalizing. I suggest doing this instead:

(sourcetype="Sourcetype_A" s1_field4="Completed") OR sourcetype="Sourcetype_B" 
| eval joiner=if(sourcetype="Sourcetype_A", s1_field2, s2_field1)
| stats values(*) as * by joiner
| where s1_field2=s2_field1

Season the above query to taste by only putting the fields you want in the third line. What this is doing is pulling in both data sets and joining them together with a stats command. This is won't run into those timing limitations that a join would and be more performant.

I edited the answer so it only returns results where those two fields match.

If this comment/answer was helpful, please up vote it. Thank you.

View solution in original post

gballanti · ‎12-06-2019

thanks to everyone , the solution proposed works as expected.

to4kawa · ‎12-06-2019

(sourcetype="snow:task_sla" dv_has_breached=false dv_stage=Completed dv_sla="FW-Guasto non bloccante fuori sede no A-SLA21") OR sourcetype="snow:incident" 
| eval joiner=if(sourcetype="snow:task_sla", dv_task, dv_number) 
| stats values(dv_task) as dv_task values(dv_number) as dv_number values(dv_sla) as dv_sla by joiner
| where dv_task==dv_number

Ask, and it shall be given you.

gballanti · ‎12-06-2019

probably my previous comment is gone ... I wanted to thank you for the answers

gballanti · ‎12-06-2019

just an update ...
the following query

(sourcetype="snow:task_sla" dv_has_breached=false dv_stage=Completed dv_sla="FW-Guasto non bloccante fuori sede no A-SLA21") OR sourcetype="snow:incident" | eval joiner=if(sourcetype="snow:task_sla", dv_task, dv_number) | stats values(dv_task) as dv_task values(dv_number) as dv_number values(dv_sla) as dv_sla by joiner

return an unexpected behavior

I have to get only the highlighted rows, where dv_task and dv_number matches.

dmarling · ‎12-06-2019

Yep, this is an easy fix. Modify your query by adding a filter at the end to only include when they match:

(sourcetype="snow:task_sla" dv_has_breached=false dv_stage=Completed dv_sla="FW-Guasto non bloccante fuori sede no A-SLA21") OR sourcetype="snow:incident" 
| eval joiner=if(sourcetype="snow:task_sla", dv_task, dv_number) 
| stats values(dv_task) as dv_task values(dv_number) as dv_number values(dv_sla) as dv_sla by joiner
| where dv_task=dv_number

If this comment/answer was helpful, please up vote it. Thank you.

gballanti · ‎12-06-2019

Thanks to both of you for the prompt reply.

I used the Dustin's suggestion because is more clear for my little knowledge, later will try the "selfjoin" also.
To filter the columns I used table command as last row

(sourcetype="Sourcetype_A" s1_field4="Completed") OR sourcetype="Sourcetype_B"
| eval joiner=if(sourcetype="Sourcetype_A", s1_field2, s2_field1)
| stats values(*) as * by joiner
| table s1_field2, s1_field1,s2_field2,s2_field3

is it a good way to shows the fields ?

dmarling · ‎12-05-2019

You'll see this answer a lot on here. You should avoid joins except for some very specific use cases. Joins are inefficient and can cause truncation of your data results, since it has a default limitation of only running for 30 seconds and then auto finalizing. I suggest doing this instead:

(sourcetype="Sourcetype_A" s1_field4="Completed") OR sourcetype="Sourcetype_B" 
| eval joiner=if(sourcetype="Sourcetype_A", s1_field2, s2_field1)
| stats values(*) as * by joiner
| where s1_field2=s2_field1

Season the above query to taste by only putting the fields you want in the third line. What this is doing is pulling in both data sets and joining them together with a stats command. This is won't run into those timing limitations that a join would and be more performant.

I edited the answer so it only returns results where those two fields match.

If this comment/answer was helpful, please up vote it. Thank you.

kp_pl · ‎06-28-2024

@dmarling pretty good explanation ! .
But now I need to go one step further . Based on result of stats commands I want to create timechart. First I tried to replace stats with timechart but it simply does not work. Then I created table from stats result and base on this table wanted to create a time chart. But I rather feel than know it is not a good way. Would you give any example how to do it please. Even based on this simple from source post

dmarling · ‎06-28-2024

You just need the _time field that isn't multivalued to be present for timechart to work. Assuming you are working with the example you replied to you would simply add "| eval _time=min(_time) | timechart count" after the where line. The eval is ensuring you take the earliest time if it is multi-valued.

If this comment/answer was helpful, please up vote it. Thank you.

kp_pl · ‎06-28-2024

something is not clear because it does not return any rows now:

(...)

| stats values(*) as * by joiner | where ctx_ecid=ecid_d
| eval _time=min(_time) | timechart span=5min avg(time_taken)

without the last line with timechart it returns all expected rows, with it - none 😞

and one more question . Right now I do stats(*) as * by joiner but this returns all fields from both indexes. I suppose it cost a lot to transfer such big amount of data especially each my index has 10-20 fields and thousands records. In fact I use only 2-3 fields from each index so I tried to something like :

| stats values(field1) as field1, values(field2) as field2, values(field3) as field3 by joiner (...)

but it does not return rows 😞 Why ? how to modify it to returns only fields I need?

Anji02 · ‎05-24-2021

Hi,
I tried in same way, but not working

index=my_index type=outer OR type=inner
|rename msg.* AS *
| eval joiner=if(type=outer, _id, source_id)
| stats values(*) as * by joiner
| where ticket_num=TicketNum

|table event_id, serial_id

Can you please help me here @dmarling

dmarling · ‎05-24-2021

The joiner eval doesn't have double quotes around "outer" so it's trying to say when the value in field type equals the value in field outer, then use the value in field _id. What you want to do is wrap outer with double quotes so the eval command knows you are looking for the field type to have a value of "outer."

index=my_index (type=outer OR type=inner)
| rename msg.* AS *
| eval joiner=if(type="outer", _id, source_id)
| stats values(*) as * by joiner
| where ticket_num=TicketNum

If this comment/answer was helpful, please up vote it. Thank you.

woodcock · ‎12-05-2019

You need selfjoin, like this:

|makeresults
| eval raw="sourcetype=sourcetype_A,s1_field1=PurchaseOK,s1_field2=9,s1_field3=taxvalue,s1_field4=Completed sourcetype=sourcetype_B,s2_field1=9,s2_field2=Rome,s2_field3=FontanadiTrevi"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| kv

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| eval joiner=coalesce(s1_field1, s2_field1)
| selfjoin joiner

How to combine two searches with common value into one table

Monitoring MariaDB and MySQL

Financial Services Industry Use Cases, ITSI Best Practices, and More New Articles ...

Splunk Federated Analytics for Amazon Security Lake