Splunk Search

How to combine two searches with common value into one table

gballanti
Explorer

I need help regarding a join from events based on different sourcetype (same index) that are related by the same value in different fields.
The logical flow starts from a bar char that group/count similar fields. With drill down I pass the 'description' by a token to the search that has to combine the search into a table.
Below a simple example:

sourcetype_A
s1_field1 = Purchase OK
s1_field2 = 9
s1_field3  = tax value
s1_field4 = Completed

sourcetype_B
s2_field1 = 9
s2_field2 = Rome
s2_field3 = Fontana di Trevi

I need to obtain a table with fields "s1_field2, s1_field1,s2_field2,s2_field3"

I'm working around this kind of search without obtaining my outcome.

sourcetype="Sourcetype_A" s1_field4="Completed" | search s1_field1=$from_token$
| join s1_field2
    [ search sourcetype="Sourcetype_B" 
    | rename s2_field1 as s1_field2 ] 
| table s1_field2, s1_field1,s2_field2,s2_field3

Thank in advance for your help.

1 Solution

dmarling
Builder

You'll see this answer a lot on here. You should avoid joins except for some very specific use cases. Joins are inefficient and can cause truncation of your data results, since it has a default limitation of only running for 30 seconds and then auto finalizing. I suggest doing this instead:

(sourcetype="Sourcetype_A" s1_field4="Completed") OR sourcetype="Sourcetype_B" 
| eval joiner=if(sourcetype="Sourcetype_A", s1_field2, s2_field1)
| stats values(*) as * by joiner
| where s1_field2=s2_field1

Season the above query to taste by only putting the fields you want in the third line. What this is doing is pulling in both data sets and joining them together with a stats command. This is won't run into those timing limitations that a join would and be more performant.

I edited the answer so it only returns results where those two fields match.

If this comment/answer was helpful, please up vote it. Thank you.

View solution in original post

gballanti
Explorer

thanks to everyone , the solution proposed works as expected.

0 Karma

to4kawa
Ultra Champion
(sourcetype="snow:task_sla" dv_has_breached=false dv_stage=Completed dv_sla="FW-Guasto non bloccante fuori sede no A-SLA21") OR sourcetype="snow:incident" 
| eval joiner=if(sourcetype="snow:task_sla", dv_task, dv_number) 
| stats values(dv_task) as dv_task values(dv_number) as dv_number values(dv_sla) as dv_sla by joiner
| where dv_task==dv_number

Ask, and it shall be given you.

0 Karma

gballanti
Explorer

probably my previous comment is gone ... I wanted to thank you for the answers

0 Karma

gballanti
Explorer

just an update ...
the following query

(sourcetype="snow:task_sla" dv_has_breached=false dv_stage=Completed dv_sla="FW-Guasto non bloccante fuori sede no A-SLA21") OR sourcetype="snow:incident" | eval joiner=if(sourcetype="snow:task_sla", dv_task, dv_number) | stats values(dv_task) as dv_task values(dv_number) as dv_number values(dv_sla) as dv_sla by joiner

return an unexpected behavior

alt text

I have to get only the highlighted rows, where dv_task and dv_number matches.

0 Karma

dmarling
Builder

Yep, this is an easy fix. Modify your query by adding a filter at the end to only include when they match:

(sourcetype="snow:task_sla" dv_has_breached=false dv_stage=Completed dv_sla="FW-Guasto non bloccante fuori sede no A-SLA21") OR sourcetype="snow:incident" 
| eval joiner=if(sourcetype="snow:task_sla", dv_task, dv_number) 
| stats values(dv_task) as dv_task values(dv_number) as dv_number values(dv_sla) as dv_sla by joiner
| where dv_task=dv_number
If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

gballanti
Explorer

Thanks to both of you for the prompt reply.

I used the Dustin's suggestion because is more clear for my little knowledge, later will try the "selfjoin" also.
To filter the columns I used table command as last row

(sourcetype="Sourcetype_A" s1_field4="Completed") OR sourcetype="Sourcetype_B"
| eval joiner=if(sourcetype="Sourcetype_A", s1_field2, s2_field1)
| stats values(*) as * by joiner
| table s1_field2, s1_field1,s2_field2,s2_field3

is it a good way to shows the fields ?

0 Karma

dmarling
Builder

You'll see this answer a lot on here. You should avoid joins except for some very specific use cases. Joins are inefficient and can cause truncation of your data results, since it has a default limitation of only running for 30 seconds and then auto finalizing. I suggest doing this instead:

(sourcetype="Sourcetype_A" s1_field4="Completed") OR sourcetype="Sourcetype_B" 
| eval joiner=if(sourcetype="Sourcetype_A", s1_field2, s2_field1)
| stats values(*) as * by joiner
| where s1_field2=s2_field1

Season the above query to taste by only putting the fields you want in the third line. What this is doing is pulling in both data sets and joining them together with a stats command. This is won't run into those timing limitations that a join would and be more performant.

I edited the answer so it only returns results where those two fields match.

If this comment/answer was helpful, please up vote it. Thank you.

kp_pl
Path Finder

@dmarling   pretty good explanation ! . 
But now I need to go one step further . Based on result of stats commands I want to create timechart.   First I tried to replace stats with timechart but it simply does not work.  Then I created table from stats result and base on this table wanted to create a time chart. But I rather feel than know  it is not a good way.  Would you give any example how to do it please. Even based on this simple from source post

0 Karma

dmarling
Builder

You just need the _time field that isn't multivalued to be present for timechart to work.  Assuming you are working with the example you replied to you would simply add "| eval _time=min(_time) | timechart count" after the where line.  The eval is ensuring you take the earliest time if it is multi-valued.

If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

kp_pl
Path Finder

something is not clear because it does not return any rows now:

(...)

| stats values(*) as * by joiner | where ctx_ecid=ecid_d
| eval _time=min(_time) | timechart span=5min avg(time_taken)



without the last line with timechart it returns all expected rows, with it - none 😞

 

 

and one more question . Right now I do stats(*) as * by joiner  but this returns all fields from both indexes. I suppose it cost a lot to transfer such big amount of data especially each my index has 10-20 fields and thousands records. In fact I use only 2-3 fields from each index so I tried to something like :

| stats values(field1) as field1, values(field2) as field2, values(field3) as field3 by joiner (...)



but it does not return rows 😞   Why ?   how  to modify it to returns only fields I need?

 

0 Karma

Anji02
Observer

Hi,
I tried in same way, but not working

index=my_index type=outer OR type=inner
|rename msg.* AS *
| eval joiner=if(type=outer, _id, source_id)
| stats values(*) as * by joiner
| where ticket_num=TicketNum

|table event_id, serial_id

Can you please help me here @dmarling 

0 Karma

dmarling
Builder

The joiner eval doesn't have double quotes around "outer" so it's trying to say when the value in field type equals the value in field outer, then use the value in field _id.  What you want to do is wrap outer with double quotes so the eval command knows you are looking for the field type to have a value of "outer."

index=my_index (type=outer OR type=inner)
| rename msg.* AS *
| eval joiner=if(type="outer", _id, source_id)
| stats values(*) as * by joiner
| where ticket_num=TicketNum
If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

woodcock
Esteemed Legend

You need selfjoin, like this:

|makeresults
| eval raw="sourcetype=sourcetype_A,s1_field1=PurchaseOK,s1_field2=9,s1_field3=taxvalue,s1_field4=Completed sourcetype=sourcetype_B,s2_field1=9,s2_field2=Rome,s2_field3=FontanadiTrevi"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| kv

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| eval joiner=coalesce(s1_field1, s2_field1)
| selfjoin joiner
0 Karma
Get Updates on the Splunk Community!

Monitoring MariaDB and MySQL

In a previous post, we explored monitoring PostgreSQL and general best practices around which metrics to ...

Financial Services Industry Use Cases, ITSI Best Practices, and More New Articles ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Splunk Federated Analytics for Amazon Security Lake

Thursday, November 21, 2024  |  11AM PT / 2PM ET Register Now Join our session to see the technical ...