Merge fields from distinct sources on event

LordLeet · ‎04-12-2018

Hello,

I'm trying to find the best way to do the following:
Source A: id_field1 id_field2 fielda1 fielda2
Source B: id_field1 id_field2 fieldb1
Source C: id_field1 fieldc1

I want to retrieve fieldb1 and fieldc1 from source B and C respectively and still be able to search fields from source A.
Fieldb1 and fieldc1 can store distinct values for the same id_field.

Example:
_time id_field1 id_field2 fielda1 fielda2 fieldb1 fieldc1
xx-yy-zzzz 111 444 FA1 FA2 AAA CCC
xx-yy-zzzz 111 444 FA1 FA2 BBB CCC
xx-yy-zzzz 111 444 FA1 FA2 AAA ZZZ

I tried several approaches:
- Used eventstats to add both fieldb1 and fieldc1 to events, which is a slow method.
- Used the trick eval id__{source}=id_field1 | stats values(id__*) as id__* (...) by id_field1 | where id__sourcea=id__sourceb (...) so far this seems to be the best approach but by applying the values to fieldb1 and fieldc1 I'll sometimes have a multivalue field which will lead to use the mvexpand command and make the query even slower. Another issue is that applying the values() will sort the multivalue field alphabetically and I'll loose track of the connection between id_field2 and fieldb1.

Using an automatic lookup won't fit due to the huge number of distinct id's.
I'm also looking at the append function but due to its 50k events limitation I think it won't work.

Could you provide some guidance on how to do this?

Thanks in advance.

deepashri_123 · ‎04-17-2018

Hey LordLeet,

Are you trying something like this:

source=sourceA OR source=sourceB OR source=sourceC | stats values(fielda1) AS fielda1 values(fielda2) AS fielda2 values(fieldb1) AS fieldb1 values(fieldc1) AS fieldc1 values(id_field1) AS id_field1 values(id_field2) AS id_field1 by _time

Let me know if this helps!!

logloganathan · ‎04-12-2018

Could you please try this query and let us know

LordLeet · ‎04-12-2018

Hello logloganathan,

Unfortunately append won't solve it since it truncates the results to 50k events and I have much more than that. Due to performance issues I think I shouldn't mess with this value on the configuration files.

logloganathan · ‎04-12-2018

Could you please provide your full query so that i can modify and provide to you

LordLeet · ‎04-12-2018

Lets say source A is the main data source and source B and C only store some metadata.
Source A stores the user and the item_id, Source B stores the item_id and item_shape and Source C stores the item_id and the item_color.

I only need to access B and C to retrieve those fields and then do other operations or aggregations on the main source based on the item color and shape.

This is an example

index=idx (source=sourceB OR source=sourceC)
| chart dc(item_id) as items by user color
| rename 1 as Red 2 as Green 4 as Blue
| eval Purple=Red+Blue
| eval item_specs=case(Green!=0 AND Purple!=0,"Green & Purple",Green!=0 AND Purple=0,"Green Only",Purple>1 AND Green=0,"Purple Only",NULL=0 AND Purple=0 AND Green=0, "Uncolored")
| stats count by hh_type

Green & Purple 6537
Green Only 1755
Uncolored 419669
Purple Only 36577

This works as intended if I just wanted to do the aggregation.
I'm having trouble on how to proceed if I wanted to use more fields than the ones mentioned above since the chart wouldn't work.

Merge fields from distinct sources on event

Accelerating Observability as Code with the Splunk AI Assistant

Integrating Splunk Search API and Quarto to Create Reproducible Investigation ...

Congratulations to the 2025-2026 SplunkTrust!

Join the Conversation

Merge fields from distinct sources on event

Accelerating Observability as Code with the Splunk AI Assistant

Integrating Splunk Search API and Quarto to Create Reproducible Investigation ...

Congratulations to the 2025-2026 SplunkTrust!