Splunk Search

Merge fields from distinct sources on event

Path Finder

Hello,

I'm trying to find the best way to do the following:
Source A: id_field1 id_field2 fielda1 fielda2
Source B: id_field1 id_field2 fieldb1
Source C: id_field1 fieldc1

I want to retrieve fieldb1 and fieldc1 from source B and C respectively and still be able to search fields from source A.
Fieldb1 and fieldc1 can store distinct values for the same id_field.

Example:
time idfield1 id_field2 fielda1 fielda2 fieldb1 fieldc1
xx-yy-zzzz 111 444 FA1 FA2 AAA CCC
xx-yy-zzzz 111 444 FA1 FA2 BBB CCC
xx-yy-zzzz 111 444 FA1 FA2 AAA ZZZ

I tried several approaches:
- Used eventstats to add both fieldb1 and fieldc1 to events, which is a slow method.
- Used the trick eval id__{source}=id_field1 | stats values(id__*) as id__* (...) by id_field1 | where id__sourcea=id__sourceb (...) so far this seems to be the best approach but by applying the values to fieldb1 and fieldc1 I'll sometimes have a multivalue field which will lead to use the mvexpand command and make the query even slower. Another issue is that applying the values() will sort the multivalue field alphabetically and I'll loose track of the connection between id_field2 and fieldb1.

Using an automatic lookup won't fit due to the huge number of distinct id's.
I'm also looking at the append function but due to its 50k events limitation I think it won't work.

Could you provide some guidance on how to do this?

Thanks in advance.

0 Karma

Motivator

Hey LordLeet,

Are you trying something like this:

source=sourceA OR source=sourceB OR source=sourceC | stats values(fielda1) AS fielda1 values(fielda2) AS fielda2 values(fieldb1) AS fieldb1 values(fieldc1) AS fieldc1 values(idfield1) AS idfield1 values(idfield2) AS idfield1 by _time

Let me know if this helps!!

0 Karma

Motivator

Could you please try this query and let us know

source="Source A" | table idfield1 idfield2 fielda1 fielda2 | append [ search source="Source B" | table fieldb1 ] | append [ search source="Source C" | table fieldc1 ] | table idfield1 idfield2 fielda1 fielda2 fieldb1 fieldb1

0 Karma

Path Finder

Hello logloganathan,

Unfortunately append won't solve it since it truncates the results to 50k events and I have much more than that. Due to performance issues I think I shouldn't mess with this value on the configuration files.

0 Karma

Motivator

Could you please provide your full query so that i can modify and provide to you

0 Karma

Path Finder

Lets say source A is the main data source and source B and C only store some metadata.
Source A stores the user and the itemid, Source B stores the itemid and itemshape and Source C stores the itemid and the item_color.

I only need to access B and C to retrieve those fields and then do other operations or aggregations on the main source based on the item color and shape.

This is an example

index=idx (source=sourceB OR source=sourceC)
| chart dc(itemid) as items by user color
| rename 1 as Red 2 as Green 4 as Blue
| eval Purple=Red+Blue
| eval item
specs=case(Green!=0 AND Purple!=0,"Green & Purple",Green!=0 AND Purple=0,"Green Only",Purple>1 AND Green=0,"Purple Only",NULL=0 AND Purple=0 AND Green=0, "Uncolored")
| stats count by hh_type

Green & Purple 6537
Green Only 1755
Uncolored 419669
Purple Only 36577

This works as intended if I just wanted to do the aggregation.
I'm having trouble on how to proceed if I wanted to use more fields than the ones mentioned above since the chart wouldn't work.

0 Karma