Splunk Search

How to correlate data from different sources

vstan
Explorer

Hi, 

 

I was wondering how to correlate data using different sources. 

 

For example: 

 

Source A contains: 

User ID = 123

 

Source B contains

User ID =123 

User email = user@user

 

I want to find the user related to the UserID 123 (which comes up after my search). I want to do this by getting the User emal from Source B.  My search runs in Source A since there are some fields I need from there. 

Labels (1)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @vstan,

ok, please try this:

index="ABC"
(sourcetype="SourceA" OR sourcetype="SourceB")
| eval User=coalesce(user,User)
| stats 
     values(TOTAL_ATTACHMENT_SIZE_SEGMENT) AS Total_Bytes_Size 
     values(EMAIL_ADDRESS) AS EMAIL_ADDRESS 
     BY User
| sort - Total_Bytes_Size

the error was for the space after sum.

Ciao.

Giuseppe

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @vstan ,

you have to use stats command BY the correlation key, something like this:

index=indexA OR index=indexB
| stats 
     values(field1) AS field1
     values(field2) AS field2
     values(field3) AS field3
     values(User_email) As User_email
     BY User_ID

you can add all the fields you need in the stats command.

Anyway, don't use join command!

Ciao.

Giuseppe

0 Karma

vstan
Explorer

Hi @gcusello 

I can't seem to make it work. I'll try to explain more detailed. 

 

I have two sources (sourcetypes basically). 

Source A 

User ID 

Total bytes send

 

Source B 

User ID

User Email address 

 

I want to calculate the total bytes send per User ID (For this I need Source A). Now Have I done this and the results generate an ID with the amount of Bytes send for each user. 

 

The next step for me is to put the User Email Address next to the User ID.  This value is only known in Source B. So I need to correlate them together so that I can tell Splunk hey look up this UserID and give me User Email Address back in a table next to it. 

I currently have the following: 

index="ABC"
source="SourceA" OR source="SourceB"
| eval Total_Bytes_Size=TOTAL_ATTACHMENT_SIZE_SEGMENT
| eval user=User
| dedup User
| where NOT Total_Bytes_Size="TOTAL_ATTACHMENT_SIZE_SEGMENT"
| where NOT Total_Bytes_Size="0MB"
| table User,  Total_Bytes_Size
| sort - Total_Bytes_Size



This gives me a table with The UserID and the total Bytes size sent. Now I only need to add a row with the corresponding EMAIL_ADDRESS from source B.

 

Hope this clears it up a bit. 

 

 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @vstan,

are you sure that you don't need to sum the bytes?

please try this:

index="ABC"
(sourcetype="SourceA" OR sourcetype="SourceB")
| eval User=coalesce(user,User)
| stats 
     sum (TOTAL_ATTACHMENT_SIZE_SEGMENT) AS Total_Bytes_Size 
     values(EMAIL_ADDRESS) AS EMAIL_ADDRESS 
     BY User
| sort - Total_Bytes_Size

Ciao.

Giuseppe

 

0 Karma

vstan
Explorer

Hi @gcusello 

 

My data is already 'summed' -> 

This is how it is stored. 

TOTAL_ATTACHMENT_SIZE_SEGMENT 5-25MB

 

When I try to run the query you provided it gives me an error: Error in 'stats' command: The argument '(TOTAL_ATTACHMENT_SIZE_SEGMENT)' is invalid.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @vstan,

ok, please try this:

index="ABC"
(sourcetype="SourceA" OR sourcetype="SourceB")
| eval User=coalesce(user,User)
| stats 
     values(TOTAL_ATTACHMENT_SIZE_SEGMENT) AS Total_Bytes_Size 
     values(EMAIL_ADDRESS) AS EMAIL_ADDRESS 
     BY User
| sort - Total_Bytes_Size

the error was for the space after sum.

Ciao.

Giuseppe

vstan
Explorer

Unfortunately I get 0 results...

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @vstan ,

check if in all events you have the User field (fields are case sensitive!), if not add in the coalesce command all the fields containing the User values to use as correlation key.

Then check the exact field name of TOTAL_ATTACHMENT_SIZE_SEGMENT and EMAIL_ADDRESS.

Ciao.

Giuseppe

P.S.: Karma Points are appreciated 😉

0 Karma

vstan
Explorer

Hi @gcusello 

 

I used your solution and it worked. I now only have to fix the bytes as they don't show up, but I will try to solve  it myself :D. Thanks! 

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...