Hi,
I was wondering how to correlate data using different sources.
For example:
Source A contains:
User ID = 123
Source B contains
User ID =123
User email = user@user
I want to find the user related to the UserID 123 (which comes up after my search). I want to do this by getting the User emal from Source B. My search runs in Source A since there are some fields I need from there.
Hi @vstan,
ok, please try this:
index="ABC"
(sourcetype="SourceA" OR sourcetype="SourceB")
| eval User=coalesce(user,User)
| stats
values(TOTAL_ATTACHMENT_SIZE_SEGMENT) AS Total_Bytes_Size
values(EMAIL_ADDRESS) AS EMAIL_ADDRESS
BY User
| sort - Total_Bytes_Size
the error was for the space after sum.
Ciao.
Giuseppe
Hi @vstan ,
you have to use stats command BY the correlation key, something like this:
index=indexA OR index=indexB
| stats
values(field1) AS field1
values(field2) AS field2
values(field3) AS field3
values(User_email) As User_email
BY User_ID
you can add all the fields you need in the stats command.
Anyway, don't use join command!
Ciao.
Giuseppe
Hi @gcusello
I can't seem to make it work. I'll try to explain more detailed.
I have two sources (sourcetypes basically).
Source A
User ID
Total bytes send
Source B
User ID
User Email address
I want to calculate the total bytes send per User ID (For this I need Source A). Now Have I done this and the results generate an ID with the amount of Bytes send for each user.
The next step for me is to put the User Email Address next to the User ID. This value is only known in Source B. So I need to correlate them together so that I can tell Splunk hey look up this UserID and give me User Email Address back in a table next to it.
I currently have the following:
index="ABC"
source="SourceA" OR source="SourceB"
| eval Total_Bytes_Size=TOTAL_ATTACHMENT_SIZE_SEGMENT
| eval user=User
| dedup User
| where NOT Total_Bytes_Size="TOTAL_ATTACHMENT_SIZE_SEGMENT"
| where NOT Total_Bytes_Size="0MB"
| table User, Total_Bytes_Size
| sort - Total_Bytes_Size
This gives me a table with The UserID and the total Bytes size sent. Now I only need to add a row with the corresponding EMAIL_ADDRESS from source B.
Hope this clears it up a bit.
Hi @vstan,
are you sure that you don't need to sum the bytes?
please try this:
index="ABC"
(sourcetype="SourceA" OR sourcetype="SourceB")
| eval User=coalesce(user,User)
| stats
sum (TOTAL_ATTACHMENT_SIZE_SEGMENT) AS Total_Bytes_Size
values(EMAIL_ADDRESS) AS EMAIL_ADDRESS
BY User
| sort - Total_Bytes_Size
Ciao.
Giuseppe
Hi @gcusello
My data is already 'summed' ->
This is how it is stored.
TOTAL_ATTACHMENT_SIZE_SEGMENT 5-25MB
When I try to run the query you provided it gives me an error: Error in 'stats' command: The argument '(TOTAL_ATTACHMENT_SIZE_SEGMENT)' is invalid.
Hi @vstan,
ok, please try this:
index="ABC"
(sourcetype="SourceA" OR sourcetype="SourceB")
| eval User=coalesce(user,User)
| stats
values(TOTAL_ATTACHMENT_SIZE_SEGMENT) AS Total_Bytes_Size
values(EMAIL_ADDRESS) AS EMAIL_ADDRESS
BY User
| sort - Total_Bytes_Size
the error was for the space after sum.
Ciao.
Giuseppe
Unfortunately I get 0 results...
Hi @vstan ,
check if in all events you have the User field (fields are case sensitive!), if not add in the coalesce command all the fields containing the User values to use as correlation key.
Then check the exact field name of TOTAL_ATTACHMENT_SIZE_SEGMENT and EMAIL_ADDRESS.
Ciao.
Giuseppe
P.S.: Karma Points are appreciated 😉
Hi @gcusello
I used your solution and it worked. I now only have to fix the bytes as they don't show up, but I will try to solve it myself :D. Thanks!