Solved: Re: Left join - find missing data from second inde...

mattiasrs · ‎02-05-2021

Hello, I am quite new to Splunk and this is my first post. Hoping that I can get some help from this awesome community.

I have two systems, System A and System B. System A receives customer information which is then sent to System B . The data in both systems have the exact same fields and a unique Customer ID with the same name in both systems.

I want to create a dashboard where I can select a time period and see only problematic customers that only exist in System A, meaning they haven't been sent to System B for some reason.

This is my search to see all the data:

index=systemA OR index=systemB

| fields customer_ID, systemA_Timestamp, systemB_Timestamp

| stats values(*) as * by customer_ID

| table customer_ID, systemA_Timestamp, systemB_Timestamp

So to summarize, I want to see customer_IDs that only exist in System A. I am not sure which function to use here. I have been experimenting with isnull(systemB_Timestamp) with no success. Join is not an option as the limit of 50 000 might be a problem.

Would be very grateful for any help!

ITWhisperer · ‎02-08-2021

Try something like this

index=systemA OR index=systemB
| fields customer_ID, systemA_Timestamp, systemB_Timestamp
| stats values(*) as * by customer_ID
| fillnull value="NotPresent" systemB_Timestamp
| where systemB_Timestamp="NotPresent"
| table customer_ID, systemA_Timestamp, systemB_Timestamp

View solution in original post

ITWhisperer · ‎02-08-2021

Try something like this

index=systemA OR index=systemB
| fields customer_ID, systemA_Timestamp, systemB_Timestamp
| stats values(*) as * by customer_ID
| fillnull value="NotPresent" systemB_Timestamp
| where systemB_Timestamp="NotPresent"
| table customer_ID, systemA_Timestamp, systemB_Timestamp

mattiasrs · ‎02-08-2021

Thank you!

With this I am able to achieve what I want.

Do you know if this is best practice?

ITWhisperer · ‎02-08-2021

I guess that might depend on what your actual data looks like and what your criteria for determining "best" is. Using stats is usually better than join, fillnull should be reasonably inexpensive as it can be done in-stream / in parallel, same goes for the where clause.

mattiasrs · ‎02-08-2021

Many thanks!

saravanan90 · ‎02-05-2021

Please check if the below query works.

index=systemA NOT [search index=systemB | stats count by customer_ID | table customer_ID]

mattiasrs · ‎02-08-2021

Hello,

No this returns only the raw events, not in a table format.

I tried moving |table outside the [ ] but this returns data that exist in both indexes, just not showing the data from index B.

saravanan90 · ‎02-08-2021

Hello,

Is the table command outside gives still raw events of both the index.

index=systemA NOT [search index=systemB | stats count by customer_ID | table customer_ID] | table customer_ID, systemA_Timestamp

Left join - find missing data from second index

fields

Index This | When is October more than just the tenth month?

Observe and Secure All Apps with Splunk

What’s New & Next in Splunk SOAR

Are you a member of the Splunk Community?

Left join - find missing data from second index

fields

Index This | When is October more than just the tenth month?

Observe and Secure All Apps with Splunk

What’s New & Next in Splunk SOAR