Splunk Search

Left join - find missing data from second index

mattiasrs
Explorer

Hello, I am quite new to Splunk and this is my first post. Hoping that I can get some help from this awesome community.

I have two systems, System A and System B. System A receives customer information which is then sent to System B . The data in both systems have the exact same fields and a unique Customer ID with the same name in both systems.

I want to create a dashboard where I can select a time period and see only problematic customers that only exist in System A, meaning they haven't been sent to System B for some reason.

This is my search to see all the data:

index=systemA OR index=systemB

| fields customer_ID, systemA_Timestamp, systemB_Timestamp

| stats values(*) as * by customer_ID

| table customer_ID, systemA_Timestamp, systemB_Timestamp

 

So to summarize, I want to see customer_IDs that only exist in System A. I am not sure which function to use here. I have been experimenting with isnull(systemB_Timestamp) with no success. Join is not an option as the limit of 50 000 might be a problem.

 

Would be very grateful for any help!

Labels (1)
Tags (3)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

index=systemA OR index=systemB
| fields customer_ID, systemA_Timestamp, systemB_Timestamp
| stats values(*) as * by customer_ID
| fillnull value="NotPresent" systemB_Timestamp
| where systemB_Timestamp="NotPresent"
| table customer_ID, systemA_Timestamp, systemB_Timestamp

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

index=systemA OR index=systemB
| fields customer_ID, systemA_Timestamp, systemB_Timestamp
| stats values(*) as * by customer_ID
| fillnull value="NotPresent" systemB_Timestamp
| where systemB_Timestamp="NotPresent"
| table customer_ID, systemA_Timestamp, systemB_Timestamp

View solution in original post

mattiasrs
Explorer

Thank you!

With this I am able to achieve what I want.

Do you know if this is best practice?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

I guess that might depend on what your actual data looks like and what your criteria for determining "best" is. Using stats is usually better than join, fillnull should be reasonably inexpensive as it can be done in-stream / in parallel, same goes for the where clause.

mattiasrs
Explorer

Many thanks!

0 Karma

saravanan90
Contributor

Please check if the below query works.

index=systemA  NOT [search index=systemB | stats count by customer_ID | table customer_ID

0 Karma

mattiasrs
Explorer

Hello,

No this returns only the raw events, not in a table format.

I tried moving |table outside the [ ] but this returns data that exist in both indexes, just not showing the data from index B.

0 Karma

saravanan90
Contributor

Hello,

Is the table command outside gives still raw events of both the index.

index=systemA NOT [search index=systemB | stats count by customer_ID | table customer_ID] | table customer_ID, systemA_Timestamp

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!