Hello,
I am looking for the equivalent of performing SQL like such:
SELECT transaction_id, vendor
FROM orders
WHERE transaction_id NOT IN (SELECT transaction_id FROM events).
As of right now I can construct a list of transaction_ids for orders in one search query and a list of transaction_ids for events in another search query, but my ultimate goal is to return order logs that do not share transaction_ids with the transaction_ids of the events log. Any help is greatly appreciated, thanks!
There are many ways to do this, but the most appropriate method depends a lot on the actual data. You really want to illustrate the dataset and explain the outcome, maybe with mockups. It is fine to illustrate some ideas using a different language. But how would orders and events be represented in Splunk, for example? Do transaction_id and vendor already exist in data sources or is there some maneuver to extract them? Such details can make a difference because SPL is fundamentally different from SQL.
Assuming orders and events are from two sourcetypes and fields are already extracted, you can do
sourcetype IN (orders, events)
| stats values(sourcetype) as sourcetype values(vendor) as vendor by transaction_id
| where sourcetype == events and mvcount(sourcetype) > 1
Splunk has written a manual to help SQL users transition to SPL. See https://docs.splunk.com/Documentation/Splunk/9.0.2/SearchReference/SQLtoSplunk
The NOT IN construct in SQL can be expressed as NOT and a subsearch.
index=orders vendor=* NOT [ index=events transaction_id=* | fields transaction_id ]