Splunk Search

How to Merge Rows in Table With Similar Data, While Maintaining Records That Do Not Match

Razziq
Explorer

Hello,

We are currently working with two sets of data that have similar fields. We would like to align matching events in one row (payment amount, category/source and account number) while also maintaining the values that do not match for failed processing.  Below are some screenshots of what the data looks like now in four rows, as well as what we're hoping to visualize in 3 rows. Any assistance would be greatly appreciated!

Razziq_0-1647363812406.png

 

Razziq_1-1647363829846.png

Below is our current search:

index="index1" Tag="Tag1"
| stats values(PaymentAmount) as PaymentAmount by PaymentChannel,AccountId,PaymentCategory,ResponseStatus,StartDT
| rename AccountId as AccountNumber
| rename PaymentChannel as A_PaymentChannel
| rename PaymentCategory as A_PaymentCategory
| rename ResponseStatus as A_ResponseStatus
| rename StartDT as A_Time
| append
[search index="index2" sourcetype="source2"
| rename PaymentAmount as M_PayAmt
| eval PayAmt = tonumber(round(M_PayAmt,2))
| rex field=source "M_(?<M_Source>\w+)_data.csv"
| rename "TERMINAL ID" as M_KioskID
| rename "ResponseStatus" as "M_ResponseStatus"
| rename "KIOSK REPORT TIME" as M_Time
| eval _time =strptime(M_Time,"%Y-%m-%d %H:%M:%S.%3Q")
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| stats values(PayAmt) as M_PayAmt latest(M_Time) by AccountNumber, M_Source, M_ResponseStatus,M_KioskID
| rename latest(M_Time) as M_Time
| table M_PayAmt,AccountNumber, M_Source, M_KioskID,M_ResponseStatus,M_Time
| mvexpand M_PayAmt]
| eval A_PaymentTotal = "$" + PaymentAmount
| eval M_PayAmt = "$" + M_PayAmt
| eval joiner = AccountNumber
| table AccountNumber,A_PaymentChannel,M_KioskID,A_PaymentCategory,M_Source,A_PaymentTotal,M_PayAmt,A_ResponseStatus,M_ResponseStatus,A_Time,_Time
| eval M_PayAmt=if(isnull(M_PayAmt),"Unknown",M_PayAmt)
| eval A_PaymentTotal=if(isnull(A_PaymentTotal),"Unknown",A_PaymentTotal)
| eval A_Time=if(isnull(A_Time), M_Time, A_Time)
| eval M_Time=if(isnull(M_Time), A_Time, M_Time)
| sort by M_Time desc

Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Let's call the one with index=index1 Tag=Tag1 the "A" search; call the other "M" search.  Drawing from your mockups AND sample code, the requirement can be summarized as

  1. Join the "A" search and "M" search by AccountNumber
  2. Show amounts as "UNKNOWN" when value is missing.
  3. If "Time" from one search is missing, fill with that from the other.

This can be done with stats.  See if the following works:

index="index1" Tag="Tag1"
| stats values(PaymentAmount) as A_PaymentTotal by PaymentChannel,AccountId,PaymentCategory,ResponseStatus,StartDT ``` subsequent code and mockups use "A_PaymentTotal" instead of "PaymentAmount" ```
| rename AccountId as AccountNumber
| rename PaymentChannel as A_PaymentChannel
| rename PaymentCategory as A_PaymentCategory
| rename ResponseStatus as A_ResponseStatus
| rename StartDT as A_Time
| append
[search index="index2" sourcetype="source2"
| rename PaymentAmount as M_PayAmt
| eval PayAmt = tonumber(round(M_PayAmt,2))
| rex field=source "M_(?<M_Source>\w+)_data.csv"
| rename "TERMINAL ID" as M_KioskID
| rename "ResponseStatus" as "M_ResponseStatus"
| rename "KIOSK REPORT TIME" as M_Time
| eval _time =strptime(M_Time,"%Y-%m-%d %H:%M:%S.%3Q")
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| stats values(PayAmt) as M_PayAmt latest(M_Time) as M_time by AccountNumber, M_Source, M_ResponseStatus,M_KioskID ``` rename in stats is more economic ```
| table M_PayAmt,AccountNumber, M_Source, M_KioskID,M_ResponseStatus,M_Time
| mvexpand M_PayAmt]
| stats values(*) as * by AccountNumber ``` join by AccountNumber ```
| eval M_PayAmt=if(isnull(M_PayAmt),"Unknown", "$" + M_PayAmt) ``` small code economy to combine string concatenation inside conditional ```
| eval A_PaymentTotal=if(isnull(A_PaymentTotal),"Unknown", "$" + A_PaymentTotal) ``` illustrated code uses PaymentAmount but mockups do not use this name ```
| eval A_Time=if(isnull(A_Time), M_Time, A_Time)
| eval M_Time=if(isnull(M_Time), A_Time, M_Time)
| table AccountNumber,A_PaymentChannel,M_KioskID,A_PaymentCategory,M_Source,A_PaymentTotal,M_PayAmt,A_ResponseStatus,M_ResponseStatus,A_Time,M_Time ``` illustrated code used _Time instead of M_time ```
| sort by M_Time desc

The above code includes some minor adjustments based on mockups, and a little bit of readability. (See comments.)  Other than that, it is really just one added stats by AccountNumber, the same technique already employed in both "A" search and "M" search.

0 Karma

Razziq
Explorer

Sorry I should've said that our current search looks more like this:

index="index1" Tag="Tag1"
| stats values(PaymentAmount) as A_PaymentAmount by PaymentChannel,AccountId,PaymentCategory,ResponseStatus,StartDT
| rename AccountId as AccountNumber
| rename PaymentChannel as A_PaymentChannel
| rename PaymentCategory as A_PaymentCategory
| rename ResponseStatus as A_ResponseStatus
| rename StartDT as A_Time
| append
[search index="index2" sourcetype="source2"
| rename PaymentAmount as M_PayAmt
| eval PayAmt = tonumber(round(M_PayAmt,2))
| rex field=source "M_(?<M_Source>\w+)_data.csv"
| rename "TERMINAL ID" as M_KioskID
| rename "ResponseStatus" as "M_ResponseStatus"
| rename "KIOSK REPORT TIME" as M_Time
| eval _time =strptime(M_Time,"%Y-%m-%d %H:%M:%S.%3Q")
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| stats values(PayAmt) as M_PayAmt by AccountNumber, M_Source, M_KioskID,M_ResponseStatus,M_Time]
| eval A_PaymentTotal = "$" + A_PaymentAmount
| eval M_PayAmt = "$" + M_PayAmt
| eval joiner = AccountNumber
| stats values(*) as * by joiner
| table AccountNumber,A_PaymentChannel,Kiosk,A_PaymentCategory,M_Source,A_PaymentTotal,M_PayAmt,A_ResponseStatus,M_ResponseStatus,A_Time,M_Time
| eval M_PayAmt=if(isnull(M_PayAmt),"Unknown",M_PayAmt)
| eval A_PaymentTotal=if(isnull(A_PaymentTotal),"Unknown",A_PaymentTotal)
| eval A_Time=if(isnull(A_Time), M_Time, A_Time)
| eval M_Time=if(isnull(M_Time), A_Time, M_Time)
| sort by M_Time desc
| fields - joiner

However when there are multiple entries for the same account, it lumps them together like so:

Razziq_0-1647633767732.png


where we are trying to get each to display in their own rows. I was playing with streamstats and pulling in for more info, but there are a lot of variables for differing or missing values between both systems. Once is coming via DBX and the other is coming in from a custom Python data scrape, so _time is not possible to use. It might just be that I'll have to do a lot of streamstats and evals for all situations

0 Karma

yuanliu
SplunkTrust
SplunkTrust

I was wondering about the cleanness of the mockup and the use of values(PayAmt), etc.  If there are multiple transaction from the same account but no transaction ID AND _time is not a reliable determinant, life can be really tough. (My explanation of the cleanness to myself was that maybe you were working with one of those medical labs that used a unique account for every transaction.)  In essence, the challenge is semantic because there is no clear definition of a "match".  I am not sure how much streamstats can help until defined "match" criteria emerge.

Anyway, if there is anyway to define a unique "primary key" with a group of variables, you can use them in groupby.  For example, suppose status "Success" in "A" search matches that of "True" in "M" search, "A" Category matches "M" Source, and the payment amount can be used for matching.  Instead of naming these matching fields differently, they should have identical names and matching values should be equal so groupby can do its magic, like

index="index1" Tag="Tag1"
| rename AccountId as AccountNumber
| rename PaymentChannel as A_PaymentChannel
| eval PaymentAmount = round(PaymentAmount, 2)
| rename StartDT as Time
| table PaymentAmount A_PaymentChannel,AccountNumber,PaymentCategory,ResponseStatus, Time, index
| append
[search index="index2" sourcetype="source2"
| eval PaymentAmount = round(PaymentAmount,2)
| rex field=source "M_(?<M_Source>\w+)_data.csv"
    | eval PaymentCategory = if(M_Source == "card", "Credit", "Cash")
    | eval ResponseStatus = if(ResponseStatus == "True", "Success", "Failure")
| rename "TERMINAL ID" as M_KioskID
| rename "KIOSK REPORT TIME" as Time
| eval _time =strptime(Time,"%Y-%m-%d %H:%M:%S.%3Q")
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| table PaymentAmount AccountNumber, PaymentCategory, M_KioskID,ResponseStatus, Time, index]

| stats values(*) as * by AccountNumber PaymentCategory ResponseStatus
| table AccountNumber,A_PaymentChannel,M_KioskID,PaymentCategory,PaymentAmount,ResponseStatus,Time
| eval PaymentAmount = case(mvcount(index)==2, "$", index=="index1", "A$", index=="index2", "M$") + PaymentAmount
| sort - Time
0 Karma
Get Updates on the Splunk Community!

Uncovering Multi-Account Fraud with Splunk Banking Analytics

Last month, I met with a Senior Fraud Analyst at a nationally recognized bank to discuss their recent success ...

Secure Your Future: A Deep Dive into the Compliance and Security Enhancements for the ...

What has been announced?  In the blog, “Preparing your Splunk Environment for OpensSSL3,”we announced the ...

New This Month in Splunk Observability Cloud - Synthetic Monitoring updates, UI ...

This month, we’re delivering several platform, infrastructure, application and digital experience monitoring ...