Splunk Search

Joining searches on common columns

siu
Loves-to-Learn Everything

HI query joining 2 searches on left join.

Its matching some rows and not matching some rows although the column where I join on is clearly seen in both searches.

 

 

 

index=sky sourcetype=sky_trade_murex_timestamp 
| rex field=_raw "trade_id=\"(?<trade_id>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
| rename trade_id as NB
| dedup NB
| eval NB = tostring(trim(NB))
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type
| join type=left NB
[ search index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>\d+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| eval NB = tostring(trim(NB))
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO]
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

 

 

 

 This above is my source code

And the raw data is

 

 

 

Time	Event
27/12/2024
17:05:39.000	
32265376;DEAD;3887.00000000;XAU;CURR;FXD;FXD;CM TR GLD AUS;X_CMTR XAU SWAP
host = APPSG002SIN0117source = D:\SkyNet\data\mx_trade_report\MX2_TRADE_STATUS_20241227_200037.csvsourcetype = mx_to_sky
Time	Event
27/12/2024
18:05:36.651	
2024-12-27 18:05:36.651, system="murex", id="645131777", sky_id="645131777", trade_id="32265483", event_id="100023788", mx_status="DEAD", operation="NETTING", action="insertion", tradebooking_sgp="2024/12/26 01:02:01.0000", eventtime_sgp="2024/12/26 01:01:51.7630", sky_to_mq_latency="-9.-237", portfolio_name="I CREDIT INC", portfolio_entity="ANZSEC INC", trade_type="BondTrade"
host = APPSG002SIN0032source = sky_trade_murex_timestamp sourcetype = sky_trade_murex_timestamp

 

Labels (3)
0 Karma

siu
Loves-to-Learn Everything

Hi all,

this i get about 

Running4,003,400 of 4,003,400 events matched in  aday 

 

| join type=left NB
[ search index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>[^;]+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO]
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

 

 

 

index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>[^;]+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

 

This below i get about  1,065,810 events in a day 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Assuming both your joined searches produce proper results (it's up to you to check it - we don't know), the easiest and most straightforward way to avoid join altogether is to use multisearch to run those in parallel and then stats the results. This way you're not prone to hit join's limits and since your searches are streaming ones you can use multisearch and you're not limited by subsearch limits which you might hit when using append.

 

| multisearch [
search index=sky sourcetype=sky_trade_murex_timestamp 
| rex field=_raw "trade_id=\"(?<trade_id>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
| rename trade_id as NB
| dedup NB
| eval NB = tostring(trim(NB))
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type ]
[ search index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>\d+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| eval NB = tostring(trim(NB))
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO ]
| stats values(*) as * by NB

 

 

0 Karma

siu
Loves-to-Learn Everything

Thanks!! I get this error though 

Error in 'multisearch' command: Multisearch subsearches might only contain purely streaming operations (subsearch 1 contains a non-streaming command).
 
0 Karma

PickleRick
SplunkTrust
SplunkTrust

Ahhhh... The "table" command is a transforming command. So you can't use it in either search. Use "fields" instead.

0 Karma

siu
Loves-to-Learn Everything

ah yes i removed those and the search continued. But it was so laggy as there were many events, i did not get a proper search result without it hanging

0 Karma

siu
Loves-to-Learn Everything

And the problem is these columns are empty for some and populated for some. For those empty, I clearly checked the NB is matching in both searches

TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Your regex assumes (insists!) that the event has 9 fields separated by (8) semi-colons - your sample data has only 8 fields separated by 7 semi-colons.

0 Karma

siu
Loves-to-Learn Everything

Sorry how do i rectify it?

32265376;DEAD;3887.00000000;XAU;CURR;FXD;FXD;CM TR GLD AUS;X_CMTR XAU SWAP

Did u mean this 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Sorry I miscounted, it does look right - the issue here is that the trade_id does not match the first field in the mx_to_sky event

0 Karma

siu
Loves-to-Learn Everything

No problem. But why is it that some rows are populated and some are not then? i.e some match and some don't match. I renamed trade_id as NB then left=join NB so it should join 2 of them together but why does it not work for some rows or columns although clearly it matches in both searches?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If there are results for the trade_id/NB in both searches, then it is possible that the rex has not extracted the fields as you expect. Please share the two events which don't match

0 Karma

siu
Loves-to-Learn Everything

Its quite random but here is one

2024-12-27 23:05:09.917, system="murex", id="645437844", sky_id="645437844", trade_id="31791027", event_id="100038914", mx_status="live", operation="nooperation", action="fixing", tradebooking_sgp="2024/12/27 08:42:39.0000", eventtime_sgp="2024/12/27 08:42:33.6400", sky_to_mq_latency="-5.-360", portfolio_name="A ZZ AUD LTR", portfolio_entity="ANZBG MELB", trade_type="BasisSwap"

31791027;LIVE;17500000.00000000;AUD;IRD;IRS;;A ZZ AUD LTR;X_GCM_IRD_AUCNZ

 

 

 

index=sky sourcetype=sky_trade_murex_timestamp 
| rex field=_raw "trade_id=\"(?<trade_id>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
| eval trade_id = replace(trade_id, "\"", "")
| rename trade_id as NB
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type
| join type=left NB
[ search index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>[^;]+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO]
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

 

For that NB i get all columns and empty for TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Subsearches are limited to 50,000 events which could account for the missing matches - try reducing the timeframes - do the matches appear then?

0 Karma

siu
Loves-to-Learn Everything

Thank you both, I need for >50k/>10k events
I am thinking of using appendcols but they are not able to join like this.

Any other work around?

both searches amass quite a huge number of events >50k >10k and I need to search for today which would be alot.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

As has been said here many times, it is best to avoid using join - this is a classic case of why join should be avoided.

Try something like this

index=sky sourcetype=sky_trade_murex_timestamp OR sourcetype=mx_to_sky
``` Parse sky_trade_murex_timestamp events (note that trade_id is put directly into the NB field) ```
| rex field=_raw "trade_id=\"(?<NB>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
``` Parse mx_to_sky events ```
| rex field=_raw "(?<NB>[^;]+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
``` Reduce to just the fields of interest ```
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO
``` "Join" events by NB using stats ```
| stats values(*) as * by NB
0 Karma

siu
Loves-to-Learn Everything

im gonna try this thanks!! 

I think i got something like all the results into one row and performance is very bad as there are many events i did not manage to get a proper search result

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try replacing the table command with fields

0 Karma

siu
Loves-to-Learn Everything
index=sky sourcetype=sky_trade_murex_timestamp OR sourcetype=mx_to_sky
``` Parse sky_trade_murex_timestamp events (note that trade_id is put directly into the NB field) ```
| rex field=_raw "trade_id=\"(?<NB>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
``` Parse mx_to_sky events ```
| rex field=_raw "(?<NB>[^;]+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
``` Reduce to just the fields of interest ```
| fields sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO
``` "Join" events by NB using stats ```
| dedup NB
| stats values(*) as * by NB
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Remove the dedup NB! This is reducing your events to one event per NB which is why you are only getting half your data!

PickleRick
SplunkTrust
SplunkTrust

Good catch. I remembered that dedup was needlessly used in both "compound searches" but didn't notice that it was transfered to the "composite search". Indeed it only leaves us with one of the two or more "joinable" events.

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...