Splunk Search

Joining searches on common columns

siu
Loves-to-Learn Everything

HI query joining 2 searches on left join.

Its matching some rows and not matching some rows although the column where I join on is clearly seen in both searches.

 

 

 

index=sky sourcetype=sky_trade_murex_timestamp 
| rex field=_raw "trade_id=\"(?<trade_id>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
| rename trade_id as NB
| dedup NB
| eval NB = tostring(trim(NB))
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type
| join type=left NB
[ search index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>\d+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| eval NB = tostring(trim(NB))
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO]
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

 

 

 

 This above is my source code

And the raw data is

 

 

 

Time	Event
27/12/2024
17:05:39.000	
32265376;DEAD;3887.00000000;XAU;CURR;FXD;FXD;CM TR GLD AUS;X_CMTR XAU SWAP
host = APPSG002SIN0117source = D:\SkyNet\data\mx_trade_report\MX2_TRADE_STATUS_20241227_200037.csvsourcetype = mx_to_sky
Time	Event
27/12/2024
18:05:36.651	
2024-12-27 18:05:36.651, system="murex", id="645131777", sky_id="645131777", trade_id="32265483", event_id="100023788", mx_status="DEAD", operation="NETTING", action="insertion", tradebooking_sgp="2024/12/26 01:02:01.0000", eventtime_sgp="2024/12/26 01:01:51.7630", sky_to_mq_latency="-9.-237", portfolio_name="I CREDIT INC", portfolio_entity="ANZSEC INC", trade_type="BondTrade"
host = APPSG002SIN0032source = sky_trade_murex_timestamp sourcetype = sky_trade_murex_timestamp

 

Labels (3)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

That is very strange because that suggests that you have much more fields that you're creating in this search.

Just for testing, replace the last stats command with

| head 1000
| stats values(*) as * by NB
| fillnull
| head 1
| transpose 0

Oh, and since you're doing stats values anyway, the dedup command is not needed. It can in fact give you a performance penalty because dedup is a centralized command while all preceeding ones are distributed streaming and stats can be distributed to some extent.

0 Karma

siu
Loves-to-Learn Everything

OH also i dont get these in the results TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO

I only see the first half of the fields command I included.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Please share the search you are using for these results (my mind-reading abilities have been dulled by over-indulgence!)

0 Karma

siu
Loves-to-Learn Everything

Thanks guys!

index=sky sourcetype=sky_trade_murex_timestamp OR sourcetype=mx_to_sky
``` Parse sky_trade_murex_timestamp events (note that trade_id is put directly into the NB field) ```
| rex field=_raw "trade_id=\"(?<NB>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
``` Parse mx_to_sky events ```
| rex field=_raw "(?<NB>[^;]+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
``` Reduce to just the fields of interest ```
| fields sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO
``` "Join" events by NB using stats ```
| head 1000
| stats values(*) as * by NB
| fillnull 
| head 1
| transpose 0

Now its this

I get a 1000 events for an hour range estimated and results as shown

columnrow1
NB0
action0
event_id0
mx_statuslive
operationsydeod
portfolio_entitySG
KOREA
USA
...
portfolio_nameAUD APT
...
sky_id673821
...
trade_typeVanillaSwap
tradebooking_sgp

2024/12/26 00:06:34.3572

...

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Is this now fixed? (I am not sure what @PickleRick 's test commands (head 1000 etc.) are doing here though!)

0 Karma

siu
Loves-to-Learn Everything

this is unfortunately.. not fixed  😞

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Please explain what you are doing (the search), what results you are getting, and, most importantly, why this is not what you expected.

0 Karma

siu
Loves-to-Learn Everything

Thanks, as per my first post I'd like to join 2 searches together on NB and retain all columns . I am able to retain all columns but some rows are filled are some are not (but NB is definitely matching in both searches)

index=sky sourcetype=sky_trade_murex_timestamp 
| rex field=_raw "trade_id=\"(?<trade_id>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
| rename trade_id as NB
| table sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type
index=sky sourcetype=mx_to_sky
| rex field=_raw "(?<NB>\d+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
| eval NB = tostring(trim(NB))
| table TRN_STATUS, NB, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

What do you get when you try something like this?

index=sky sourcetype=sky_trade_murex_timestamp OR sourcetype=mx_to_sky
``` Parse sky_trade_murex_timestamp events (note that trade_id is put directly into the NB field) ```
| rex field=_raw "trade_id=\"(?<NB>\d+)\"" 
| rex field=_raw "mx_status=\"(?<mx_status>[^\"]+)\""
| rex field=_raw "sky_id=\"(?<sky_id>\d+)\"" 
| rex field=_raw "event_id=\"(?<event_id>\d+)\""
| rex field=_raw "operation=\"(?<operation>[^\"]+)\""
| rex field=_raw "action=\"(?<action>[^\"]+)\""
| rex field=_raw "tradebooking_sgp=\"(?<tradebooking_sgp>[^\"]+)\""
| rex field=_raw "portfolio_name=\"(?<portfolio_name>[^\"]+)\""
| rex field=_raw "portfolio_entity=\"(?<portfolio_entity>[^\"]+)\""
| rex field=_raw "trade_type=\"(?<trade_type>[^\"]+)\""
``` Parse mx_to_sky events ```
| rex field=_raw "(?<NB>\d+);(?<TRN_STATUS>[^;]+);(?<NOMINAL>[^;]+);(?<CURRENCY>[^;]+);(?<TRN_FMLY>[^;]+);(?<TRN_GRP>[^;]+);(?<TRN_TYPE>[^;]*);(?<BPFOLIO>[^;]*);(?<SPFOLIO>[^;]*)"
``` Reduce to just the fields of interest ```
| fields sky_id, NB, event_id, mx_status, operation, action, tradebooking_sgp, portfolio_name, portfolio_entity, trade_type, TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO
``` "Join" events by NB using stats ```
| stats values(*) as * by NB
0 Karma

siu
Loves-to-Learn Everything

Thanks! Albeit abit slow and unresponsive i got some results

NBactionevent_idmx_statusoperationportfolio_entityportfolio_namesky_idtrade_typetradebooking_sgp
0 0LIVEsgp
usa
usaeod
ABC Korea
ABC Panema
...
...
A USD AOU
...
...
...
12345678VanillaSwap
...
...
YYYY/MM/DD HH:MM:SS
...
          
          
          
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

And how is this not what you want?

0 Karma

siu
Loves-to-Learn Everything

These fields are missing

TRN_STATUS, NOMINAL, CURRENCY, TRN_FMLY, TRN_GRP, TRN_TYPE, BPFOLIO, SPFOLIO
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Please share some examples of the events which have these fields which are not being returned

0 Karma

siu
Loves-to-Learn Everything

Sure, also all the results are in a single row, ideally I would want them in separate

Wahts important to note is that I did get all the columns in the fields command but they are empty (the ones i stated , eg. NB onwards)

TimeEvent
03/01/2025
16:05:37.609
2025-01-03 16:05:37.609, system="murex", id="646523556", sky_id="646523556", trade_id="32248978", event_id="100120362", mx_status="live", operation="nooperation", action="modification", tradebooking_sgp="2025/01/02 02:01:23.0000", eventtime_sgp="2025/01/02 02:01:21.3800", sky_to_mq_latency="-1.-620", portfolio_name="test_oprtoflio", portfolio_entity="test_entity", trade_type="VanillaSwap"
03/01/2025
11:05:39.000
32248978;LIVE;0.00001000;AUD;IRD;CD;;test_prtoflio;CAMBOOYAPTSYDAU
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

The results you shared earlier show NB as 0 which is odd since this doesn't appear to be what you should be getting given the events you have shared and the search you are apparently using. Do any of your events actually have 0 as the trade_id?

Have you examined the events which are present in the results to see why they might be being parsed in such a way to give NB as 0?

At the end of the day, the searches being suggested work with the data you have shared, so if they are not working as expected, it is likely to be because of the actual data you are using, and in order for us to be able to help you more, you should share some more of the data which is not working.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

That was just for checking how many fields are returned from the sample of data. Of course it's not suitable for production search

0 Karma

siu
Loves-to-Learn Everything

thanks! Yep i tried that for just an hour range. I got one single row of data with everything in there it seems. I also couldnt scroll the page to confirm as page became unresponsive

0 Karma

PickleRick
SplunkTrust
SplunkTrust

To be precise, because it's often missed by people, the 50k limit for subsearch only applies to join command. The general limit for subsearch results in other uses is 10k by default.

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Level Up Your .conf25: Splunk Arcade Comes to Boston

With .conf25 right around the corner in Boston, there’s a lot to look forward to — inspiring keynotes, ...

Manual Instrumentation with Splunk Observability Cloud: How to Instrument Frontend ...

Although it might seem daunting, as we’ve seen in this series, manual instrumentation can be straightforward ...

Take Action Automatically on Splunk Alerts with Red Hat Ansible Automation Platform

 Are you ready to revolutionize your IT operations? As digital transformation accelerates, the demand for ...