Splunk Search

Splunk left join not returning as expected

MrJohn230
Path Finder

I have the below code. I know that values exist under the subsearch which are not returning when I run the below query. However, when I uncomment the "where clause" in the sub search the values appear. I don't know what I have done incorrectly for my results to not show. I've also commented out the |search and it still doesn't show that these values exist in the subsearch. 

Any help would be appreciated. 

index=customer name IN (gate-green, gate-blue) msg="*First time: *"
    | rex field=msg "First time: (?<UserId>\d+)"
    | eval FirstRequest = 1
    | join type=left UserId
     [search index=customer name IN (cust-blue, cust-green) msg="*COMPLETED *"
       | rex field=msg "Message\|[^\t\{]*(?<json>{[^\t]+})"
       | spath input=json path=infoId output=UserId
       | eval Completed = 1
       ```| where UserId IN (125,999,418,208)```]
| table UserId, Completed
| search UserId IN (125,999,418,208)

 

Labels (4)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @MrJohn230 ,

at first, if possible try to avoid to use join command!

I understand that all of us arrive from SQL, but Splunk isn't a database so join command should be avoided all the times it's possible and replaced e.g. with the stats command, because it's a very slow and resource eater command.

e.g. try something like this (obviously I cannot check it):

index=customer ((name IN (gate-green, gate-blue) msg="*First time: *") OR name IN (cust-blue, cust-green) msg="*COMPLETED *")
| rex field=msg "First time: (?<UserId>\d+)"
| rex field=msg "Message\|[^\t\{]*(?<json>{[^\t]+})"
| spath input=json path=infoId output=UserId
| eval status=if(name IN (gate-green, gate-blue) AND msg="*First time: *","FirstRequest","Completed")
| stats dc(status) AS status_count values(status) AS status BY UserId
| eval status=if(status_count=2,"both",status)
| table UserId status
| search UserId IN (125,999,418,208)

Then you can define if to maintain all the UserIds or only the ones with both the statuses.

About your search, try to use quotes in the IN values.

Ciao.

Giuseppe

View solution in original post

PickleRick
SplunkTrust
SplunkTrust

One additional remark about your searches (both the outer one and the subsearch) - don't use wildcards at the beginning of your search term if you can avoid it.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Your where clause is wrong - it does not support the IN construct, like search. You could do it with

| where in(UserId,125,999,418,208)

or using search IN. As @gcusello says, using join and subsearches is not a good habit, using stats can normall do the same and does not have limitations that join/subsearch has, e.g. this is an example of using stats.

index=customer (name IN (gate-green, gate-blue) msg="*First time: *") OR (name IN (cust-blue, cust-green) msg="*COMPLETED *")
| rex field=msg "First time: (?<UserId>\d+)"
| eval FirstRequest = if(isnotnull(UserId),1,null())
| rex field=msg "Message\|[^\t\{]*(?<json>{[^\t]+})"
| spath input=json path=infoId output=CompletedUserId
| eval Completed = if(isnotnull(CompletedUserId), 1, null())
| eval UserId=coalesce(UserId, CompletedUserId)
| stats values(Completed) as Completed by UserId
| search UserId IN (125,999,418,208)

isoutamo
SplunkTrust
SplunkTrust

One more vote to not to use join or union. It usually have more issues than it solves.

Here is one old post how to replace join with stats with different join types https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi...

There are also many .conf presentations why you should use stats instead of join like https://conf.splunk.com/watch/conf-online.html?search=join%20without%20join#/

r. Ismo

MrJohn230
Path Finder

Hello @gcusello I managed to not use the join command, the above feedback helped. 

I decided to go with Union this seems to be working, what do you think?

index=customer ((name IN (gate-green, gate-blue) msg="*First time: *")
| rex field=msg "First time: (?<UserId>\d+)"
| eval FirtRequest = 1
|union
[ search index name IN (cust-blue, cust-green) msg="*COMPLETED *")
   | rex field=msg "Message\|[^\t\{]*(?<json>{[^\t]+})"
   | spath input=json path=infoId output=UserId
   | eval Completed = 1]
| stats max(FirstRequest) AS FirstRequest max(Completed) BY UserId

 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @MrJohn230 ,

I continue to prefer the solution I hinted because union is very similar to join and maintain the ame limit of 50,000 results like all the subsearches.

If it solves your requirement, use it, but I hint to tale practice using stats that's better and faster.

did you solved the initial issue of the eval?

Ciao.

Giuseppe

gcusello
SplunkTrust
SplunkTrust

Hi @MrJohn230 ,

at first, if possible try to avoid to use join command!

I understand that all of us arrive from SQL, but Splunk isn't a database so join command should be avoided all the times it's possible and replaced e.g. with the stats command, because it's a very slow and resource eater command.

e.g. try something like this (obviously I cannot check it):

index=customer ((name IN (gate-green, gate-blue) msg="*First time: *") OR name IN (cust-blue, cust-green) msg="*COMPLETED *")
| rex field=msg "First time: (?<UserId>\d+)"
| rex field=msg "Message\|[^\t\{]*(?<json>{[^\t]+})"
| spath input=json path=infoId output=UserId
| eval status=if(name IN (gate-green, gate-blue) AND msg="*First time: *","FirstRequest","Completed")
| stats dc(status) AS status_count values(status) AS status BY UserId
| eval status=if(status_count=2,"both",status)
| table UserId status
| search UserId IN (125,999,418,208)

Then you can define if to maintain all the UserIds or only the ones with both the statuses.

About your search, try to use quotes in the IN values.

Ciao.

Giuseppe

Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...