Splunk Search

What is 'source' supposed to show after a 'join' command?

lbogle
Contributor

Hello Splunkers,

Just checking in to get a proofread, and to see what the expected value of 'source' is when a search references two separate, OR'd sources that are then combined with a "join" or "selfjoin" command.
I have the following query:

index=index-name (source="/file.csv" OR source="/file2.csv") | eval name=lower(coalesce('System Name',Name)) | eval os=coalesce('OS Name','Operating System') | fields + * | selfjoin max=0 keepsingle=t name .....and then some other things.

My understanding is that this search will take the events and mash together the ones with matching values of the join field (here name, which is built from 'System Name' and Name), and will keep the events with no match in their original form. Is that accurate?

What should I see in source at this point?

Thanks!

martin_mueller
SplunkTrust

The short answer is "either file or file2". When two events are selfjoined they become one event, and for any field that both events carry, one value overwrites the other. Which value survives depends on the order in which the events are combined, if you catch my drift. source is just a field like any other.

What's the underlying use case for this search? Often there are better, more splunky ways of doing things than joining.
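
To see the overwrite behaviour for yourself, here is a minimal sketch that uses generated events instead of the real index. The host name host-a and the counter field n are made up for illustration; the two source values are the ones from the question, and the sketch assumes selfjoin's default settings.

```spl
| makeresults count=2
| streamstats count as n
| eval name="host-a", source=if(n=1, "/file.csv", "/file2.csv")
| selfjoin name
| table name source
```

This should come back as a single row for host-a whose source holds just one of the two paths, the same as any other field the two events share.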

martin_mueller
SplunkTrust

Wrapping all events together for a common (set of) field(s) is kinda what stats does. If you want all your values mashed together you can do this:

index=index-name (source="/file.csv" OR source="/file2.csv") | eval mumbojumbo | stats values(*) as * by name
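
Spelled out with the evals from the original question in place of the placeholder, that might look like the sketch below. The field name source_count is made up here. Because values(*) collects every distinct value per name, source should come back as a multivalue field listing each file the host appeared in rather than being overwritten, and dc(source) counts those files.

```spl
index=index-name (source="/file.csv" OR source="/file2.csv")
| eval name=lower(coalesce('System Name',Name))
| eval os=coalesce('OS Name','Operating System')
| stats values(*) as * dc(source) as source_count by name
```

A follow-up such as | where source_count=2 would then keep only the names present in both files.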

lbogle
Contributor

Hi Martin,

Bummer! I was told that everything that matched got joined together into a single event. I'm glad I double-checked.
So the source is two .csv files. It's weird to me that only one shows up after the join.

The plan was to have two host files. Both files have some of the same host names but not all. One of the .csv files has many other fields by which to categorize the host names, whereas the other does not. I need to remove some host names from the list based on these categorization fields.

Initially I had set about doing this with nested subsearches, but that didn't quite work out and it seemed very inefficient. I think I also ran into a limits.conf issue, as the base query was returning a couple hundred entries.

This way seemed to be a little more elegant.

What are your thoughts?
