What is 'source' supposed to show after a 'join' command?

lbogle · ‎10-29-2014

Hello Splunkers,

Just checking in to get a proofread and to ask what 'source' is expected to contain when a search references two separate sources, OR'd together, and then combines the events with a "join" or "selfjoin" command.
I have the following query:

index=index-name (source="/file.csv" OR source="/file2.csv") | eval name=lower(coalesce('System Name',Name)) | eval os=coalesce('OS Name','Operating System')| fields + * | selfjoin max=0 keepsingle=t name .....and then some other things.

My understanding is that this search will take events and mash together the ones with matching fields (in this case 'System Name' and Name) and include the other lines with no matching fields, but in their original event format. Is that accurate?

What should I see in source at this point?

Thanks!

martin_mueller · ‎10-29-2014

The short answer is "either file or file2". When two events are selfjoined, one field value will overwrite the other. Which value survives depends on the order in which the events get merged, if you catch my drift. source is just a field like any other.

What's the underlying use case for this search? Often there are better, more splunky ways of doing things than joining.

martin_mueller · ‎10-30-2014

Wrapping all events together for a common (set of) field(s) is kinda what stats does. If you want all your values mashed together you can do this:

index=index-name (source="/file.csv" OR source="/file2.csv") | eval mumbojumbo | stats values(*) as * by name
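If it also matters which file(s) each host came from, a variation along these lines might help. This is only a sketch: it reuses the evals from the original question in place of the placeholder, and source_count is a made-up field name for illustration. With values(*), source should come out as a multivalue field listing every file the host appeared in, and dc(source) counts how many distinct files that was.

```spl
index=index-name (source="/file.csv" OR source="/file2.csv")
| eval name=lower(coalesce('System Name',Name))
| eval os=coalesce('OS Name','Operating System')
| stats dc(source) as source_count values(*) as * by name
```

Appending something like | where source_count=2 would then keep only the hosts present in both files, while hosts found in just one file still show up as their own row if the filter is left off.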

lbogle · ‎10-29-2014

Hi Martin,

Bummer! I was told that everything that matched got joined together into a single event. I'm glad I double checked.
So the source is two .csv files. It's weird to me that only one shows up after the join.

The plan was to have two host files. Both files have some of the same host names but not all. One of the .csv files has many other fields by which to categorize the host names, whereas the other does not. I need to remove some host names from the list based on these categorization fields.

Initially I had set about doing this with nested subsearches, but that didn't quite work out and it seemed to be very inefficient. I think I also ran into a limits.conf issue, as the base query was returning a couple hundred entries.

This way seemed to be a little more elegant.

What are your thoughts?
