Splunk Search

How can i correlate 2 fields with different sourcetype in one table?

i17065
Engager

I have 2 searches for systems & folders.
Both searches return a table.
The fields systemID & folderID have the same values.
(systemID = folderID)
I want the foldername as a column correlated with the systemID in the systems table.
Is this possible?

Thank you in advance!

index=a
sourcetype="systems"
system_name
systemID
field3

index=a
sourcetype="folders"
foldername
folderID
field3

0 Karma

rmmiller
Contributor

Answer:

index="vmware-inv" (sourcetype=vmware:inv:vm OR sourcetype=vmware:inv:hierarchy)
| fields + _time, changeSet.name, moid, changeSet.summary.runtime.host.name, changeSet.storage.perDatastoreUsage{}.committed, changeSet.config.name, vm_name, mem_capacity, logical_cpu_count, vm_os, hypervisor_name, cluster_name, host, hypervisor_os_version, changeSet.summary.runtime.powerState, changeSet.summary.vm.moid, changeSet.parent.moid
| eval folder=if(NOT match('changeSet.name',"Failover") AND match('changeSet.name',".\s\-\s."),'changeSet.name',null())
| eval changeSet.parent.moid = if(isnull(folder),'changeSet.parent.moid',moid)
| eval folder_moid = if(isnull(folder),null(),moid)
| eval moid = if(isnull(folder),moid,null())
| eval changeSet.name = if(isnull(folder),'changeSet.name',null())
| stats values(*) AS * BY changeSet.parent.moid
| where NOT isnull(folder) AND NOT isnull('changeSet.name')

Explanation:

Step 1: Construct your search so all events of interest are returned at once. We'll deal the folders from the systems later. You can still use fields if you want, but might not be necessary since you'll get getting away from join and using stats instead.

index="vmware-inv" (sourcetype=vmware:inv:vm OR sourcetype=vmware:inv:hierarchy)
| fields + _time, changeSet.name, moid, changeSet.summary.runtime.host.name, changeSet.storage.perDatastoreUsage{}.committed, changeSet.config.name, vm_name, mem_capacity, logical_cpu_count, vm_os, hypervisor_name, cluster_name, host, hypervisor_os_version, changeSet.summary.runtime.powerState, changeSet.summary.vm.moid, changeSet.parent.moid

*Note: I added changeSet.name and moid to your original list of fields from your system-specific query. Again, this might not be necessary anymore since you're being much nicer to your search head with this approach. 🙂

Step 2: Identify folders by the value of the changeSet.name field

| eval folder=if(NOT match('changeSet.name',"Failover") AND match('changeSet.name',".\s\-\s."),'changeSet.name',null())

The regular expressions here correspond to your folder-specific search in your other question. Search results that match appropriately will have a non-null value in the folder column consisting of the folder's changeSet.name value. Systems will have a null value in this column. Presto! Folders and systems distinguished.

Step 3: Adjust fields for folders and systems to line up stats for success

| eval changeSet.parent.moid = if(isnull(folder),'changeSet.parent.moid',moid)
| eval folder_moid = if(isnull(folder),null(),moid)
| eval moid = if(isnull(folder),moid,null())
| eval changeSet.name = if(isnull(folder),'changeSet.name',null())

For folders, do 4 essential things:

  1. Set the value of the folder's 'changeSet.parent.moid' field to be equal to its moid field. We will use this column with stats to match systems and folders later.
  2. Create a new column named folder_moid using the value of the moid. Useful for understanding logic later, but might not be needed in your end result.
  3. Set the value of the moid column to null. This will help to prevent creation of multivalue fields when we use stats.
  4. Set the value of the changeSet.name column to null. This will help to prevent creation of multivalue fields when we use stats.

Note: There's only 1 effective change to systems, and that's a null folder_moid field.

Step 4: Use stats to roll up the data for same values of changeSet.parent.moid

| stats values(*) AS * BY changeSet.parent.moid

Remember when we set the changeSet.parent.moid field of folders to its moid? That's what lets this do the heavy-lifting of matching up your systems and folders.

Step 5: Almost there! Remove any systems without matching folders as well as any folders without matching systems

| where NOT isnull(folder) AND NOT isnull('changeSet.name')

You should be left with only matched systems and folders that meet your original criteria.



Mocked-up data:

I mocked-up some data for developing this answer. Because I use random numbers to generate my data, I had some anomalies you might not have (like a multivalue fields in places you might not expect). This causes the final result to look off, but it's only an artifact of the randomly generated data.

| makeresults count=100
| eval sysnames=split("Foo Failover Bar,Bar,Foo - Bar,Bar - Failover - Bar - Foo,Far,Boo,BooFar,FarBoo",",")
| eval changeSet.name=mvindex(sysnames,random()%8)
| table changeSet.name
| eval changeSet.parent.moid = if(NOT match('changeSet.name',"Failover") AND match('changeSet.name',".\s\-\s."),null(),random()%40)
| eval moid=random()%40
| rename COMMENT AS "**"
| rename COMMENT AS "Above this line just replicates minimal data to develop logic"
| rename COMMENT AS "**"
| rename COMMENT AS "First create the folder columnn if it really is a folder which we do by a regex match on the changeSet.name field"
| eval folder=if(NOT match('changeSet.name',"Failover") AND match('changeSet.name',".\s\-\s."),'changeSet.name',null())
| rename COMMENT AS "Retain value of changeSet.parent.moid if this is a system but set this value to the folder moid if it is a folder"
| eval changeSet.parent.moid = if(isnull(folder),'changeSet.parent.moid',moid)
| rename COMMENT AS "Create a folder_moid columnn for easier visualization but might not be needed in end result"
| eval folder_moid = if(isnull(folder),null(),moid)
| rename COMMENT AS "Blank out the moid column for folders to help prevent mvfield creation with stats"
| eval moid = if(isnull(folder),moid,null())
| rename COMMENT AS "Blank out the changeSet.name column for folderes to help prevent mvfield creation with stats"
| eval changeSet.name = if(isnull(folder),'changeSet.name',null())
| rename COMMENT AS "The hardest working line in the whole SPL"
| rename COMMENT AS "Merges all the lines based on the changeSet.parent.moid values"
| stats values(*) AS * BY changeSet.parent.moid
| rename COMMENT AS "Remove any systems without a matching folder and any folders without a matching system thereby replicating an inner join"
| where NOT isnull(folder) AND NOT isnull('changeSet.name')
| rename COMMENT AS "**"
| rename COMMENT AS "Below this line removes bogus multivalue changeSet.name and moid data due to use of random"
| where mvcount('changeSet.name')==1 and mvcount(moid)==1
| rename COMMENT AS "**"

If you're not sure about the logic or want to understand it a little better, copy from the start of this SPL to the line that reads "Above this line just replicates minimal data to develop logic". Then start adding lines one at a time to see how the results change. If you do this, I found it useful to replace the where commands with an eval so I could make sure the logic was right. For example you could replace this

| where NOT isnull(folder) AND NOT isnull('changeSet.name')

with this:

| eval IGNOREWITHWHERE_SINCENOFOLDERMATCH = if(NOT isnull(folder),0,1)
| eval IGNOREWITHWHERE_SINCENOSYSTEMMATCH = if(NOT isnull('changeSet.name'),0,1)

Hope that helps!
rmmiller

0 Karma

rmmiller
Contributor

@i17065 did this solve your problem?

0 Karma

i17065
Engager

First at all thank you for your help!
The problem is that the sourcetypes systems and folders use the same fieldname "changeSet.name", "moid" but with there own values.
So when i use "fields +" i get only results from sourcetype systems ("changeSet.name", "moid").
Is there a way to use fields from a specific sourcetype in the same search? For example: | fields + changeSet.name(sourcetype=folders) to get results from folders?

0 Karma

rmmiller
Contributor

I don't see where these other field names are coming in. They weren't part of your original query. Is this a duplicate of your other question here? https://answers.splunk.com/answers/813580/how-can-i-create-a-key-for-two-searches-and-combin.html

0 Karma

i17065
Engager

Hi rmmiller, yes it is a duplicate of my other question, because i thought that my other question was not uploaded (question was gone for a couple of hours). Therefore I have tried to simplify my problem to make it more understandable. When I tried to implement your suggestion, the knot got a little loose in my head. Then I noticed that the field names were causing problems...

0 Karma

rmmiller
Contributor

OK, after taking a careful look at your other question, I think the approach is still largely the same.
1) Pull all results back in a single query
2) Use an eval statement to distinguish folders from systems.
3) Use stats to aggregate/correlate your systems and folders events.

Amending my answer now.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Perhaps this will help.

index=a (sourcetype="systems" OR sourcetype="folders")
| eval system = coalesce(systemID, folderID)
| stats values(*) as * by system
---
If this reply helps you, Karma would be appreciated.

i17065
Engager

First at all thank you for your help!
The problem is that the sourcetypes systems and folders use the same fieldname "changeSet.name", "moid" but with there own values.
So when i use "fields +" i get only results from sourcetype systems ("changeSet.name", "moid").
Is there a way to use fields from a specific sourcetype in the same search? For example: | fields + changeSet.name(sourcetype=folders) to get results from folders?

0 Karma
Get Updates on the Splunk Community!

Check out This Month’s Brand new Splunk Lantern Articles

Splunk Lantern is a customer success center providing advice from Splunk experts on valuable data insights, ...

Happy CX Day to our Community Superheroes!

Happy 10th Birthday CX Day!What is CX Day? It’s a global celebration recognizing innovation and success in the ...

Routing Data to Different Splunk Indexes in the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry. The OpenTelemetry project is the second largest ...