Solved: Comparing Fields and Applying Conditions From Mult...

goton1160 · ‎05-02-2024

Hi. I've been a very basic user of Splunk for a while, but now have a need to perform more advanced searches. I have two different sourcetypes within the same index. Examples of the fields are below.

index=vehicles

Sourcetype=autos
VIN
MAKE
MODEL

Sourcetype=cars
SN
MANUFACTURER
PRODUCT

I'd like to search and table VIN, MAKE, MODEL, MANUFACTURER and PRODUCT where -

VIN=SN
MAKE <> MANUFACTURER
OR
MODEL<>PRODUCT

Basically, where VIN and SN match, if one or both of the other fields don't match, show me.

I'm not sure if a join (VIN and SN) statement is the best approach in this case. I've researched and found questions and answers related to searching and comparing multiple sourcetypes. But, I've been unable to find examples that include conditions. Any suggestions you can provide would be greatly appreciated.

Thank you!

bowesmana · ‎05-26-2024

It's pretty straightforward to do that

| makeresults format=csv data="VIN,MAKE,MODEL
1234ABCD,FORD,GT
ABCD1234,DODGE,VIPER
1A2B3C4D,CHEVROLET,CORVETTE
A1B2C3D4,AUDI,"
| eval sourcetype="autos"
| append [ 
| makeresults format=csv data="SN,MANUFACTURER,PRODUCT
1234ABCD,FORD,GT
ABCD1234,DODGE,CARAVAN
1A2B3C4D,CHEVY,CORVETTE
A1B2C3D4, ,A8"
  | eval sourcetype="cars"
]
``` Above is sample data setup, but imagine your data above has come from
    index=your_index sourcetype=autos OR sourcetype=cars
```
``` Now use VIN as the common field - there are actually many ways to do
    the same thing, but what you are doing here is to make the dc_XXX fields
    ones to be counted for uniqueness.
```
| eval VIN=coalesce(VIN, SN), dc_makes=coalesce(MAKE, MANUFACTURER), dc_models=coalesce(MODEL, PRODUCT)
``` Here there stats values collects all the original data - you may want
    to add a | fields statement here to limit to the fields you want
    It also counts the unique values of the dc_* fields which is the make
    and model from whichever sourcetype ```
| stats values(*) as * dc(dc_*) as dc_* by VIN
``` And now this will find your mismatch items ```
| where dc_makes>1 OR dc_models>1
| fields - sourcetype dc_*

Hope this helps

View solution in original post

bowesmana · ‎05-02-2024

Here is a very simple example of "joining" two different datasets together based on their common ID. Almost all of the example is just setting up some example data. What you really need are the last 3 lines.

If you paste this to a search window it will randomly return you some results if the PRODUCT contains MISMATCH - if you remove the last line of the example you will all results of the made up data.

| makeresults
| fields - _time

``` Make some data for Sourcetype=autos ```
| eval sourcetype="autos"
| eval MAKE=split("Audi,Porsche,Mercedes",",")
| mvexpand MAKE
| eval MODEL=case(MAKE="Audi", split("AU-123,AU-988", ","), MAKE="Porsche", split("PO-123,PO-988", ","), MAKE="Mercedes", split("MX-123,MX-988", ","))
| mvexpand MODEL
| eval VIN=case(MAKE="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MAKE="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MAKE="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
| mvexpand VIN
| eval VIN=MODEL.":".VIN


``` Make some identical data for Sourcetype=autos ```
| append [
  | makeresults
  | fields - _time
  | eval sourcetype="cars"
  | eval MANUFACTURER=split("Audi,Porsche,Mercedes",",")
  | mvexpand MANUFACTURER
  | eval PRODUCT=case(MANUFACTURER="Audi", split("AU-123,AU-988", ","), MANUFACTURER="Porsche", split("PO-123,PO-988", ","), MANUFACTURER="Mercedes", split("MX-123,MX-988", ","))
  | mvexpand PRODUCT
  | eval SN=case(MANUFACTURER="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MANUFACTURER="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MANUFACTURER="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
  | mvexpand SN
  | eval SN=PRODUCT.":".SN
  | eval PRODUCT=PRODUCT.if(random() % 100 < 10, "-MISMATCH", "")
]
``` Take the common field ```
| eval COMMON_ID=if(sourcetype="autos", VIN, SN)
| stats values(*) as * by COMMON_ID
| where MAKE!=MANUFACTURER OR MODEL!=PRODUCT

Don't ever consider JOIN as the first option - it's not a Splunk way of doing things and has numerous limitations. Splunk uses stats ... BY COMMON_FIELD.

Hope this helps

goton1160 · ‎05-15-2024

Actually, the further I review this, the more confused I get. In your example, why did you split makes and models? Is it necessary to append data from one sourcetype to the other? I assume so, otherwise the where command would be invalid.

You're right, though. The last three commands are key to the search. As powerful as Splunk is, I'd sure think there's a much simpler process to search multiple sourcetypes with conditions applied. (There truly is no comparison between the two, but I could create this query using Access in about 30 seconds. However, the amount of data I'm searching is far too large for Access...)

Thanks for any feedback you can provide.

bowesmana · ‎05-20-2024

The reason for setting up the example data in that way is based on my understanding of your description of the problem.

Generally the easiest way to give advice is for you to post an example of the data from both types and demonstrate what you want to achieve with the output.

No you don't need to append - the whole makeresults/append section is about setting up an example data set to show how you go about joining the two.

If you can post an example of the two data sources, it would be easier to show how it should be done.

goton1160 · ‎05-21-2024

Thanks for the response, Bowesmana. Understood.

Here are sourcetypes and field data examples.

Sourcetype=autos

VIN	MAKE	MODEL
1234ABCD	FORD	GT
1A2B3C4D	CHEVROLET	CORVETTE
ABCD1234	DODGE	VIPER
A12B3C4D	AUDI

Sourcetype=cars

SN	MANUFACTURER	PRODUCT
1234ABCD	FORD	GT
ABCD1234	CHEVY	CORVETTE
1A2B3C4D	DODGE	CARAVAN
A1B2C3D4		A8

I'd like to compare the two sourcetypes and see the results where VIN=SN, but MAKE!=MANUFACTURER OR MODEL!=PRODUCT. (Caveat - if any events in either sourcetype contain a null value, they can be ignored/excluded by the search.)

From the example data above, ideally the search would display the following fields, and results would contain these two events (because VIN and SN match, but "CHEVROLET" does not equal "CHEVY", and "VIPER" does not equal "CARAVAN").

VIN	MAKE	MODEL	SN	MANUFACTURER	PRODUCT
1A2B3C4D	CHEVROLET	CORVETTE	1A2B3C4D	CHEVY	CORVETTE
ABCD1234	DODGE	VIPER	ABCD1234	DODGE	CARAVAN

Hope this helps to clarify. Please let me know if you have any questions or suggestions. I appreciate your help!

goton1160 · ‎05-21-2024

Ugh....sorry. I modified data in the examples as I was typing my last response, and didn't update each "table" as needed. Here are correct values. Sorry for the confusion! I didn't see an option to edit or delete my last response.

Sourcetype=autos

VIN	MAKE	MODEL
1234ABCD	FORD	GT
ABCD1234	DODGE	VIPER
1A2B3C4D	CHEVROLET	CORVETTE
A1B2C3D4	AUDI

Sourcetype=cars

SN	MANUFACTURER	PRODUCT
1234ABCD	FORD	GT
ABCD1234	DODGE	CARAVAN
1A2B3C4D	CHEVY	CORVETTE
A1B2C3D4		A8

I'd like to compare the two sourcetypes and see the results where VIN=SN, but MAKE!=MANUFACTURER OR MODEL!=PRODUCT. (Caveat - if any events in either sourcetype contain a null value, they can be ignored/excluded by the search.)

From the example data above, ideally the search would display the following fields, and results would contain these two events (because VIN and SN match, but "VIPER" does not equal "CARAVAN", and "CHEVROLET" does not equal "CHEVY").

VIN	MAKE	MODEL	SN	MANUFACTURER	PRODUCT
ABCD1234	DODGE	VIPER	ABCD1234	DODGE	CARAVAN
1A2B3C4D	CHEVROLET	CORVETTE	1A2B3C4D	CHEVY	CORVETTE

Sorry again for the confusion.

bowesmana · ‎05-26-2024

It's pretty straightforward to do that

| makeresults format=csv data="VIN,MAKE,MODEL
1234ABCD,FORD,GT
ABCD1234,DODGE,VIPER
1A2B3C4D,CHEVROLET,CORVETTE
A1B2C3D4,AUDI,"
| eval sourcetype="autos"
| append [ 
| makeresults format=csv data="SN,MANUFACTURER,PRODUCT
1234ABCD,FORD,GT
ABCD1234,DODGE,CARAVAN
1A2B3C4D,CHEVY,CORVETTE
A1B2C3D4, ,A8"
  | eval sourcetype="cars"
]
``` Above is sample data setup, but imagine your data above has come from
    index=your_index sourcetype=autos OR sourcetype=cars
```
``` Now use VIN as the common field - there are actually many ways to do
    the same thing, but what you are doing here is to make the dc_XXX fields
    ones to be counted for uniqueness.
```
| eval VIN=coalesce(VIN, SN), dc_makes=coalesce(MAKE, MANUFACTURER), dc_models=coalesce(MODEL, PRODUCT)
``` Here there stats values collects all the original data - you may want
    to add a | fields statement here to limit to the fields you want
    It also counts the unique values of the dc_* fields which is the make
    and model from whichever sourcetype ```
| stats values(*) as * dc(dc_*) as dc_* by VIN
``` And now this will find your mismatch items ```
| where dc_makes>1 OR dc_models>1
| fields - sourcetype dc_*

Hope this helps

goton1160 · ‎05-29-2024

That works great! Just what I was looking for.

Thanks much for your support, bowesmana!

goton1160 · ‎05-06-2024

Thanks for your response, bowesmana! You've got me headed in the right direction.

Comparing Fields and Applying Conditions From Multiple Sourcetypes

eval

fields

join

stats

table

What’s New & Next in Splunk SOAR

Your Voice Matters! Help Us Shape the New Splunk Lantern Experience

September Community Champions: A Shoutout to Our Contributors!

Are you a member of the Splunk Community?

Comparing Fields and Applying Conditions From Multiple Sourcetypes