Splunk Search

Comparing Fields and Applying Conditions From Multiple Sourcetypes

goton1160
Explorer

Hi.  I've been a very basic user of Splunk for a while, but now have a need to perform more advanced searches.  I have two different sourcetypes within the same index.  Examples of the fields are below.   

index=vehicles

Sourcetype=autos
VIN
MAKE
MODEL

Sourcetype=cars
SN
MANUFACTURER
PRODUCT

I'd like to search and table VIN, MAKE, MODEL, MANUFACTURER and PRODUCT where - 

VIN=SN
MAKE <> MANUFACTURER
OR
MODEL<>PRODUCT

Basically, where VIN and SN match, if one or both of the other fields don't match, show me.

I'm not sure if a join (VIN and SN) statement is the best approach in this case.  I've researched and found questions and answers related to searching and comparing multiple sourcetypes.  But, I've been unable to find examples that include conditions.  Any suggestions you can provide would be greatly appreciated.

Thank you!

Labels (5)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

It's pretty straightforward to do that

| makeresults format=csv data="VIN,MAKE,MODEL
1234ABCD,FORD,GT
ABCD1234,DODGE,VIPER
1A2B3C4D,CHEVROLET,CORVETTE
A1B2C3D4,AUDI,"
| eval sourcetype="autos"
| append [ 
| makeresults format=csv data="SN,MANUFACTURER,PRODUCT
1234ABCD,FORD,GT
ABCD1234,DODGE,CARAVAN
1A2B3C4D,CHEVY,CORVETTE
A1B2C3D4, ,A8"
  | eval sourcetype="cars"
]
``` Above is sample data setup, but imagine your data above has come from
    index=your_index sourcetype=autos OR sourcetype=cars
```
``` Now use VIN as the common field - there are actually many ways to do
    the same thing, but what you are doing here is to make the dc_XXX fields
    ones to be counted for uniqueness.
```
| eval VIN=coalesce(VIN, SN), dc_makes=coalesce(MAKE, MANUFACTURER), dc_models=coalesce(MODEL, PRODUCT)
``` Here there stats values collects all the original data - you may want
    to add a | fields statement here to limit to the fields you want
    It also counts the unique values of the dc_* fields which is the make
    and model from whichever sourcetype ```
| stats values(*) as * dc(dc_*) as dc_* by VIN
``` And now this will find your mismatch items ```
| where dc_makes>1 OR dc_models>1
| fields - sourcetype dc_*

Hope this helps

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

Here is a very simple example of "joining" two different datasets together based on their common ID. Almost all of the example is just setting up some example data. What you really need are the last 3 lines.

If you paste this to a search window it will randomly return you some results if the PRODUCT contains MISMATCH - if you remove the last line of the example you will all results of the made up data.

| makeresults
| fields - _time

``` Make some data for Sourcetype=autos ```
| eval sourcetype="autos"
| eval MAKE=split("Audi,Porsche,Mercedes",",")
| mvexpand MAKE
| eval MODEL=case(MAKE="Audi", split("AU-123,AU-988", ","), MAKE="Porsche", split("PO-123,PO-988", ","), MAKE="Mercedes", split("MX-123,MX-988", ","))
| mvexpand MODEL
| eval VIN=case(MAKE="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MAKE="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MAKE="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
| mvexpand VIN
| eval VIN=MODEL.":".VIN


``` Make some identical data for Sourcetype=autos ```
| append [
  | makeresults
  | fields - _time
  | eval sourcetype="cars"
  | eval MANUFACTURER=split("Audi,Porsche,Mercedes",",")
  | mvexpand MANUFACTURER
  | eval PRODUCT=case(MANUFACTURER="Audi", split("AU-123,AU-988", ","), MANUFACTURER="Porsche", split("PO-123,PO-988", ","), MANUFACTURER="Mercedes", split("MX-123,MX-988", ","))
  | mvexpand PRODUCT
  | eval SN=case(MANUFACTURER="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MANUFACTURER="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MANUFACTURER="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
  | mvexpand SN
  | eval SN=PRODUCT.":".SN
  | eval PRODUCT=PRODUCT.if(random() % 100 < 10, "-MISMATCH", "")
]
``` Take the common field ```
| eval COMMON_ID=if(sourcetype="autos", VIN, SN)
| stats values(*) as * by COMMON_ID
| where MAKE!=MANUFACTURER OR MODEL!=PRODUCT

Don't ever consider JOIN as the first option - it's not a Splunk way of doing things and has numerous limitations. Splunk uses stats ... BY COMMON_FIELD.

Hope this helps

0 Karma

goton1160
Explorer

Actually, the further I review this, the more confused I get.   In your example, why did you split makes and models?  Is it necessary to append data from one sourcetype to the other?  I assume so, otherwise the where command would be invalid.

You're right, though.  The last three commands are key to the search.  As powerful as Splunk is, I'd sure think there's a much simpler process to search multiple sourcetypes with conditions applied.  (There truly is no comparison between the two, but I could create this query using Access in about 30 seconds.  However, the amount of data I'm searching is far too large for Access...)

Thanks for any feedback you can provide.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

The reason for setting up the example data in that way is based on my understanding of your description of the problem.

Generally the easiest way to give advice is for you to post an example of the data from both types and demonstrate what you want to achieve with the output.

No you don't need to append - the whole makeresults/append section is about setting up an example data set to show how you go about joining the two.

If you can post an example of the two data sources, it would be easier to show how it should be done.

 

0 Karma

goton1160
Explorer

Thanks for the response, Bowesmana.  Understood.

Here are sourcetypes and field data examples.

Sourcetype=autos

VINMAKEMODEL
1234ABCDFORDGT
1A2B3C4DCHEVROLETCORVETTE
ABCD1234DODGEVIPER
A12B3C4DAUDI 

 

Sourcetype=cars

SNMANUFACTURERPRODUCT
1234ABCDFORDGT
ABCD1234CHEVYCORVETTE
1A2B3C4DDODGECARAVAN
A1B2C3D4 A8

 

I'd like to compare the two sourcetypes and see the results where VIN=SN, but MAKE!=MANUFACTURER OR MODEL!=PRODUCT. (Caveat - if any events in either sourcetype contain a null value, they can be ignored/excluded by the search.)

From the example data above, ideally the search would display the following fields, and results would contain these two events (because VIN and SN match, but "CHEVROLET" does not equal "CHEVY", and "VIPER" does not equal "CARAVAN").  

VINMAKEMODELSNMANUFACTURERPRODUCT
1A2B3C4DCHEVROLETCORVETTE1A2B3C4DCHEVYCORVETTE
ABCD1234DODGEVIPERABCD1234DODGECARAVAN

 

Hope this helps to clarify.  Please let me know if you have any questions or suggestions.  I appreciate your help!

0 Karma

goton1160
Explorer

Ugh....sorry.  I modified data in the examples as I was typing my last response, and didn't update each "table" as needed.  Here are correct values.  Sorry for the confusion!  I didn't see an option to edit or delete my last response.

Sourcetype=autos

VINMAKEMODEL
1234ABCDFORDGT
ABCD1234DODGEVIPER
1A2B3C4DCHEVROLETCORVETTE
A1B2C3D4AUDI 

 

Sourcetype=cars

SNMANUFACTURERPRODUCT
1234ABCDFORDGT
ABCD1234DODGECARAVAN
1A2B3C4DCHEVYCORVETTE
A1B2C3D4 A8

 

I'd like to compare the two sourcetypes and see the results where VIN=SN, but MAKE!=MANUFACTURER OR MODEL!=PRODUCT. (Caveat - if any events in either sourcetype contain a null value, they can be ignored/excluded by the search.)

From the example data above, ideally the search would display the following fields, and results would contain these two events (because VIN and SN match, but "VIPER" does not equal "CARAVAN", and "CHEVROLET" does not equal "CHEVY").  

VINMAKEMODELSNMANUFACTURERPRODUCT
ABCD1234DODGEVIPERABCD1234DODGECARAVAN
1A2B3C4DCHEVROLETCORVETTE1A2B3C4DCHEVYCORVETTE

 

Sorry again for the confusion.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

It's pretty straightforward to do that

| makeresults format=csv data="VIN,MAKE,MODEL
1234ABCD,FORD,GT
ABCD1234,DODGE,VIPER
1A2B3C4D,CHEVROLET,CORVETTE
A1B2C3D4,AUDI,"
| eval sourcetype="autos"
| append [ 
| makeresults format=csv data="SN,MANUFACTURER,PRODUCT
1234ABCD,FORD,GT
ABCD1234,DODGE,CARAVAN
1A2B3C4D,CHEVY,CORVETTE
A1B2C3D4, ,A8"
  | eval sourcetype="cars"
]
``` Above is sample data setup, but imagine your data above has come from
    index=your_index sourcetype=autos OR sourcetype=cars
```
``` Now use VIN as the common field - there are actually many ways to do
    the same thing, but what you are doing here is to make the dc_XXX fields
    ones to be counted for uniqueness.
```
| eval VIN=coalesce(VIN, SN), dc_makes=coalesce(MAKE, MANUFACTURER), dc_models=coalesce(MODEL, PRODUCT)
``` Here there stats values collects all the original data - you may want
    to add a | fields statement here to limit to the fields you want
    It also counts the unique values of the dc_* fields which is the make
    and model from whichever sourcetype ```
| stats values(*) as * dc(dc_*) as dc_* by VIN
``` And now this will find your mismatch items ```
| where dc_makes>1 OR dc_models>1
| fields - sourcetype dc_*

Hope this helps

goton1160
Explorer

That works great!  Just what I was looking for.

Thanks much for your support, bowesmana!

0 Karma

goton1160
Explorer

Thanks for your response, bowesmana!  You've got me headed in the right direction.

0 Karma
Get Updates on the Splunk Community!

What’s New & Next in Splunk SOAR

Security teams today are dealing with more alerts, more tools, and more pressure than ever.  Join us on ...

Your Voice Matters! Help Us Shape the New Splunk Lantern Experience

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

September Community Champions: A Shoutout to Our Contributors!

As we close the books on another fantastic month, we want to take a moment to celebrate the people who are the ...