Hi. I've been a very basic user of Splunk for a while, but now have a need to perform more advanced searches. I have two different sourcetypes within the same index. Examples of the fields are below.
index=vehicles
Sourcetype=autos
VIN
MAKE
MODEL
Sourcetype=cars
SN
MANUFACTURER
PRODUCT
I'd like to search and table VIN, MAKE, MODEL, MANUFACTURER and PRODUCT where -
VIN=SN
MAKE <> MANUFACTURER
OR
MODEL<>PRODUCT
Basically, where VIN and SN match, if one or both of the other fields don't match, show me.
I'm not sure if a join (VIN and SN) statement is the best approach in this case. I've researched and found questions and answers related to searching and comparing multiple sourcetypes. But, I've been unable to find examples that include conditions. Any suggestions you can provide would be greatly appreciated.
Thank you!
It's pretty straightforward to do that
| makeresults format=csv data="VIN,MAKE,MODEL
1234ABCD,FORD,GT
ABCD1234,DODGE,VIPER
1A2B3C4D,CHEVROLET,CORVETTE
A1B2C3D4,AUDI,"
| eval sourcetype="autos"
| append [
| makeresults format=csv data="SN,MANUFACTURER,PRODUCT
1234ABCD,FORD,GT
ABCD1234,DODGE,CARAVAN
1A2B3C4D,CHEVY,CORVETTE
A1B2C3D4, ,A8"
| eval sourcetype="cars"
]
``` Above is sample data setup, but imagine your data above has come from
index=your_index sourcetype=autos OR sourcetype=cars
```
``` Now use VIN as the common field - there are actually many ways to do
the same thing, but what you are doing here is to make the dc_XXX fields
ones to be counted for uniqueness.
```
| eval VIN=coalesce(VIN, SN), dc_makes=coalesce(MAKE, MANUFACTURER), dc_models=coalesce(MODEL, PRODUCT)
``` Here there stats values collects all the original data - you may want
to add a | fields statement here to limit to the fields you want
It also counts the unique values of the dc_* fields which is the make
and model from whichever sourcetype ```
| stats values(*) as * dc(dc_*) as dc_* by VIN
``` And now this will find your mismatch items ```
| where dc_makes>1 OR dc_models>1
| fields - sourcetype dc_*
Hope this helps
Here is a very simple example of "joining" two different datasets together based on their common ID. Almost all of the example is just setting up some example data. What you really need are the last 3 lines.
If you paste this to a search window it will randomly return you some results if the PRODUCT contains MISMATCH - if you remove the last line of the example you will all results of the made up data.
| makeresults
| fields - _time
``` Make some data for Sourcetype=autos ```
| eval sourcetype="autos"
| eval MAKE=split("Audi,Porsche,Mercedes",",")
| mvexpand MAKE
| eval MODEL=case(MAKE="Audi", split("AU-123,AU-988", ","), MAKE="Porsche", split("PO-123,PO-988", ","), MAKE="Mercedes", split("MX-123,MX-988", ","))
| mvexpand MODEL
| eval VIN=case(MAKE="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MAKE="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MAKE="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
| mvexpand VIN
| eval VIN=MODEL.":".VIN
``` Make some identical data for Sourcetype=autos ```
| append [
| makeresults
| fields - _time
| eval sourcetype="cars"
| eval MANUFACTURER=split("Audi,Porsche,Mercedes",",")
| mvexpand MANUFACTURER
| eval PRODUCT=case(MANUFACTURER="Audi", split("AU-123,AU-988", ","), MANUFACTURER="Porsche", split("PO-123,PO-988", ","), MANUFACTURER="Mercedes", split("MX-123,MX-988", ","))
| mvexpand PRODUCT
| eval SN=case(MANUFACTURER="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MANUFACTURER="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MANUFACTURER="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
| mvexpand SN
| eval SN=PRODUCT.":".SN
| eval PRODUCT=PRODUCT.if(random() % 100 < 10, "-MISMATCH", "")
]
``` Take the common field ```
| eval COMMON_ID=if(sourcetype="autos", VIN, SN)
| stats values(*) as * by COMMON_ID
| where MAKE!=MANUFACTURER OR MODEL!=PRODUCT
Don't ever consider JOIN as the first option - it's not a Splunk way of doing things and has numerous limitations. Splunk uses stats ... BY COMMON_FIELD.
Hope this helps
Actually, the further I review this, the more confused I get. In your example, why did you split makes and models? Is it necessary to append data from one sourcetype to the other? I assume so, otherwise the where command would be invalid.
You're right, though. The last three commands are key to the search. As powerful as Splunk is, I'd sure think there's a much simpler process to search multiple sourcetypes with conditions applied. (There truly is no comparison between the two, but I could create this query using Access in about 30 seconds. However, the amount of data I'm searching is far too large for Access...)
Thanks for any feedback you can provide.
The reason for setting up the example data in that way is based on my understanding of your description of the problem.
Generally the easiest way to give advice is for you to post an example of the data from both types and demonstrate what you want to achieve with the output.
No you don't need to append - the whole makeresults/append section is about setting up an example data set to show how you go about joining the two.
If you can post an example of the two data sources, it would be easier to show how it should be done.
Thanks for the response, Bowesmana. Understood.
Here are sourcetypes and field data examples.
Sourcetype=autos
VIN | MAKE | MODEL |
1234ABCD | FORD | GT |
1A2B3C4D | CHEVROLET | CORVETTE |
ABCD1234 | DODGE | VIPER |
A12B3C4D | AUDI |
Sourcetype=cars
SN | MANUFACTURER | PRODUCT |
1234ABCD | FORD | GT |
ABCD1234 | CHEVY | CORVETTE |
1A2B3C4D | DODGE | CARAVAN |
A1B2C3D4 | A8 |
I'd like to compare the two sourcetypes and see the results where VIN=SN, but MAKE!=MANUFACTURER OR MODEL!=PRODUCT. (Caveat - if any events in either sourcetype contain a null value, they can be ignored/excluded by the search.)
From the example data above, ideally the search would display the following fields, and results would contain these two events (because VIN and SN match, but "CHEVROLET" does not equal "CHEVY", and "VIPER" does not equal "CARAVAN").
VIN | MAKE | MODEL | SN | MANUFACTURER | PRODUCT |
1A2B3C4D | CHEVROLET | CORVETTE | 1A2B3C4D | CHEVY | CORVETTE |
ABCD1234 | DODGE | VIPER | ABCD1234 | DODGE | CARAVAN |
Hope this helps to clarify. Please let me know if you have any questions or suggestions. I appreciate your help!
Ugh....sorry. I modified data in the examples as I was typing my last response, and didn't update each "table" as needed. Here are correct values. Sorry for the confusion! I didn't see an option to edit or delete my last response.
Sourcetype=autos
VIN | MAKE | MODEL |
1234ABCD | FORD | GT |
ABCD1234 | DODGE | VIPER |
1A2B3C4D | CHEVROLET | CORVETTE |
A1B2C3D4 | AUDI |
Sourcetype=cars
SN | MANUFACTURER | PRODUCT |
1234ABCD | FORD | GT |
ABCD1234 | DODGE | CARAVAN |
1A2B3C4D | CHEVY | CORVETTE |
A1B2C3D4 | A8 |
I'd like to compare the two sourcetypes and see the results where VIN=SN, but MAKE!=MANUFACTURER OR MODEL!=PRODUCT. (Caveat - if any events in either sourcetype contain a null value, they can be ignored/excluded by the search.)
From the example data above, ideally the search would display the following fields, and results would contain these two events (because VIN and SN match, but "VIPER" does not equal "CARAVAN", and "CHEVROLET" does not equal "CHEVY").
VIN | MAKE | MODEL | SN | MANUFACTURER | PRODUCT |
ABCD1234 | DODGE | VIPER | ABCD1234 | DODGE | CARAVAN |
1A2B3C4D | CHEVROLET | CORVETTE | 1A2B3C4D | CHEVY | CORVETTE |
Sorry again for the confusion.
It's pretty straightforward to do that
| makeresults format=csv data="VIN,MAKE,MODEL
1234ABCD,FORD,GT
ABCD1234,DODGE,VIPER
1A2B3C4D,CHEVROLET,CORVETTE
A1B2C3D4,AUDI,"
| eval sourcetype="autos"
| append [
| makeresults format=csv data="SN,MANUFACTURER,PRODUCT
1234ABCD,FORD,GT
ABCD1234,DODGE,CARAVAN
1A2B3C4D,CHEVY,CORVETTE
A1B2C3D4, ,A8"
| eval sourcetype="cars"
]
``` Above is sample data setup, but imagine your data above has come from
index=your_index sourcetype=autos OR sourcetype=cars
```
``` Now use VIN as the common field - there are actually many ways to do
the same thing, but what you are doing here is to make the dc_XXX fields
ones to be counted for uniqueness.
```
| eval VIN=coalesce(VIN, SN), dc_makes=coalesce(MAKE, MANUFACTURER), dc_models=coalesce(MODEL, PRODUCT)
``` Here there stats values collects all the original data - you may want
to add a | fields statement here to limit to the fields you want
It also counts the unique values of the dc_* fields which is the make
and model from whichever sourcetype ```
| stats values(*) as * dc(dc_*) as dc_* by VIN
``` And now this will find your mismatch items ```
| where dc_makes>1 OR dc_models>1
| fields - sourcetype dc_*
Hope this helps
That works great! Just what I was looking for.
Thanks much for your support, bowesmana!
Thanks for your response, bowesmana! You've got me headed in the right direction.