Splunk Search

How to compare same fieldname in two sources, but only return results if other fields are different?

bcusick
Communicator

Hi,

I'm trying to compare one field "primaryKey" in two sources; "sourceA" and "sourceB". There are other fields for each event, and I need to return results where those other field values don't match for the same primaryKey value in each source...

So if I had:

sourceA and sourceB each contain 1 event with "primaryKey=1"

sourceA's fields are: primaryKey=1, fieldA=apple...and..

sourceB's fields are: primaryKey=1, fieldA=Banana

How do I show that fieldA does or does not match for the same "event" (primaryKey=1) in both sources? They both share sourcetype=csv. Would I do this with a transaction on the primaryKey? if so, how?

I'm eventually going to have to compare 3 sources (sourceA, sourceB, and sourceC) for up to 21 fields in each source-using primaryKey. Figured I should start with figuring out just one. 🙂

Thanks,

Brian

0 Karma

aweitzman
Motivator

Another way to do this, which would get you the contending values, would be to combine the sources, turn the field values into multivalued fields, and then filter on their size:

index=main (source=a OR source=b)
| stats values(fieldA) as AValues, values(fieldB) as BValues, values(fieldC) as CValues by primaryKey
| where (mvcount(AValues) > 1 OR mvcount(BValues) > 1 OR mvcount(CValues) > 1)

The values function on stats gets you the distinct values of a field for each primaryKey, so if the field value from source a and the one from source b are the same, they will collapse into one, and if they're different, they'll both show up. Once all that is done for all fields, the where test sees if any of them have more than one value, which would mean that the values weren't the same in the two sources.

0 Karma

musskopf
Builder

Hello,

there are couple of way to get things done... depending how you would like to see the data at the end. If you just want to keep fields that don't match you could use "append" command and "dedup" things at the end, something like that:

index=main source=a | append [search index=main source=b] | dedup pk, fieldA

The example above you make an union of both sources and after that, keep only one occurrence for every pk, fieldA combination. The dedup command has some option, to determine which event is to be kept.

Another way you might want to simply show if the event repeats or not, like:

index=main source=a | append [search index=main source=b] | eventstats count AS occurrences BY pk, fieldA

This last example you add a field named "occurrences" to each event and which will tell you the number of time the fields "pk" and "fieldA" repeated. After that you could filter based on that, like "| where occurrences > 1"

Let me know if that gets close to what you need...

Cheers

0 Karma

bcusick
Communicator

This is close! How would I show which fields are matches/non-matches? I'm assuming checking for multiple fields would look like:

|eventstats count AS occurences BY pk, fieldA, fieldB, fieldC, etc.....assuming it's possible to use more than 2 fields in the "BY" portion of eventstats

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to November Tech Talks, Office Hours, and Webinars!

🍂 Fall into November with a fresh lineup of Community Office Hours, Tech Talks, and Webinars we’ve ...

Transform your security operations with Splunk Enterprise Security

Hi Splunk Community, Splunk Platform has set a great foundation for your security operations. With the ...

Splunk Admins and App Developers | Earn a $35 gift card!

Splunk, in collaboration with ESG (Enterprise Strategy Group) by TechTarget, is excited to announce a ...