How to compare same fieldname in two sources, but ...

bcusick · ‎09-04-2014

Hi,

I'm trying to compare one field "primaryKey" in two sources; "sourceA" and "sourceB". There are other fields for each event, and I need to return results where those other field values don't match for the same primaryKey value in each source...

So if I had:

sourceA and sourceB each contain 1 event with "primaryKey=1"

sourceA's fields are: primaryKey=1, fieldA=apple...and..

sourceB's fields are: primaryKey=1, fieldA=Banana

How do I show that fieldA does or does not match for the same "event" (primaryKey=1) in both sources? They both share sourcetype=csv. Would I do this with a transaction on the primaryKey? if so, how?

I'm eventually going to have to compare 3 sources (sourceA, sourceB, and sourceC) for up to 21 fields in each source-using primaryKey. Figured I should start with figuring out just one. 🙂

Thanks,

Brian

aweitzman · ‎09-05-2014

Another way to do this, which would get you the contending values, would be to combine the sources, turn the field values into multivalued fields, and then filter on their size:

index=main (source=a OR source=b) | stats values(fieldA) as AValues, values(fieldB) as BValues, values(fieldC) as CValues by primaryKey | where (mvcount(AValues) > 1 OR mvcount(BValues) > 1 OR mvcount(CValues) > 1)

The values function on stats gets you the distinct values of a field for each primaryKey, so if the field value from source a and the one from source b are the same, they will collapse into one, and if they're different, they'll both show up. Once all that is done for all fields, the where test sees if any of them have more than one value, which would mean that the values weren't the same in the two sources.

musskopf · ‎09-04-2014

Hello,

there are couple of way to get things done... depending how you would like to see the data at the end. If you just want to keep fields that don't match you could use "append" command and "dedup" things at the end, something like that:

index=main source=a | append [search index=main source=b] | dedup pk, fieldA

The example above you make an union of both sources and after that, keep only one occurrence for every pk, fieldA combination. The dedup command has some option, to determine which event is to be kept.

Another way you might want to simply show if the event repeats or not, like:

index=main source=a | append [search index=main source=b] | eventstats count AS occurrences BY pk, fieldA

This last example you add a field named "occurrences" to each event and which will tell you the number of time the fields "pk" and "fieldA" repeated. After that you could filter based on that, like "| where occurrences > 1"

Let me know if that gets close to what you need...

Cheers

bcusick · ‎09-05-2014

This is close! How would I show which fields are matches/non-matches? I'm assuming checking for multiple fields would look like:

|eventstats count AS occurences BY pk, fieldA, fieldB, fieldC, etc.....assuming it's possible to use more than 2 fields in the "BY" portion of eventstats

How to compare same fieldname in two sources, but only return results if other fields are different?

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Announcing Modern Navigation: A New Era of Splunk User Experience

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

Join the Conversation