I'm trying to compare one field "primaryKey" in two sources; "sourceA" and "sourceB". There are other fields for each event, and I need to return results where those other field values don't match for the same primaryKey value in each source...
So if I had:
sourceA and sourceB each contain 1 event with "primaryKey=1"
sourceA's fields are: primaryKey=1, fieldA=apple...and..
sourceB's fields are: primaryKey=1, fieldA=Banana
How do I show that fieldA does or does not match for the same "event" (primaryKey=1) in both sources? They both share sourcetype=csv. Would I do this with a transaction on the primaryKey? if so, how?
I'm eventually going to have to compare 3 sources (sourceA, sourceB, and sourceC) for up to 21 fields in each source-using primaryKey. Figured I should start with figuring out just one. 🙂
Another way to do this, which would get you the contending values, would be to combine the sources, turn the field values into multivalued fields, and then filter on their size:
index=main (source=a OR source=b)
| stats values(fieldA) as AValues, values(fieldB) as BValues, values(fieldC) as CValues by primaryKey
| where (mvcount(AValues) > 1 OR mvcount(BValues) > 1 OR mvcount(CValues) > 1)
values function on
stats gets you the distinct values of a field for each
primaryKey, so if the field value from source
a and the one from source
b are the same, they will collapse into one, and if they're different, they'll both show up. Once all that is done for all fields, the
where test sees if any of them have more than one value, which would mean that the values weren't the same in the two sources.
there are couple of way to get things done... depending how you would like to see the data at the end. If you just want to keep fields that don't match you could use "append" command and "dedup" things at the end, something like that:
index=main source=a | append [search index=main source=b] | dedup pk, fieldA
The example above you make an union of both sources and after that, keep only one occurrence for every pk, fieldA combination. The dedup command has some option, to determine which event is to be kept.
Another way you might want to simply show if the event repeats or not, like:
index=main source=a | append [search index=main source=b] | eventstats count AS occurrences BY pk, fieldA
This last example you add a field named "occurrences" to each event and which will tell you the number of time the fields "pk" and "fieldA" repeated. After that you could filter based on that, like "| where occurrences > 1"
Let me know if that gets close to what you need...
This is close! How would I show which fields are matches/non-matches? I'm assuming checking for multiple fields would look like:
|eventstats count AS occurences BY pk, fieldA, fieldB, fieldC, etc.....assuming it's possible to use more than 2 fields in the "BY" portion of eventstats