I have a situation where I need to verify that the data held in two apps is the same.
To perform this verification I am logging the data for each account to two log files: appOneLog and appTwoLog.
These logs contain the field accountId and some other data relating to the account.
I'm trying to find all the accountIds that are present in appOne but not in appTwo, and vice versa.
I have made various attempts at this which, I think, should work according to the Splunk documentation.
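To make the goal concrete, here is a minimal Python sketch of the comparison I'm after. It is not a Splunk search. It assumes each log has already been reduced to a list of accountId values; the function name `compare_account_ids` and the sample IDs are made up for illustration.

```python
def compare_account_ids(app_one_ids, app_two_ids):
    """Return (only_in_app_one, only_in_app_two) as sorted lists.

    Duplicate IDs within a log are collapsed, since an account may be
    logged more than once.
    """
    one = set(app_one_ids)
    two = set(app_two_ids)
    return sorted(one - two), sorted(two - one)


# Hypothetical sample data, not taken from the real logs.
app_one_ids = ["A1", "A2", "A3", "A2"]
app_two_ids = ["A2", "A3", "A4"]

only_in_app_one, only_in_app_two = compare_account_ids(app_one_ids, app_two_ids)
print(only_in_app_one)
print(only_in_app_two)
```

An account should appear in either list only if it is missing from the other log entirely, which is the behaviour I expect from the Splunk searches as well.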
First of all, I tried to use the transaction command to pair up events from appOneLog and appTwoLog into transactions, and then match any transaction that had fewer than two events, i.e.:
but when checking the results of this query I found it to contain some events for accounts that were actually present in both sets of logs.
I then tried a different approach: labelling each result from the appOneLog source with a "present in appOne" flag, creating a table, and then performing a join against a search over the appTwoLog source that labelled each event with a "present in appTwo" flag. After that I would match any event that did not have both flags set, i.e.:
however, yet again, this returns some false positives, where the account actually is in both logs but is labelled as being only in appOne.
Can you think of any reason why these false positive results are being returned?
Or any alternative way of retrieving this information?
An important thing to note here is that the number of events I am searching over is very large, i.e. it can get up to just over 2,000,000 events.
However, I have tried the above queries on a smaller subset of the results and still get the same problems.