Getting Data In

How to verify the data held in two apps is the same?

clack
Engager

Hi,

I have a situation where i need to verify that the data held in two apps is the same.
To perform this verification i am logging out the data for each account into two log files: appOneLog and appTwoLog.
These logs contain the field accountId and some other data relating to the account.

I'm trying to find all the accounIds that are present in appOne and not appTwo and vice versa.
I have made various attempts at this which, i think, should work according to the splunk documentation.

First of all, I tried to use the transaction command to pair up events from appOneLog and appTwoLog into transactions and match against any transaction that had less than 2 events i.e.:

sourcetype="app*" | sort 0 str(accountId) | transaction accountId maxspan=10m maxpause=10m maxevents=2 | search eventcount<2

but when checking the results of this query I found it to contain some events for accounts that were actually present in both sets of logs.

I then tried to take a different approach by labelling each result from the appOneLog source with a present in appOne flag, creating a table and then performing a join on a search over the appTwoLog source that labelled each event with a present in appTwo flag. After that I would match on any event that did not have both flags set i.e.:

sourcetype="appOneLog" | eval presentInAppOne="Y" | table accountId presentInAppOne | join accountId type=outer [search sourcetype="appTwoLog" | eval presentInAppTwo= "Y" | table accountId presentInAppTwo ] | where (NOT (presentInAppOne="Y" AND presentInAppTwo="Y"))

however, yet again, this returns some false positive results where the account actually is in both but is getting labelled as just being in appOne.

Can you think of any reason why these false positive results are being returned?
Or any alternative way of retrieving this information?

An important thing to note here is that the number of events i am searching over is very large i.e. it can get up to just over 2,000,000 events.
However, i have tried the above queries on a smaller subset of the results and still get the same problems.

Thanks in advance.

1 Solution

somesoni2
Revered Legend

Try these

Get list of accountId which are NOT present in both the sourcetypes (present in either)

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1

Get list of accountId present in appOneLog but not in appTwoLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appOneLog"

Get list of accountId present in appTwoLog but not in appOneLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appTwoLog"

View solution in original post

somesoni2
Revered Legend

Try these

Get list of accountId which are NOT present in both the sourcetypes (present in either)

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1

Get list of accountId present in appOneLog but not in appTwoLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appOneLog"

Get list of accountId present in appTwoLog but not in appOneLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appTwoLog"

View solution in original post

clack
Engager

Thanks, that is exactly what i was looking for and is also a lot more efficient then my previous attempts. I think they may have been hitting some memory constraints.

0 Karma
Take the 2021 Splunk Career Survey

Help us learn about how Splunk has
impacted your career by taking the 2021 Splunk Career Survey.

Earn $50 in Amazon cash!