Getting Data In

How to verify the data held in two apps is the same?

clack
Engager

Hi,

I have a situation where i need to verify that the data held in two apps is the same.
To perform this verification i am logging out the data for each account into two log files: appOneLog and appTwoLog.
These logs contain the field accountId and some other data relating to the account.

I'm trying to find all the accounIds that are present in appOne and not appTwo and vice versa.
I have made various attempts at this which, i think, should work according to the splunk documentation.

First of all, I tried to use the transaction command to pair up events from appOneLog and appTwoLog into transactions and match against any transaction that had less than 2 events i.e.:

sourcetype="app*" | sort 0 str(accountId) | transaction accountId maxspan=10m maxpause=10m maxevents=2 | search eventcount<2

but when checking the results of this query I found it to contain some events for accounts that were actually present in both sets of logs.

I then tried to take a different approach by labelling each result from the appOneLog source with a present in appOne flag, creating a table and then performing a join on a search over the appTwoLog source that labelled each event with a present in appTwo flag. After that I would match on any event that did not have both flags set i.e.:

sourcetype="appOneLog" | eval presentInAppOne="Y" | table accountId presentInAppOne | join accountId type=outer [search sourcetype="appTwoLog" | eval presentInAppTwo= "Y" | table accountId presentInAppTwo ] | where (NOT (presentInAppOne="Y" AND presentInAppTwo="Y"))

however, yet again, this returns some false positive results where the account actually is in both but is getting labelled as just being in appOne.

Can you think of any reason why these false positive results are being returned?
Or any alternative way of retrieving this information?

An important thing to note here is that the number of events i am searching over is very large i.e. it can get up to just over 2,000,000 events.
However, i have tried the above queries on a smaller subset of the results and still get the same problems.

Thanks in advance.

1 Solution

somesoni2
Revered Legend

Try these

Get list of accountId which are NOT present in both the sourcetypes (present in either)

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1

Get list of accountId present in appOneLog but not in appTwoLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appOneLog"

Get list of accountId present in appTwoLog but not in appOneLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appTwoLog"

View solution in original post

somesoni2
Revered Legend

Try these

Get list of accountId which are NOT present in both the sourcetypes (present in either)

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1

Get list of accountId present in appOneLog but not in appTwoLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appOneLog"

Get list of accountId present in appTwoLog but not in appOneLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appTwoLog"

clack
Engager

Thanks, that is exactly what i was looking for and is also a lot more efficient then my previous attempts. I think they may have been hitting some memory constraints.

0 Karma
Get Updates on the Splunk Community!

CX Day is Coming!

Customer Experience (CX) Day is on October 7th!! We're so excited to bring back another day full of wonderful ...

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

The Big One: Splunk 10 is Here!  The moment many of you have been waiting for has arrived! We are thrilled to ...

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console

Today, we’re excited to announce the release of a brand new AI assistant usage dashboard in Cloud Monitoring ...