Getting Data In

Differences between two sourcetype according to a field

lufermalgo
Path Finder

I need to know what events are on the sourcetype A that are not in the sourcetype B.

the query must evaluate more than 1 million events

consecutive   consecutive
000234456     000234456
000977832     000977832
000383748     000183846
000773732     000773732
000859484     000339274
000272849     000646473
000263664     000263664
000183846
000339274
000646473
sourcetypeA   sourcetypeB
Tags (3)
0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Replacing the count with a dc(sourcetype) would lift that restriction... however, the question implies that events in B but not in A aren't interesting, which would not be distinguishable with count or dc(sourcetype) alone. Here's how to solve that:

sourcetype=A OR sourcetype=B | stats values(sourcetype) as sourcetypes by consecutive | search sourcetypes=A NOT sourcetypes=B

Note, this will work for millions of events but might break, slow down, or at least consume a lot of memory for millions of distinct values of consecutive... stats basically has to keep millions of buckets around in that case.
Is there any relationship between the timestamps of A and B? If so, using transaction with short spans might use massively less memory:

sourcetype=A OR sourcetype=B | transaction consecutive maxspan=1m keepevicted=t | search sourcetype=A NOT sourcetypes=B

In theory there's also this:

sourcetype=A NOT [search sourcetype=B | dedup consecutive | fields consecutive]

But that again might (will) break for millions of distinct consecutive values.

View solution in original post

lufermalgo
Path Finder

Thanks @martin_mueller and @gpradeepkumarreddy for their contributions.

Take one of the recommendations of @martin_mueller and Realize the following query:

index=ilt sourcetype=boleta_generada OR sourcetype=boleta_indexado TIPO_DOCUMENTO="BE PEDIDO" CAMPANA>="201411" | dedup CONSECUTIVO WF_ESTADO | eventstats values(sourcetype) as sourcetypes by CONSECUTIVO | search sourcetypes=boleta_generada NOT sourcetypes=boleta_indexado | stats count

Effectively this give me the expected result but equally was very slow to deliver the answer.

Martin_mueller question about whether I could create a transaction with the transaction command, could not, as the events in the sourcetype=boleta_indexado can crearce after many days.

I want to share with you that I did another query graph and I'm getting basically what I would like to integrate other bar to show me the difference:

index=ilt sourcetype=boleta_generada OR sourcetype=boleta_indexado TIPO_DOCUMENTO="BE PEDIDO" (CAMPANA>="201408") | dedup CONSECUTIVO WF_ESTADO | chart count over CAMPANA by WF_ESTADO | sort -WF_ESTADO

chart:
https://drive.google.com/file/d/0B1HsOnBT01xZWDcyY2dTNUUwZFE/edit?usp=sharing

The WF_ESTADO field contains the values​​:

GENERATED
INDEXED

I would like the other bar that was called MISSING paint.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Replacing the count with a dc(sourcetype) would lift that restriction... however, the question implies that events in B but not in A aren't interesting, which would not be distinguishable with count or dc(sourcetype) alone. Here's how to solve that:

sourcetype=A OR sourcetype=B | stats values(sourcetype) as sourcetypes by consecutive | search sourcetypes=A NOT sourcetypes=B

Note, this will work for millions of events but might break, slow down, or at least consume a lot of memory for millions of distinct values of consecutive... stats basically has to keep millions of buckets around in that case.
Is there any relationship between the timestamps of A and B? If so, using transaction with short spans might use massively less memory:

sourcetype=A OR sourcetype=B | transaction consecutive maxspan=1m keepevicted=t | search sourcetype=A NOT sourcetypes=B

In theory there's also this:

sourcetype=A NOT [search sourcetype=B | dedup consecutive | fields consecutive]

But that again might (will) break for millions of distinct consecutive values.

pradeepkumarg
Influencer

if the value exists in both the sourcetypes do they exist more than once in each sourcetype? If not, below query might help you

sourcetype=sourcetypeA OR sourcetype=sourcetypeB | stats count by consecutive | search count < 2

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

May 2026 Splunk Expert Sessions: Security & Observability

Level Up Your Operations: May 2026 Splunk Expert Sessions Whether you are refining your security posture or ...

Network to App: Observability Unlocked [May & June Series]

In today’s digital landscape, your environment is no longer confined to the data center. It spans complex ...

SPL2 Deep Dives, AppDynamics Integrations, SAML Made Simple and Much More on Splunk ...

Splunk Lantern is Splunk’s customer success center that provides practical guidance from Splunk experts on key ...