Solved: Comparing two data sets from the same timeframe an...

Tisiphone_1 · ‎06-18-2010

I am trying to compare the results of two searches that share a common timeframe and index, with a negation. The common resulting field may occur multiple times in both searches. As an example:

Search 1:

index="stuff" status="LoginSuccessful"

and Search 2 is:

index="stuff" status="LoginFailed"

If username is in an event in search 2 but NOT in an event in search 1, I want to see it. (So, show me all of the users that had a failed login and never had a successful login.)

The only way I have found to do this is to output one set of results to a csv and use lookup to search one. Even if this were two separate indices, I'm not clear on how you would negate the results of an entire join.

I figure there has to be a simpler way to do this...

gkanapathy · ‎06-19-2010

There are several easier ways to do this. An efficient way is:

 index=stuff status=loginsuccessful OR status=loginfailed | stats count(eval(status=="LoginSuccessful")) as c_success, count((eval(status=="LoginFailed"))) as c_failure by user | where c_failure > 0 AND c_success == 0

Similar (but somewhat less amenable to map-reduce treatment than the previous, if you have multiple indexer nodes) is:

 index=stuff status=loginsuccessful OR status=loginfailed | transaction user | where status=="LoginFailed" AND NOT status=="LoginSuccessful"

And the obvious way (which is just a concise version of what you did) is to use a subsearch:

 index=stuff status=loginfailed NOT [ search index=stuff status=loginsuccessful | dedup user | fields user ]

But this is relatively inefficient and is subject to limits on subsearch and format that must be increased in the system if you have a lot of users.

View solution in original post

gkanapathy · ‎06-19-2010

There are several easier ways to do this. An efficient way is:

 index=stuff status=loginsuccessful OR status=loginfailed | stats count(eval(status=="LoginSuccessful")) as c_success, count((eval(status=="LoginFailed"))) as c_failure by user | where c_failure > 0 AND c_success == 0

Similar (but somewhat less amenable to map-reduce treatment than the previous, if you have multiple indexer nodes) is:

 index=stuff status=loginsuccessful OR status=loginfailed | transaction user | where status=="LoginFailed" AND NOT status=="LoginSuccessful"

And the obvious way (which is just a concise version of what you did) is to use a subsearch:

 index=stuff status=loginfailed NOT [ search index=stuff status=loginsuccessful | dedup user | fields user ]

But this is relatively inefficient and is subject to limits on subsearch and format that must be increased in the system if you have a lot of users.

Tisiphone_1 · ‎06-19-2010

Thanks! I had my syntax mixed up on the last. Now it works!

Comparing two data sets from the same timeframe and index?

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Join the Conversation

Comparing two data sets from the same timeframe and index?

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...