Splunk Search

trying to compare two sources and extract these ones that do not have SysMon

DanAlexander
Communicator

Hello Community,

I have a challenge finding and isolating the unique hosts out of two sources (DHCL and SysMon in my case)

I did try the following but it did work as expected:

EXAMPLE 1:

index=dhcp_source_index | stats count by host | eval source="dhcp" | append [ search index=sysmon_index | stats count by host | eval source="sysmon" ] | stats values(source) as sources by host | where mvcount(sources)=1 AND sources="dhcp"

 

EXAMPLE 2:

index=my_index | dedup host, source | stats list(source) as sources by host | append [search index=my_index | stats latest(_time) as last_seen by host] | eventstats max(last_seen) as last_seen by host | where mvcount(sources)=1 | table host, last_seen

The numbers from the manual findings and the above SPLs differ

Thanks in advance

0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

| set diff
[| tstats count where source_1 by host | table host]
[| tstats count where source_2 by host | table host]

That SPL provides a list of all of the hosts not seen in source_2


The search is not wrong but the last statement is inaccurate because set diff as shown produces a list of all hosts in source_1 not seen in source_2, plus all hosts in source_2 not seen in source_1. (The statement is correct only if hosts in source_2 is a subset of that in source_1.  Maybe this is a condition known in your use case?)  So, it is equivalent to the search I posted in this one.

To get list of only those hosts in source_1 that are not in source_2, use my search in this earlier one or, as @PickleRick suggested, improve it with tstats like

 

| tstats values(host) as host
  where source_1 NOT
  [tstats values(host) as host
   where source_2]

 

If hosts in source_2 is a subset of that in source_1 as may be the case, this method will produce the exact same result, and will still be more efficient.

View solution in original post

yuanliu
SplunkTrust
SplunkTrust

Two questions.

  1. In addition to comparison of host values for the purpose of exclusion, is there any other information you want to extract from sysmon_index?  I will assume no.
  2. Is presence in sysmon_index necessary and sufficient for exclusion?  I will assume yes.

With these assumptions, you don't need stats.  The following subsearch should suffice:

index=dhcp_source_index NOT
  [search index=sysmon_index
  | stats values(host) as host]
0 Karma

PickleRick
SplunkTrust
SplunkTrust

It's not clear how you distinguish your "sources". Your first search simply pulls data from two separate indexes while the second one does something completely strange.

Please describe what constitutes those sets of sources you want to calculate difference from.

0 Karma

DanAlexander
Communicator

@PickleRick Thanks for the reply.

Please, ignore both searches.

What I want to pull out the total unique assets in the DHCP source. I then want to be able to compare to the totals of unique assets in the SysMon source and output these assets that do not have SysMon present.

Thanks in advance.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Sure. But how do you define "assets"? How do you differentiate between them?

Because while you can use the general approach of combining two separate searches (either by means of append or multisearch) with additional field to classify your results into one of two sets, there might be more effective ways in specific cases.

0 Karma

DanAlexander
Communicator

Thanks for the reply @PickleRick 

Sure, there will be a third column containing only assets that are not seen in both sources simultaneously and  in addition at the end of the list there should be Totals of these assets.

Would you be able to develop a sample solution for this, please?

Thank you.

Tags (1)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

But what do you mean by asset? What in your data tells you that this is one "asset" and this is another one? Is it the host field or some other field within your data? Or any combination of fields?

0 Karma

DanAlexander
Communicator

Hi, thanks for the reply.

To simplify it let us say we have two lists of items of same type it could be anything.

How can we compare both lists and list only the subset of items not common to both lists.

Regards,

D

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You can see my earlier comment.  Essentially you don't need append/stats for this job.

index=dhcp_source_index NOT
  [search index=sysmon_index
  | stats values(host) as host]

If you only want to know which DHCP hosts  are not in sysmon_index, add stats values(host) or stats count by host  after this search.

Tags (1)
0 Karma

DanAlexander
Communicator

Thanks for your reply @yuanliu 

Unfortunately, your search did not provide the results I wanted. After executing the separate searches and abstract manually the result differs from the resultant of your search. Please do try it out.

After lots of try/error I finally found the one that does the trick. It is by using 'set diff' command. I will provide my solution tomorrow for everyone to use.

Regards,

Dan

0 Karma

yuanliu
SplunkTrust
SplunkTrust

set diff does not give host that do not have SysMon as the original question specifies.  So, you want to know which sets of hosts are unique to each search, and not care that the only come from dhcp_source_index? (That is why I was asking very specific clarification questions, and stated clear assumptions of what my search is intended to do.)

Again, set is an expensive operation.  You should be able to use stats to achieve it.  The following is equivalent to set diff:

index=dhcp_source_index OR index=sysmon_index
| stats values(index) as index by host
| where mvcount(index) == 1

Maybe you have some requirements that you are not telling us? 

0 Karma

DanAlexander
Communicator

Thanks for the reply @yuanliu 

Agree to disagree.

If you look at the very beginning of my post I asked:

"I have a challenge finding and isolating the unique hosts out of two sources"

I think this is clear and SysMon and DHCP were just an example. Nothing concrete.

During the communication I have reiterated this statement.

Apologies if misunderstood.

Thanks all for your help.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Honestly, I have no idea what was the point of this thread. You asked some vague question (apparently wih some assumptions known only to you). People tried to help you by asking more precise questions (because often there is more than one way to do something in Splunk and depending on the data and requirements some solutions may be way way more effective than others) and you kept throwing more and more remarks leading nowhere.

Even if you simply asked it as a purely theoretical exercise - to find out what approaches you can take to solve this kind of problem, you should have said so clearly - the descriptive answer about general possible approaches is something completely different than a solution to a specific problem.

And no, the set operators are not very effective and are very rarely used since there are usually much better solutions - either filtering out by a subsearch as @yuanliu showed or by classifying, statsing and filtering (or even faster - tstatsing if you can use indexed fields).

0 Karma

DanAlexander
Communicator

Frustration is the least emotional state I wanted to achieve here. Apologies!

I still believe it was not me confusing people, I just wanted help with simply being able to compare two datasets and printing out the only hosts seen in one of the data sources. if that sounds confusing then again apologies.

I have been a member for some time now and always admired all the help given here.

I have tested all solutions provided so far, and seen no results (it might be my fault).

The only solution that provided me with the results I want was the following for all to use.

I agree (do not have great knowledge around set command as of now) set might not be as efficient as other commands bet here is what worked for me.

| set diff
[| tstats count where source_1 by host | table host]
[| tstats count where source_2 by host | table host]
That SPL provides a list of all of the hosts not seen in source_2

At the end of the day it is important for people to get some working examples. They are testing and either working or not working. Silence is gold time to time and no one wants frustration waves. I was not meant to do any harm. Been in the industry for a long time enough to realize we all different and have different emotions.

Please, by all means if you can create something that would prove me wrong or anything better than set command for the community to use.

Thank you!

0 Karma

yuanliu
SplunkTrust
SplunkTrust

| set diff
[| tstats count where source_1 by host | table host]
[| tstats count where source_2 by host | table host]

That SPL provides a list of all of the hosts not seen in source_2


The search is not wrong but the last statement is inaccurate because set diff as shown produces a list of all hosts in source_1 not seen in source_2, plus all hosts in source_2 not seen in source_1. (The statement is correct only if hosts in source_2 is a subset of that in source_1.  Maybe this is a condition known in your use case?)  So, it is equivalent to the search I posted in this one.

To get list of only those hosts in source_1 that are not in source_2, use my search in this earlier one or, as @PickleRick suggested, improve it with tstats like

 

| tstats values(host) as host
  where source_1 NOT
  [tstats values(host) as host
   where source_2]

 

If hosts in source_2 is a subset of that in source_1 as may be the case, this method will produce the exact same result, and will still be more efficient.

DanAlexander
Communicator

Thanks very much for the solution @yuanliu

Much appreciated!

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...