Splunk Search

Compare field from 2 sources and return when source1 has no match in source2

axinjakson
Explorer

I am attempting to take IPs from 2 different sources and output a list for when Source1 has a unique IP that is not present in Source2. Source2 can contain as many unique IPs as it wants, the main goal is just to extract unique IPs from Source1.

Source1

  • 192.168.0.1
  • 192.168.0.2
  • 192.168.0.3*

Source2

  • 192.168.0.1
  • 192.168.0.2
  • 192.168.0.4
  • 192.168.0.5

Output

  • 192.168.0.3*

Using online docs and previous posts I have attempted with the query below in several variations. It works well to give me all matching, but if I attempt <, without stats, where 1<2, or any other variation I can think of, it fails. Help?

sourcetype="SOURCE1" | dedup src_ip | fields src_ip | eval ip=src_ip | eval type=1 | append [ search sourcetype="SOURCE2" | dedup src_addr | fields src_addr | eval ip=src_addr | eval type=2 ] | stats dc(type) as dctype by ip | where dctype>1 | table ip

1 Solution

lguinn2
Legend

I am not sure if this will work, but I tested as best I could with the data that I had at hand:

sourcetype=SOURCE1 earliest=-1h | dedup src_ip |  eval ip=src_ip | eval source1="Y" | table ip source1 | 
join ip type=outer [search sourcetype=SOURCE2 earliest=-1h | dedup src_addr | eval ip=src_addr |
eval source2 = "Y" | table ip source2 ]

After you run this command, you should have 3 columns of output:

ip source1 source2

The ip column will be the ip address, of course. The source1 column will contain Y if the ip address existed in source1, and the source2 column will contain Y if the ip address existed in source2. To go further, you could append the following to the search to get the variations that you want

To only show IPs that appear in both, add:

| where source1 = "Y" AND source2 = "Y"

To only show IPs that appear ONLY in source1, add:

| where source1 = "Y" AND source2 != "Y"

You get the idea. There might be a more efficient way to do this, but this gives you a "base search" that is pretty flexible. You could even put the "base search" in a macro, which would make it even easier to type.

Note the use of earliest=-1h in the two searches; this will keep your working data size under reasonable control, and help keep the search reasonably quick.

View solution in original post

kristian_kolb
Ultra Champion

Hi, have you tried a simple subsearch? This should be more efficient than a join.

sourcetype=SOURCE_1 NOT [search sourcetype=SOURCE_2 | dedup src_ip | fields + src_ip] | dedup src_ip | stats values(src_ip)

This subsearch 'expands' to src_ip=xxx OR src_ip=yyyy OR src_ip=zzzzz, and combining this with NOT should get you the results you desire. For more information, please see the docs.

http://docs.splunk.com/Documentation/Splunk/4.2.2/User/HowSubsearchesWork#Use_subsearch_to_correlate...

Hope this helps,

Kristian

cabauah
Path Finder

subsearch has a limit. can we use simple stats to find what's missing in two sourcetypes based on a common field

0 Karma

lguinn2
Legend

I am not sure if this will work, but I tested as best I could with the data that I had at hand:

sourcetype=SOURCE1 earliest=-1h | dedup src_ip |  eval ip=src_ip | eval source1="Y" | table ip source1 | 
join ip type=outer [search sourcetype=SOURCE2 earliest=-1h | dedup src_addr | eval ip=src_addr |
eval source2 = "Y" | table ip source2 ]

After you run this command, you should have 3 columns of output:

ip source1 source2

The ip column will be the ip address, of course. The source1 column will contain Y if the ip address existed in source1, and the source2 column will contain Y if the ip address existed in source2. To go further, you could append the following to the search to get the variations that you want

To only show IPs that appear in both, add:

| where source1 = "Y" AND source2 = "Y"

To only show IPs that appear ONLY in source1, add:

| where source1 = "Y" AND source2 != "Y"

You get the idea. There might be a more efficient way to do this, but this gives you a "base search" that is pretty flexible. You could even put the "base search" in a macro, which would make it even easier to type.

Note the use of earliest=-1h in the two searches; this will keep your working data size under reasonable control, and help keep the search reasonably quick.

regriffith
Path Finder

The statement | where source1 = "Y" AND source2 != "Y" doesn't work. You can use | where source1 = "Y" AND NOT source2= "Y".

0 Karma

lguinn2
Legend

Instead of running "Y" on both, you could just eliminate the type=outer. An outer join says "create a joined result even when one of the two events doesn't exist." A normal join, AKA an inner join, says "only create a joined if both events exist" -- this should only give you the 50 that are present in both, and it is more efficient. Otherwise, you are creating the full result set and then eliminating some of them...

And DOH! I often forget that the syntax for the where command and the syntax for search are NOT the same! Good catch.

0 Karma

axinjakson
Explorer

Works 99% Thanks! The output tables will add nicely to my reporting.

Have 1 issue though... The initial output gives 85 results total, 50 are present in both, 35 are not present in Source2. I can run "where = Y" on both and get the correct return of 50 total. However when I try the != I get no results at all?? Same as your last where string of Source1=Y and Source2!=Y, formatting is not correct in my comment of course.

EDIT: | where source1 = "Y" AND NOT source2 = "Y" works 🙂

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...