Splunk Search

Search lookup containing IP addresses by CIDR mask

joepjisc
Path Finder

I cannot find this question being asked this way round, so hopefully its not a duplicate.

I have a lookup CSV like this:

ip,ip-info,timestamp
1.2.3.4,Text about the IP,2020-04-16T17:20:00
4.3.2.1,Different Text Here,2020-01-01T09:00:00

My log source summaries IPs to CIDR subnet, I have no control over this, extracted field looks like this cidr=1.2.3.0/24.

I need to use the CIDR from the log source to get the entrie(s) within that subnet via a lookup, e.g.

index=myindex
| lookup ip-info ip AS cidr OUTPUT ip-info timestamp AS ip-info-timestamp

I have tried adding match_type = CIDR(ip) to the [ip-info] stanza in $SPLUNK_HOME/etc/apps/search/local/transforms.conf but I think this is for looking up an IP against CIDR masks, rather than the way round I need, as no results are returned.

Is this possible, if so, how do I achieve it?

Thanks

0 Karma
1 Solution

rmmiller
Contributor

What an Interesting problem!
Because Splunk is expecting lookups in the other direction, you need to build this functionality into your SPL, but it sure won't be optimal to do so. It sounds like you're stuck with the data you have, though.

I managed to get a general solution working with the double evils map and join, though if you don't care about _time, you can get away with using map and stats instead.

Note: I found zero benefit to having a lookup definition since you can't really make use of any of its functionality. I worked directly with the inputlookup command for this solution.


The basic approach is as follows, where yoursearch is the SPL that generates your events (without any lookups) and assuming the field name cidr stores your CIDR notation:

The general solution, preserving all fields from yoursearch, including _time:

yoursearch
| join type=left cidr
    [yoursearch | map search="| inputlookup ipinfo.csv| eval cidr = $cidr$  |where cidrmatch($cidr$,ip)| stats values(*) AS * BY cidr| eval zippedfield=mvzip(mvzip(ip,'ip-info'),timestamp) | fields cidr,zippedfield"
]

WARNING: This is a very heavy search!

If you only want info from the lookup file that matches, this reduces to:

yoursearch | map search=" | inputlookup ipinfo.csv | eval cidr = $cidr$ |where cidrmatch($cidr$,ip) | stats values(*) AS * BY cidr"

This is a little cleaner, but you will have only the cidr field and the fields from your lookup file.


Explanation

For each event returned by yoursearch, the map command collects information from your lookup that is a CIDR match for that event's logged field named cidr. This subsearch handles more than 1 match per value of cidr in your lookup file by using stats and mvzip on the ip, ip-info, and timestamp fields. (By the way, Splunk doesn't always handle field names with a dash in them gracefully, which is why I have the single quotes around that field.)

The map subsearch returns only the matched info from your lookup file, and this is then matched up to your original search info with the join command with the left option. This allows you to retain all of the info from your original search, with the lookup data in a new multi-value column named zippedfield. Any post-processing of the lookup data is a little easier this way if you're looking to preserve _time info.


Mocked-up data and searches:

Sample lookup (ipinfo.csv):

ip,ip-info,timestamp
1.2.3.4,Text about the IP,2020-04-16T17:20:00
4.3.2.1,Different Text Here,2020-01-01T09:00:00
4.5.6.7,More Text,2020-04-15T12:45:00
8.5.3.1,This One Time At Band Camp,2019-02-04T03:01:00
1.2.3.5,Another subnet neighbor,2020-04-12T22:02:00
1.2.4.3,Slightly different subnet,2020-02-29T23:59:00
42.42.42.42,The Answer,2012-12-24T20:45:00

Sample SPL of just subsearch (without mvzipped fields):

| makeresults | eval _raw="cidr        someotherfield
1.2.3.0/24        Something_happened
1.2.3.0/16        Something_else_happened
4.2.3.0/24        Something_happened_again
4.3.2.0/24        Something_happened_1
8.6.7.0/23        Something_happened_23
192.168.0.0/24        Something_happened_42"
| multikv forceheader=1
| fields - _raw,linecount
| rename COMMMENT AS "This line and the 9 lines above simulate your search"
| map search="| inputlookup ipinfo.csv | eval cidr = $cidr$ | where cidrmatch($cidr$,ip) | stats values(*) AS * BY cidr"

Sample SPL including join to original search with mvzipped fields:

| makeresults | eval _raw="cidr        someotherfield
1.2.3.0/24        Something_happened
1.2.3.0/16        Something_else_happened
4.2.3.0/24        Something_happened_again
4.3.2.0/24        Something_happened_1
8.6.7.0/23        Something_happened_23
192.168.0.0/24        Something_happened_42"
| multikv forceheader=1
| fields - _raw,linecount
| rename COMMMENT AS "This line and the 9 lines above simulate your search"
| join type=left cidr
    [
| makeresults | eval _raw="cidr        someotherfield
1.2.3.0/24        Something_happened
1.2.3.0/16        Something_else_happened
4.2.3.0/24        Something_happened_again
4.3.2.0/24        Something_happened_1
8.6.7.0/23        Something_happened_23
192.168.0.0/24        Something_happened_42"
| multikv forceheader=1
| fields - _raw,linecount
| rename COMMMENT AS "This line and the 9 lines above simulate your search"
| map search="| inputlookup ipinfo.csv| eval cidr = $cidr$  |where cidrmatch($cidr$,ip)| stats values(*) AS * BY cidr | eval zippedfield=mvzip(mvzip(ip,'ip-info'),timestamp) | fields cidr,zippedfield"
]

You can see from this last sample where having a macro would help.

I'm hoping you're after only the information in your lookup file that matches your search, in which case you don't worry about the join.

None of these searches are elegant, but with your data being in the form it's in, this is the price you are likely to pay.

Hope that helps!
rmmiller

View solution in original post

rmmiller
Contributor

What an Interesting problem!
Because Splunk is expecting lookups in the other direction, you need to build this functionality into your SPL, but it sure won't be optimal to do so. It sounds like you're stuck with the data you have, though.

I managed to get a general solution working with the double evils map and join, though if you don't care about _time, you can get away with using map and stats instead.

Note: I found zero benefit to having a lookup definition since you can't really make use of any of its functionality. I worked directly with the inputlookup command for this solution.


The basic approach is as follows, where yoursearch is the SPL that generates your events (without any lookups) and assuming the field name cidr stores your CIDR notation:

The general solution, preserving all fields from yoursearch, including _time:

yoursearch
| join type=left cidr
    [yoursearch | map search="| inputlookup ipinfo.csv| eval cidr = $cidr$  |where cidrmatch($cidr$,ip)| stats values(*) AS * BY cidr| eval zippedfield=mvzip(mvzip(ip,'ip-info'),timestamp) | fields cidr,zippedfield"
]

WARNING: This is a very heavy search!

If you only want info from the lookup file that matches, this reduces to:

yoursearch | map search=" | inputlookup ipinfo.csv | eval cidr = $cidr$ |where cidrmatch($cidr$,ip) | stats values(*) AS * BY cidr"

This is a little cleaner, but you will have only the cidr field and the fields from your lookup file.


Explanation

For each event returned by yoursearch, the map command collects information from your lookup that is a CIDR match for that event's logged field named cidr. This subsearch handles more than 1 match per value of cidr in your lookup file by using stats and mvzip on the ip, ip-info, and timestamp fields. (By the way, Splunk doesn't always handle field names with a dash in them gracefully, which is why I have the single quotes around that field.)

The map subsearch returns only the matched info from your lookup file, and this is then matched up to your original search info with the join command with the left option. This allows you to retain all of the info from your original search, with the lookup data in a new multi-value column named zippedfield. Any post-processing of the lookup data is a little easier this way if you're looking to preserve _time info.


Mocked-up data and searches:

Sample lookup (ipinfo.csv):

ip,ip-info,timestamp
1.2.3.4,Text about the IP,2020-04-16T17:20:00
4.3.2.1,Different Text Here,2020-01-01T09:00:00
4.5.6.7,More Text,2020-04-15T12:45:00
8.5.3.1,This One Time At Band Camp,2019-02-04T03:01:00
1.2.3.5,Another subnet neighbor,2020-04-12T22:02:00
1.2.4.3,Slightly different subnet,2020-02-29T23:59:00
42.42.42.42,The Answer,2012-12-24T20:45:00

Sample SPL of just subsearch (without mvzipped fields):

| makeresults | eval _raw="cidr        someotherfield
1.2.3.0/24        Something_happened
1.2.3.0/16        Something_else_happened
4.2.3.0/24        Something_happened_again
4.3.2.0/24        Something_happened_1
8.6.7.0/23        Something_happened_23
192.168.0.0/24        Something_happened_42"
| multikv forceheader=1
| fields - _raw,linecount
| rename COMMMENT AS "This line and the 9 lines above simulate your search"
| map search="| inputlookup ipinfo.csv | eval cidr = $cidr$ | where cidrmatch($cidr$,ip) | stats values(*) AS * BY cidr"

Sample SPL including join to original search with mvzipped fields:

| makeresults | eval _raw="cidr        someotherfield
1.2.3.0/24        Something_happened
1.2.3.0/16        Something_else_happened
4.2.3.0/24        Something_happened_again
4.3.2.0/24        Something_happened_1
8.6.7.0/23        Something_happened_23
192.168.0.0/24        Something_happened_42"
| multikv forceheader=1
| fields - _raw,linecount
| rename COMMMENT AS "This line and the 9 lines above simulate your search"
| join type=left cidr
    [
| makeresults | eval _raw="cidr        someotherfield
1.2.3.0/24        Something_happened
1.2.3.0/16        Something_else_happened
4.2.3.0/24        Something_happened_again
4.3.2.0/24        Something_happened_1
8.6.7.0/23        Something_happened_23
192.168.0.0/24        Something_happened_42"
| multikv forceheader=1
| fields - _raw,linecount
| rename COMMMENT AS "This line and the 9 lines above simulate your search"
| map search="| inputlookup ipinfo.csv| eval cidr = $cidr$  |where cidrmatch($cidr$,ip)| stats values(*) AS * BY cidr | eval zippedfield=mvzip(mvzip(ip,'ip-info'),timestamp) | fields cidr,zippedfield"
]

You can see from this last sample where having a macro would help.

I'm hoping you're after only the information in your lookup file that matches your search, in which case you don't worry about the join.

None of these searches are elegant, but with your data being in the form it's in, this is the price you are likely to pay.

Hope that helps!
rmmiller

joepjisc
Path Finder

Thanks for such a detailed answer, it is really helpful!

I've got your example working, as you say it is quite slow, but it will return the needed data, will continue to try and find a better source that just provides the IP.

0 Karma

rmmiller
Contributor

Glad you found it useful!
You could make it slightly faster by adding this to your search through your lookup file:
| fields cidr
That will pull only the cidr field from your existing search.

yoursearch | fields cidr | map search=" | inputlookup ipinfo.csv | eval cidr = $cidr$ |where cidrmatch($cidr$,ip) | stats values(*) AS * BY cidr

0 Karma

starcher
Influencer

your lookup does not have the IPs in CIDR notation. fix that. 1.2.3.4/24 as example.

0 Karma

joepjisc
Path Finder

That is the problem, the lookup cannot be in CIDR notation, it is information about specific IP addresses; the system that is writing the log is using the specific IP Addresses, but, for reasons best known to someone else, it logs in CIDR notation; hence my question being written the way it is. There is also no logic I can find to which CIDR mask the log soure applies, I've seen /16 through to /22 in the data I have for testing.

0 Karma
Get Updates on the Splunk Community!

Monitoring Postgres with OpenTelemetry

Behind every business-critical application, you’ll find databases. These behind-the-scenes stores power ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...