
How to efficiently query all indexes for a list of IPs

asearson — Wed, 16 Oct 2019 19:34:35 GMT

BACKGROUND: My Disaster Recovery team is compiling a list of all IP endpoints and has asked me to query all of my Splunk events (in all indexes) for anything resembling an IP. I created the following search, which works in my smaller Staging Splunk Enterprise environment but times out when I run it in my larger Production Splunk Enterprise environment:

index="*" earliest=-1d@d latest=-0d@d
| rex field=_raw "(?<ip>\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| stats values(ip)

As a workaround to avoid the timeout, I've split the Production search into multiple searches, one per index.

QUESTIONS:

Is there a more efficient way to get the IPs my DR team wants?
Is there an efficient way to join the results of the multiple index searches in Prod?

Re: How to efficiently query all indexes for a list of IPs

gcusello — Thu, 17 Oct 2019 07:21:33 GMT

Hi asearson,
I can't check your regex because you didn't share an example, so I'll assume it's correct.
Anyway, to list all the IPs you could use the dedup and table commands:

index="*" earliest=-1d@d latest=-0d@d
| rex "(?<ip>\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| dedup ip
| sort ip
| table ip

I have only one doubt: you want all the IPs from all indexes, but different sourcetypes usually have different log formats, so how do you plan to extract IPs from all sourcetypes with a single regex?

Maybe you could use a different approach:
For servers, you could resolve IPs from DNS by passing the hostnames to the built-in dnslookup lookup, like this:

index=_internal
| dedup host
| lookup dnslookup clienthost AS host OUTPUT clientip
| sort host
| table host clientip

For appliances that use standard syslog, you can extract IPs with an appropriate regex because the IP is always in the same position.
Appliances that don't use standard syslog usually have the IP in the hostname.

Ciao.
Giuseppe

Re: How to efficiently query all indexes for a list of IPs

asearson — Fri, 01 Nov 2019 22:31:14 GMT

Thanks for the reply, but not exactly the answer I'm looking for...

CLARIFICATION OF MY PROBLEM STATEMENT:
I need to capture every IP found in all logs, regardless of index/host/source/sourcetype. A single web log from a busy web server could yield thousands of IPs, one for each unique client requesting a popular web page. I'm not concerned about hostnames.

CLARIFICATIONS TO YOUR QUESTIONS:
An example is any value between 0.0.0.0 and 255.255.255.255.
The regex is taken from www.regular-expressions.info/ip.html and verified with regex101.com.

The idea for "rex field=_raw" is taken from this:
https://answers.splunk.com/answers/656616/how-to-extract-ip-address-using-regex.html
It applies to every raw event, regardless of sourcetype or log format.
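
One caveat worth checking against the "every IP" requirement: by default rex stops at the first match in each event (max_match=1), so an event that contains several IPs would contribute only one of them. A minimal sketch of the same search with max_match=0 (unlimited matches, producing a multivalue ip field), assuming the extra extraction cost is acceptable in Production:

```spl
index="*" earliest=-1d@d latest=-0d@d
| rex field=_raw max_match=0 "(?<ip>\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| stats values(ip) AS ip
```

stats values() flattens the multivalue field, so the output is still one deduplicated list.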

TESTING:
I tested your pipeline "| dedup ip | sort ip | table ip", and the Job Inspector shows that it actually takes longer than the single "| stats values(ip)" pipe. They yield the same results, with a slightly different sort order (string rather than integer).

Re: How to efficiently query all indexes for a list of IPs

bowesmana — Fri, 01 Nov 2019 23:03:59 GMT

Sorting is a bad idea: 'sort' without '0' will truncate at the sort limit (default 10000).
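
If a sorted list is still wanted, a minimal sketch of the untruncated form might look like the following. It reuses the search from earlier in the thread; "sort 0" lifts the result limit, and the ip() modifier asks sort to compare the values as IP addresses rather than as strings:

```spl
index="*" earliest=-1d@d latest=-0d@d
| rex field=_raw "(?<ip>\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| dedup ip
| sort 0 ip(ip)
| table ip
```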

Re: How to efficiently query all indexes for a list of IPs

bowesmana — Fri, 01 Nov 2019 23:12:33 GMT

I'm assuming the regex is fine, as you seem happy with that, so in terms of efficiency, if this is a one-off operation, does efficiency matter?

Your query is searching yesterday. Is the intention that it searches further back than that? Could you just run a backfill operation and let Splunk handle the scheduling?

If you're looking for a general solution, then you could output each production index search to a CSV (outputlookup append=t) and then after running all the searches, just inputlookup the csv and stats count on the data.
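
A rough sketch of that approach, under stated assumptions: the index name web and the lookup file dr_ip_list.csv are hypothetical placeholders, and the rex is the one from the original question. Each per-index search would append its results to the same CSV:

```spl
index=web earliest=-1d@d latest=-0d@d
| rex field=_raw "(?<ip>\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| stats count by ip
| outputlookup append=true dr_ip_list.csv
```

Once every index has been processed, a final search could combine the rows:

```spl
| inputlookup dr_ip_list.csv
| stats sum(count) AS count by ip
```

Running stats before outputlookup keeps the CSV to one row per IP per index. Rerunning the same index search appends its rows again, so the counts would be inflated, although the list of distinct IPs stays correct.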