<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to efficiently query all indexes for a list of IPs in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461993#M130246</link>
    <description>&lt;P&gt;I'm assuming the regex is fine, as you seem happy with that, so in terms of efficiency, if this is a one-off operation, does efficiency matter?&lt;/P&gt;

&lt;P&gt;Your query is searching yesterday. Is the intention that it searches further back than that? Could you just run a backfill operation and let Splunk handle the scheduling?&lt;/P&gt;

&lt;P&gt;If you're looking for a general solution, then you could output each production index search to a CSV (outputlookup append=t) and then after running all the searches, just inputlookup the csv and stats count on the data.&lt;/P&gt;</description>
    <pubDate>Fri, 01 Nov 2019 23:12:33 GMT</pubDate>
    <dc:creator>bowesmana</dc:creator>
    <dc:date>2019-11-01T23:12:33Z</dc:date>
    <item>
      <title>How to efficiently query all indexes for a list of IPs</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461989#M130242</link>
      <description>&lt;P&gt;&lt;STRONG&gt;BACKGROUND:&lt;/STRONG&gt; My Disaster Recovery team is compiling a list of all IP endpoints and has asked me to query all of my Splunk events (&lt;EM&gt;in all indexes&lt;/EM&gt;) for anything resembling an IP address. I created the following search, which works on my smaller Staging Splunk Enterprise instance but times out when I run it on my larger Production Splunk Enterprise instance:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="*" earliest=-1d@d latest=-0d@d
| rex field=_raw "(?&amp;lt;ip&amp;gt;\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| stats values(ip)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;As a workaround to avoid the timeout, I've split the Production search into multiple searches of each Index.&lt;/P&gt;
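&lt;P&gt;A minimal sketch of how per-index results like these could be merged, assuming a hypothetical lookup file named dr_ip_inventory.csv and using index=main as a placeholder for each production index. Each per-index search appends its per-IP counts to the file with outputlookup append=true:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=main earliest=-1d@d latest=-0d@d
| rex field=_raw "(?&amp;lt;ip&amp;gt;\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| stats count BY ip
| outputlookup append=true dr_ip_inventory.csv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;After all per-index searches have run, a final search reads the file back and collapses the rows into one per IP:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup dr_ip_inventory.csv
| stats sum(count) AS count BY ip
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Because append=true never clears the file, it should be removed or overwritten before a fresh run; otherwise counts from earlier runs are added in.&lt;/P&gt;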

&lt;P&gt;&lt;STRONG&gt;QUESTIONS&lt;/STRONG&gt;: &lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Is there a more &lt;EM&gt;efficient&lt;/EM&gt; way to get the IPs my DR wants? &lt;/LI&gt;
&lt;LI&gt;Is there an &lt;EM&gt;efficient&lt;/EM&gt; way to join the results of the multiple per-index searches in Prod?&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 16 Oct 2019 19:34:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461989#M130242</guid>
      <dc:creator>asearson</dc:creator>
      <dc:date>2019-10-16T19:34:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently query all indexes for a list of IPs</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461990#M130243</link>
      <description>&lt;P&gt;Hi asearson,&lt;BR /&gt;
I can't check your regex because you didn't share an example, so I'll assume it's correct.&lt;BR /&gt;
Anyway, to list all the IPs you could use the &lt;STRONG&gt;dedup&lt;/STRONG&gt; and &lt;STRONG&gt;table&lt;/STRONG&gt; commands:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="*" earliest=-1d@d latest=-0d@d
| rex "(?&amp;lt;ip&amp;gt;\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| dedup ip
| sort ip
| table ip
&lt;/CODE&gt;&lt;/PRE&gt;
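
&lt;P&gt;One caveat with the sort step, assuming default limits: sort without an explicit count returns at most 10000 results. A sketch of the same pipeline with a count of 0 (no limit) and the ip() sort option, which interprets the values as IP addresses rather than plain strings:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="*" earliest=-1d@d latest=-0d@d
| rex "(?&amp;lt;ip&amp;gt;\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| dedup ip
| sort 0 ip(ip)
| table ip
&lt;/CODE&gt;&lt;/PRE&gt;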

&lt;P&gt;I have only one doubt: you want all the IPs from all indexes, but different sourcetypes usually have different log formats, so how do you plan to extract IPs from every sourcetype with a single regex?&lt;/P&gt;

&lt;P&gt;Maybe you could use a different approach:&lt;BR /&gt;
for servers, you could use the &lt;STRONG&gt;dnslookup&lt;/STRONG&gt; external lookup to resolve IPs from DNS by passing it hostnames, like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=_internal
| dedup host
| lookup dnslookup clienthost AS host OUTPUT clientip
| sort host
| table host clientip
&lt;/CODE&gt;&lt;/PRE&gt;
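
&lt;P&gt;A possibly cheaper variant, sketched under assumptions: &lt;STRONG&gt;tstats&lt;/STRONG&gt; can list hosts from indexed metadata instead of reading raw events the way dedup over index=_internal does. It assumes the default dnslookup external lookup is available on the search head; hosts that don't resolve in DNS get an empty clientip. Note that index=* does not match internal indexes such as _internal.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats count WHERE index=* BY host
| lookup dnslookup clienthost AS host OUTPUT clientip
| table host clientip
&lt;/CODE&gt;&lt;/PRE&gt;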

&lt;P&gt;For appliances that send standard syslog, you can extract IPs with an appropriate regex because the IP is always in the same position.&lt;BR /&gt;
Appliances that don't send standard syslog usually have their IP in the host field.&lt;/P&gt;

&lt;P&gt;Ciao.&lt;BR /&gt;
Giuseppe&lt;/P&gt;</description>
      <pubDate>Thu, 17 Oct 2019 07:21:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461990#M130243</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2019-10-17T07:21:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently query all indexes for a list of IPs</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461991#M130244</link>
      <description>&lt;P&gt;Thanks for the reply, but not exactly the answer I'm looking for...&lt;/P&gt;

&lt;P&gt;CLARIFICATION OF MY PROBLEM STATEMENT:&lt;BR /&gt;
I need to capture every IP found in all logs, regardless of index/host/source/sourcetype. A single weblog from a busy webserver could yield thousands of IPs, one for each unique client requesting a popular webpage. I'm not concerned about hostnames.&lt;/P&gt;

&lt;P&gt;CLARIFICATIONS TO YOUR QUESTIONS:&lt;BR /&gt;
An example is any address between 0.0.0.0 and 255.255.255.255.&lt;BR /&gt;
Regex taken from &lt;A href="http://www.regular-expressions.info/ip.html"&gt;www.regular-expressions.info/ip.html&lt;/A&gt; and verified with regex101.com&lt;/P&gt;

&lt;P&gt;The idea for "rex field=_raw" is taken from this:&lt;BR /&gt;
&lt;A href="https://answers.splunk.com/answers/656616/how-to-extract-ip-address-using-regex.html"&gt;https://answers.splunk.com/answers/656616/how-to-extract-ip-address-using-regex.html&lt;/A&gt;&lt;BR /&gt;
It applies to every raw event, regardless of sourcetype or log format.&lt;/P&gt;
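
&lt;P&gt;One thing worth checking, assuming default settings: rex keeps only the first match in each event unless max_match is set, so an event containing several addresses contributes just one IP. A minimal sketch that keeps every match (max_match=0 means unlimited) and returns one row per distinct IP:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="*" earliest=-1d@d latest=-0d@d
| rex field=_raw max_match=0 "(?&amp;lt;ip&amp;gt;\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)"
| stats count BY ip
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With max_match=0 the ip field becomes multivalue; stats ... BY ip treats each value as its own group, so every address is counted separately.&lt;/P&gt;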

&lt;P&gt;TESTING:&lt;BR /&gt;
I tested your pipeline "| dedup ip | sort ip | table ip", and the Job Inspector shows that it actually takes longer than the single "| stats values(ip)" pipe. Both yield the same results, with a slightly different sort order (string rather than integer).&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2019 22:31:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461991#M130244</guid>
      <dc:creator>asearson</dc:creator>
      <dc:date>2019-11-01T22:31:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently query all indexes for a list of IPs</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461992#M130245</link>
      <description>&lt;P&gt;Sorting is a bad idea here: 'sort' without a count of '0' truncates the results at the sort limit (default 10000).&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2019 23:03:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461992#M130245</guid>
      <dc:creator>bowesmana</dc:creator>
      <dc:date>2019-11-01T23:03:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently query all indexes for a list of IPs</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461993#M130246</link>
      <description>&lt;P&gt;I'm assuming the regex is fine, as you seem happy with that, so in terms of efficiency, if this is a one-off operation, does efficiency matter?&lt;/P&gt;

&lt;P&gt;Your query is searching yesterday. Is the intention that it searches further back than that? Could you just run a backfill operation and let Splunk handle the scheduling?&lt;/P&gt;

&lt;P&gt;If you're looking for a general solution, then you could output each production index search to a CSV (outputlookup append=t) and then after running all the searches, just inputlookup the csv and stats count on the data.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2019 23:12:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-efficiently-query-all-indexes-for-a-list-of-IPs/m-p/461993#M130246</guid>
      <dc:creator>bowesmana</dc:creator>
      <dc:date>2019-11-01T23:12:33Z</dc:date>
    </item>
  </channel>
</rss>

