<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to Filter over 50K on different indexes/sources? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688674#M234775</link>
    <description>&lt;P&gt;Regardless of whether you mean the loadjob as some form of batch ingesting events or an actual invocation of Splunk's loadjob command, the typical approach to filtering events by the contents of a lookup is to use a lookup to assign a field value and then filter on that value. This way you'll get only those events that have the wanted values.&lt;/P&gt;&lt;P&gt;Keep in mind, though, that:&lt;/P&gt;&lt;P&gt;1) You still need to read all the events matched before the lookup is applied, so if you're filtering down to a very small subset of events, another approach might be better.&lt;/P&gt;&lt;P&gt;2) If your lookup is big, moving it to the KV store may indeed be the right thing to do.&lt;/P&gt;&lt;P&gt;Anyway, this is the approach:&lt;/P&gt;&lt;PRE&gt;&amp;lt;your initial search&amp;gt;&lt;BR /&gt;| lookup mylookup.csv lookupfield AS eventfield OUTPUT lookupfield AS somefieldwewanttofilterby&lt;BR /&gt;| where isnotnull(somefieldwewanttofilterby)&lt;/PRE&gt;</description>
    <pubDate>Sat, 25 May 2024 15:55:27 GMT</pubDate>
    <dc:creator>PickleRick</dc:creator>
    <dc:date>2024-05-25T15:55:27Z</dc:date>
    <item>
      <title>How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688506#M234729</link>
      <description>&lt;P&gt;So, I have a loadjob with all the data I need, keyed on a primary field (account number). But I have a CSV with about 104K account numbers, and they only want those accounts in this report. How do I filter the loadjob down to only those 104K account numbers? I don't have admin access to change the join limit... Can lookups do the job? I also don't want the values to be grouped together in each row... I just want to remove from the loadjob the account numbers that are not in the CSV...&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2024 19:19:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688506#M234729</guid>
      <dc:creator>sumarri</dc:creator>
      <dc:date>2024-05-23T19:19:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688552#M234731</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/265563"&gt;@sumarri&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I created a dummy search to mock up your data, and created a lookup with 104,000 entries:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults count=104000
| streamstats count as id 
| eval account="account" . substr("000000000".tostring(id),-6), keep="true"
| table account, keep
| outputlookup "accounts_to_keep.csv"&lt;/LI-CODE&gt;&lt;P&gt;This will be our lookup file, replicating what you have in your lookup. It has the account ID and a "keep" field, and that's it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Next, I created a dummy search to generate a bunch of data, with accounts we don't care about and the 104,000 we do care about:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults count=200000
| streamstats count as id 
| eval account="account" . substr("000000000".tostring(id),-6)
| eval data=random()%10000, label="whatever", _time=relative_time(now(), "-" + tostring(random()%1000) + "m")
| table account, data, label, _time&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To use the lookup to identify the accounts we want to keep you can use this SPL:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| inputlookup accounts_to_keep.csv append=t
``` use eventstats if stats messes up your data 
| eventstats values(keep) as keep by account
```
| stats values(*) as * by account
| search keep="true"
| fields - keep&lt;/LI-CODE&gt;&lt;OL&gt;&lt;LI&gt;This adds the contents of the lookup to the results (&lt;EM&gt;append=t&lt;/EM&gt;).&lt;/LI&gt;&lt;LI&gt;Then we use stats to combine the keep field with the events in the search.&lt;BR /&gt;If this messes up your data, you can run eventstats instead, but that may run into memory issues with massive result sets.&lt;/LI&gt;&lt;LI&gt;Finally, we search for all the events where the keep field is set to "true".&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Depending on how big your lookup gets, you may want to make the lookup a KV store collection.&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 05:24:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688552#M234731</guid>
      <dc:creator>danspav</dc:creator>
      <dc:date>2024-05-24T05:24:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688596#M234751</link>
      <description>&lt;P&gt;This logic makes sense; however, I will not get the other fields, such as data, label, and _time. I need those fields populated with the correct information. But thank you for your help.&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 12:00:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688596#M234751</guid>
      <dc:creator>sumarri</dc:creator>
      <dc:date>2024-05-24T12:00:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688637#M234763</link>
      <description>&lt;P&gt;Can you explain what a "loadjob" is? Normally, if data is already ingested, and you have this lookup file, all you need to do is a subsearch&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=myindex sourcetype=mysourcetype
  [inputlookup mylookup
  | fields account]&lt;/LI-CODE&gt;&lt;P&gt;If you are trying to filter before ingestion, Splunk cannot really do anything.&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 18:15:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688637#M234763</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2024-05-24T18:15:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688640#M234764</link>
      <description>&lt;P&gt;A loadjob is the results of a &lt;SPAN&gt;previously completed search job from the reports I created. I am trying to filter after ingestion. I have all the data there; I just need some of the account numbers, and I don't want to break the data into multiple files to get everything I need... hence why I asked.&lt;BR /&gt;&lt;BR /&gt;Your way would work, but I only have a 50K join limit, so I will not get all the results. I need all 104K to pass through this subsearch.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 18:30:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688640#M234764</guid>
      <dc:creator>sumarri</dc:creator>
      <dc:date>2024-05-24T18:30:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688650#M234769</link>
      <description>&lt;P&gt;Your original idea of using a lookup should work. Assuming your loadjob gives you a field named account_number, and that your lookup has a column account_number, you can do this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;``` search above gives account_number and other fields ```
| lookup mylookup account_number output account_number as match_account
| where account_number == match_account&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is this something you are looking for?&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 21:24:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688650#M234769</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2024-05-24T21:24:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688674#M234775</link>
      <description>&lt;P&gt;Regardless of whether you mean the loadjob as some form of batch ingesting events or an actual invocation of Splunk's loadjob command, the typical approach to filtering events by the contents of a lookup is to use a lookup to assign a field value and then filter on that value. This way you'll get only those events that have the wanted values.&lt;/P&gt;&lt;P&gt;Keep in mind, though, that:&lt;/P&gt;&lt;P&gt;1) You still need to read all the events matched before the lookup is applied, so if you're filtering down to a very small subset of events, another approach might be better.&lt;/P&gt;&lt;P&gt;2) If your lookup is big, moving it to the KV store may indeed be the right thing to do.&lt;/P&gt;&lt;P&gt;Anyway, this is the approach:&lt;/P&gt;&lt;PRE&gt;&amp;lt;your initial search&amp;gt;&lt;BR /&gt;| lookup mylookup.csv lookupfield AS eventfield OUTPUT lookupfield AS somefieldwewanttofilterby&lt;BR /&gt;| where isnotnull(somefieldwewanttofilterby)&lt;/PRE&gt;</description>
      <pubDate>Sat, 25 May 2024 15:55:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688674#M234775</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2024-05-25T15:55:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688880#M234811</link>
      <description>&lt;P&gt;Yes, it is! Thank you so much! I truly appreciate this!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2024 12:39:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/688880#M234811</guid>
      <dc:creator>sumarri</dc:creator>
      <dc:date>2024-05-28T12:39:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to Filter over 50K on different indexes/sources?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/689213#M234885</link>
      <description>&lt;P&gt;At the end of your query, add "0" after the sort command:&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;&lt;BR /&gt;....&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;| table _time,  accountnumber, field1, etc.
| sort 0 
&lt;/LI-CODE&gt;
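&lt;P&gt;For context, sort returns at most 10,000 results when no count is given, and a count of 0 lifts that cap. Below is a minimal sketch of how this might combine with the lookup filter suggested earlier in the thread. It assumes the results carry a field named account_number and that a lookup named mylookup has a matching column; those names, the saved search reference, and the match_account helper field are all placeholders, not taken from your environment.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;| loadjob savedsearch="myuser:search:my_report" ``` hypothetical user:app:report reference ```
``` keep only rows whose account_number is present in the (assumed) lookup ```
| lookup mylookup account_number OUTPUT account_number AS match_account
| where isnotnull(match_account)
| fields - match_account
``` a count of 0 removes the default 10,000-result cap on sort ```
| sort 0 account_number
&lt;/LI-CODE&gt;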
&lt;P&gt;&lt;BR /&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Sort" target="_blank" rel="noopener"&gt;https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Sort&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2024 17:08:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Filter-over-50K-on-different-indexes-sources/m-p/689213#M234885</guid>
      <dc:creator>antoniolamonica</dc:creator>
      <dc:date>2024-05-31T17:08:09Z</dc:date>
    </item>
  </channel>
</rss>

