<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Alternates to join/append to avoid limitations in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746786#M241713</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/191500"&gt;@kaeleyt&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Use the Splunk lookup feature: save Dataset 2 (the ip-to-hostname mapping) as a CSV lookup file, then use the lookup command to enrich Dataset 1. This avoids the subsearch, join, and append limits.&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Create a CSV lookup table from Dataset 2&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;LI-CODE lang="markup"&gt;index=my_hosts
| stats values(hostname) as hostname by ip
| outputlookup ip_to_hostname.csv&lt;/LI-CODE&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Join Dataset 1 and lookup to enrich logs with hostnames&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;LI-CODE lang="markup"&gt;index=my_logs
| stats count by ip
| lookup ip_to_hostname.csv ip OUTPUT hostname
| table ip, count, hostname&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The lookup command does not have the same limits as join, append, or subsearch for reasonably sized files. You could use either CSV or KV store lookups.&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;P&gt;If Dataset 2 changes regularly, you could overwrite the lookup via a scheduled search.&lt;/P&gt;&lt;P&gt;For very large lookups, Splunk recommends KV store lookups for scale, but CSV lookups generally perform well up to 1M+ rows.&lt;/P&gt;&lt;P&gt;Confirm that the field names (ip, hostname) match exactly between the lookup and the base data.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":glowing_star:"&gt;🌟&lt;/span&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Did this answer help you?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If so, please consider:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Adding karma to show it was useful&lt;/LI&gt;&lt;LI&gt;Marking it as the solution if it resolved your issue&lt;/LI&gt;&lt;LI&gt;Commenting if you need any clarification&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Your feedback encourages the volunteers in this community to continue contributing.&lt;/P&gt;</description>
    <pubDate>Thu, 22 May 2025 21:52:43 GMT</pubDate>
    <dc:creator>livehybrid</dc:creator>
    <dc:date>2025-05-22T21:52:43Z</dc:date>
    <item>
      <title>Alternates to join/append to avoid limitations</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746782#M241711</link>
      <description>&lt;P&gt;Situation: I have 2 datasets:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Dataset 1&lt;/STRONG&gt; is a set of logs which includes IP addresses. When aggregated, there are 200,000+ IP addresses.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Dataset 2&lt;/STRONG&gt; is a dataset we pull in once a day that includes identifying information for those IP addresses, such as hostname. This dataset is even larger.&lt;/P&gt;&lt;P&gt;I want to map the hostname from &lt;STRONG&gt;Dataset 2&lt;/STRONG&gt; to the IP address in &lt;STRONG&gt;Dataset 1&lt;/STRONG&gt;. I feel like I've tried everything (join, append + eventstats, subsearches), and unfortunately all of them have a limit that prevents me from getting the full set mapped.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/9.4.2/SearchReference/Join" target="_self"&gt;Join&lt;/A&gt; limit: 50,000&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/9.4.2/SearchReference/Append" target="_self"&gt;Append&lt;/A&gt; limit: 10,000&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/9.4.2/Search/Aboutsubsearches" target="_self"&gt;Subsearch&lt;/A&gt; limit: 10,000&lt;/P&gt;&lt;P&gt;I've come across this same sort of issue before and have dropped projects because there doesn't seem to be an obvious way around these limits without raising settings such as subsearch_maxout by at least 10x for our whole environment. 
I've started looking into the &lt;A href="https://docs.splunk.com/Documentation/Splunk/9.4.2/SearchReference/Map" target="_self"&gt;map command&lt;/A&gt;, but the documentation seems extremely vague on the limits ("Zero ( 0 ) does not equate to unlimited searches.")&lt;/P&gt;&lt;P&gt;The only thing I've gotten to work is to manually break the 2nd data source up into groups of 10,000 or fewer rows and append + eventstats each group one by one. That makes for a complete nightmare of a query, and additional appends need to be created any time the 2nd dataset changes or grows.&lt;/P&gt;&lt;P&gt;I'm growing tired of not having a good way of tackling this issue, so I'm seeking advice from any fellow Splunkers who have successfully "joined" larger datasets.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Some example searches to help with the situation:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Dataset 1&lt;/STRONG&gt; search:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_logs
| stats count by ip&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;Dataset 2&lt;/STRONG&gt; search:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_hosts
| stats values(hostname) as hostname by ip&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 21:52:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746782#M241711</guid>
      <dc:creator>kaeleyt</dc:creator>
      <dc:date>2025-05-22T21:52:40Z</dc:date>
    </item>
    <item>
      <title>Re: Alternates to join/append to avoid limitations</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746785#M241712</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;Oldie but goldie &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.splunk.com/t5/Splunk-Search/How-to-compare-fields-over-multiple-sourcetypes-without-join/m-p/113477#M29849" target="_blank"&gt;https://community.splunk.com/t5/Splunk-Search/How-to-compare-fields-over-multiple-sourcetypes-without-join/m-p/113477#M29849&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope this helps ...&lt;/P&gt;&lt;P&gt;Cheers, MuS&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 21:51:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746785#M241712</guid>
      <dc:creator>MuS</dc:creator>
      <dc:date>2025-05-22T21:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: Alternates to join/append to avoid limitations</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746786#M241713</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/191500"&gt;@kaeleyt&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Use the Splunk lookup feature: save Dataset 2 (the ip-to-hostname mapping) as a CSV lookup file, then use the lookup command to enrich Dataset 1. This avoids the subsearch, join, and append limits.&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Create a CSV lookup table from Dataset 2&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;LI-CODE lang="markup"&gt;index=my_hosts
| stats values(hostname) as hostname by ip
| outputlookup ip_to_hostname.csv&lt;/LI-CODE&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Join Dataset 1 and lookup to enrich logs with hostnames&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;LI-CODE lang="markup"&gt;index=my_logs
| stats count by ip
| lookup ip_to_hostname.csv ip OUTPUT hostname
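| fillnull value="unknown" hostname ``` optional sketch, not part of the original answer: labels IPs that have no match in the lookup; "unknown" is an arbitrary placeholder ```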
| table ip, count, hostname&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The lookup command does not have the same limits as join, append, or subsearch for reasonably sized files. You could use either CSV or KV store lookups.&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;P&gt;If Dataset 2 changes regularly, you could overwrite the lookup via a scheduled search.&lt;/P&gt;&lt;P&gt;For very large lookups, Splunk recommends KV store lookups for scale, but CSV lookups generally perform well up to 1M+ rows.&lt;/P&gt;&lt;P&gt;Confirm that the field names (ip, hostname) match exactly between the lookup and the base data.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":glowing_star:"&gt;🌟&lt;/span&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Did this answer help you?&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If so, please consider:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Adding karma to show it was useful&lt;/LI&gt;&lt;LI&gt;Marking it as the solution if it resolved your issue&lt;/LI&gt;&lt;LI&gt;Commenting if you need any clarification&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Your feedback encourages the volunteers in this community to continue contributing.&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 21:52:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746786#M241713</guid>
      <dc:creator>livehybrid</dc:creator>
      <dc:date>2025-05-22T21:52:43Z</dc:date>
    </item>
    <item>
      <title>Re: Alternates to join/append to avoid limitations</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746791#M241715</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/2012"&gt;@MuS&lt;/a&gt; That got me part of the way there, but I think I may have accidentally oversimplified my question a bit. I'll post another question to get the 2nd half answered. Thanks for the help!&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 22:38:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Alternates-to-join-append-to-avoid-limitations/m-p/746791#M241715</guid>
      <dc:creator>kaeleyt</dc:creator>
      <dc:date>2025-05-22T22:38:41Z</dc:date>
    </item>
  </channel>
</rss>

