<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What causes unioned data sets to be truncated? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357739#M165347</link>
    <description>&lt;P&gt;Hello and thanks for this. I really appreciate you taking the time to answer.&lt;/P&gt;

&lt;P&gt;..j &lt;/P&gt;</description>
    <pubDate>Fri, 10 Nov 2017 15:59:44 GMT</pubDate>
    <dc:creator>jsinnott_</dc:creator>
    <dc:date>2017-11-10T15:59:44Z</dc:date>
    <item>
      <title>What causes unioned data sets to be truncated?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357735#M165343</link>
      <description>&lt;P&gt;Hi Splunk Experts--&lt;/P&gt;

&lt;P&gt;I'm confused about the union command and am hoping you can&lt;BR /&gt;
help. Specifically, I'm struggling to understand what causes the&lt;BR /&gt;
"things that get unioned" to be truncated-- in my case to 50,000&lt;BR /&gt;
records.&lt;/P&gt;

&lt;P&gt;Here's an example of what confuses me:&lt;/P&gt;

&lt;P&gt;Imagine three sets of data-- I've put them in three separate indexes&lt;BR /&gt;
called union_1, union_2 and union_3. The data sets are very similar:&lt;BR /&gt;
each has 60,000 records, each consisting of a timestamp, a color and a&lt;BR /&gt;
hash. Each data set has exactly one event per second and each covers&lt;BR /&gt;
the same 60,000 seconds (from 2017-01-01 00:00:01 to 2017-01-01&lt;BR /&gt;
16:40:00). The color is random and the hash is unique across all&lt;BR /&gt;
180,000 events (60,000 * three data sets).&lt;/P&gt;

&lt;P&gt;Here's union_1:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;time                       color   hash
-------------------------  ------  --------------------------------
2017-01-01 00:00:01 -0800  blue    08decd051408e648b941b5dbb9b1578c
2017-01-01 00:00:02 -0800  yellow  39d98f7f9a98920ee08631c9e6a4e867
2017-01-01 00:00:03 -0800  green   2b34449aae3a941c64dd76d33a6cfc04
...
2017-01-01 16:39:58 -0800  blue    b2cc43ab839bf57711a00f8f7a622e97
2017-01-01 16:39:59 -0800  blue    e26f577b10d0fa172c122deca813d38f
2017-01-01 16:40:00 -0800  blue    c9b0b55e7513963f7b04cf3c424686f2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;...and union_2:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;time                       color   hash
-------------------------  ------  --------------------------------
2017-01-01 00:00:01 -0800  violet  c8e68d6c154fc0ca88220a299dba7c55
2017-01-01 00:00:02 -0800  blue    3e18602a1d137ea4bf9157e67c4386ed
2017-01-01 00:00:03 -0800  violet  ecdf61cd34cda950bd782e3a6ba51fd6
...
2017-01-01 16:39:58 -0800  violet  5c00f68da1aa343ec0944fbcd42775fc
2017-01-01 16:39:59 -0800  green   2c3a626ff26a05f9895dc1c9ae1d074e
2017-01-01 16:40:00 -0800  red     9b796de25b072d8a48d3e9a7a716c4e9
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;...and union_3:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;time                       color   hash
-------------------------  ------  --------------------------------
2017-01-01 00:00:01 -0800  orange  772468eb812735bfa984b91477afe967
2017-01-01 00:00:02 -0800  violet  6d9ebc2ce8b1c79d42793d624daeb402
2017-01-01 00:00:03 -0800  red     a31d8811b95b4597f943f268f4068fb0
...
2017-01-01 16:39:58 -0800  yellow  17b43d58e4920f1d2044552acdad5507
2017-01-01 16:39:59 -0800  violet  12425e908448371c38a1f0fe12aedf73
2017-01-01 16:40:00 -0800  indigo  ea1fb54c5c2b5fd2161856ea6937226e
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You get the idea... &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Now let's run some SPL:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union maxout=10000000
  [ search index=union_1 ]
  [ search index=union_2 ]
  [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This produces what I'd expect-- 60,000 records per "thing that got&lt;BR /&gt;
unioned":&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index    count
-------  -----
union_1  60000
union_2  60000
union_3  60000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;But let's make things a bit more complicated:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union maxout=10000000
  [ search index=union_1 | head 60000 ]
  [ search index=union_2 ]
  [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Wait, what? Adding a head command to the first search causes the&lt;BR /&gt;
second and third to be truncated to 50000?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index    count
-------  -----
union_1  60000
union_2  50000
union_3  50000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;How about this one?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union maxout=10000000
  [ search index=union_1 ]
  [ search index=union_2 | head 60000 ]
  [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hmmm... same result:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index    count
-------  -----
union_1  60000
union_2  50000
union_3  50000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;What if we move the head command to the final search?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union maxout=10000000
  [ search index=union_1 ]
  [ search index=union_2 ]
  [ search index=union_3 | head 60000 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Wow... now only the final search gets truncated:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index    count
-------  -----
union_1  60000
union_2  60000
union_3  50000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Notes that may or may not be relevant:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;P&gt;Many commands have a similar effect (i.e. cause the same&lt;BR /&gt;
truncations) as head-- in particular dedup and sort seem to cause&lt;BR /&gt;
the same problems.&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;I suspect that these commands (and presumably many others) cause&lt;BR /&gt;
the subsearch to no longer qualify as a "streaming subsearch"--&lt;BR /&gt;
(although honestly I can't imagine why head would do this) and&lt;BR /&gt;
that this fact makes union behave much more like append.&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;I believe (but am not sure) that the 50000 truncation limit is due&lt;BR /&gt;
to maxresultrows in limits.conf-- that value is currently 50000&lt;BR /&gt;
for me.&lt;/P&gt;&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;For context, here's what I want to do:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;P&gt;In general, get a better understanding of how union works and how&lt;BR /&gt;
it's different from append.&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;Specifically, union a set of three searches that each produce substantially more&lt;BR /&gt;
than 50000 records and not experience truncation.&lt;/P&gt;&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Anybody willing to help me out with this? Would totally appreciate the&lt;BR /&gt;
benefit of your wisdom &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 16:40:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357735#M165343</guid>
      <dc:creator>jsinnott_</dc:creator>
      <dc:date>2020-09-29T16:40:41Z</dc:date>
    </item>
    <item>
      <title>Re: What causes unioned data sets to be truncated?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357736#M165344</link>
      <description>&lt;P&gt;Hi jsinnott_,&lt;/P&gt;

&lt;P&gt;Since &lt;CODE&gt;union&lt;/CODE&gt; is just another subsearch, you will hit many limits with it. Some are mentioned here: &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Union#Optional_arguments"&gt;http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Union#Optional_arguments&lt;/A&gt; &lt;/P&gt;

&lt;P&gt;In most cases you can just use &lt;CODE&gt;stats&lt;/CODE&gt; to do the same and will not hit any limits. Read some examples here &lt;A href="https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-join-append-or-use-of-subsearches.html"&gt;https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-join-append-or-use-of-subsearches.html&lt;/A&gt; or in the March 2016 Virtual .conf session here &lt;A href="http://wiki.splunk.com/Virtual_.conf"&gt;http://wiki.splunk.com/Virtual_.conf&lt;/A&gt;&lt;/P&gt;
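
&lt;P&gt;A minimal sketch of that approach, assuming the three indexes from the question (union_1, union_2 and union_3): a single base search with OR reads all three indexes without any subsearch, so subsearch limits should not come into play.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=union_1 OR index=union_2 OR index=union_3
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;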

&lt;P&gt;Why &lt;CODE&gt;union&lt;/CODE&gt; is truncating events from a second search after using more commands sounds weird and might be worth opening a bug report.&lt;/P&gt;

&lt;P&gt;Hope this helps ...&lt;/P&gt;

&lt;P&gt;cheers, MuS&lt;/P&gt;</description>
      <pubDate>Thu, 09 Nov 2017 01:51:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357736#M165344</guid>
      <dc:creator>MuS</dc:creator>
      <dc:date>2017-11-09T01:51:49Z</dc:date>
    </item>
    <item>
      <title>Re: What causes unioned data sets to be truncated?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357737#M165345</link>
      <description>&lt;P&gt;Hi jsinnott_&lt;/P&gt;

&lt;P&gt;At this time, &lt;CODE&gt;union&lt;/CODE&gt; behaves alternately like &lt;CODE&gt;multisearch&lt;/CODE&gt; (for distributable streaming subsearches) or &lt;CODE&gt;append&lt;/CODE&gt; (for subsearches that are not distributable streaming). This is not adequately explained in &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Union"&gt;the doc topic for the union command&lt;/A&gt; at present and I'll see what I can do to fix that. &lt;/P&gt;

&lt;P&gt;(For more information about the types of streaming search commands, see &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commandsbytype"&gt;Command types&lt;/A&gt; in the Splunk Enterprise &lt;EM&gt;Search Reference&lt;/EM&gt;.)&lt;/P&gt;

&lt;P&gt;Let's take your first search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union maxout=10000000
  [ search index=union_1 ]
  [ search index=union_2 ]
  [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In this case, all of the searches are distributable streaming, so they are all unioned with &lt;CODE&gt;multisearch&lt;/CODE&gt;. This is why you see 60k in each.&lt;/P&gt;
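
&lt;P&gt;By analogy with the conversions shown for the other searches below, the first search is presumably handled roughly like this (a sketch, not confirmed output from a search log):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| multisearch
 [ search index=union_1 ]
 [ search index=union_2 ]
 [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;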

&lt;P&gt;Your second search uses the &lt;CODE&gt;head&lt;/CODE&gt; command for one of the subsearches. Because &lt;CODE&gt;head&lt;/CODE&gt; is centralized streaming rather than distributable streaming, it causes the subsearches that follow it to use the &lt;CODE&gt;append&lt;/CODE&gt; command. "Under the hood," the search is converted to:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| search index=union_1
| head 60000
| append 
 [ search index=union_2 ]
| append
 [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;When &lt;CODE&gt;union&lt;/CODE&gt; is used in conjunction with a search that is not distributable streaming, the default for the &lt;CODE&gt;maxout&lt;/CODE&gt; argument applies: 50k events. This is mentioned in &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Union#Optional_arguments"&gt;the doc topic for the union command&lt;/A&gt;. &lt;/P&gt;

&lt;P&gt;Your third search also ends up being an &lt;CODE&gt;append&lt;/CODE&gt; search, because the second subsearch is not distributable streaming due to the &lt;CODE&gt;head&lt;/CODE&gt; command. Here's how it looks "under the hood":&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| search index=union_1
| append 
 [ search index=union_2 | head 60000 ]
| append
 [ search index=union_3 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Again, the &lt;CODE&gt;maxout&lt;/CODE&gt; argument default applies here, limiting the results of the appended searches to 50k events.&lt;/P&gt;

&lt;P&gt;In your last example, the first two subsearches are distributable streaming, so they are unioned with &lt;CODE&gt;multisearch&lt;/CODE&gt;. But the final subsearch has the &lt;CODE&gt;head&lt;/CODE&gt; command, so it gets unioned with &lt;CODE&gt;append&lt;/CODE&gt; at the end. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| multisearch 
 [ search index=union_1 ]
 [ search index=union_2 ]
| append
 [ search index=union_3 | head 60000 ]
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The &lt;CODE&gt;maxout&lt;/CODE&gt; argument applies to that last subsearch because it is not distributable streaming due to the &lt;CODE&gt;head&lt;/CODE&gt; command. So it returns 50k events rather than 60k events.&lt;/P&gt;

&lt;P&gt;Note that &lt;CODE&gt;multisearch&lt;/CODE&gt; has to be the first command. If your &lt;CODE&gt;union&lt;/CODE&gt; search unpacks in a way that puts &lt;CODE&gt;append&lt;/CODE&gt; first, you won't get &lt;CODE&gt;multisearch&lt;/CODE&gt; to follow it. &lt;/P&gt;
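
&lt;P&gt;One possible way to apply this, sketched under the assumption that the per-subsearch command can be moved after the union: keep every subsearch distributable streaming and run the non-streaming command on the combined results. For example, with &lt;CODE&gt;dedup&lt;/CODE&gt; on the hash field from your data, adding index to the field list keeps the deduplication separate for each index:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union maxout=10000000
  [ search index=union_1 ]
  [ search index=union_2 ]
  [ search index=union_3 ]
| dedup index hash
| stats count by index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This is not a drop-in rewrite for every command: &lt;CODE&gt;head&lt;/CODE&gt; and &lt;CODE&gt;sort&lt;/CODE&gt;, for instance, would change meaning if simply moved after the union.&lt;/P&gt;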

&lt;P&gt;Kindest regards, &lt;BR /&gt;
Matt (Splunk Docs Team)&lt;/P&gt;</description>
      <pubDate>Fri, 10 Nov 2017 02:37:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357737#M165345</guid>
      <dc:creator>mattness</dc:creator>
      <dc:date>2017-11-10T02:37:30Z</dc:date>
    </item>
    <item>
      <title>Re: What causes unioned data sets to be truncated?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357738#M165346</link>
      <description>&lt;P&gt;Hi Matt--&lt;/P&gt;

&lt;P&gt;Thanks so much for taking time to write this clear and detailed explanation. It's exactly what I needed-- you're my new best friend!&lt;/P&gt;

&lt;P&gt;..j&lt;/P&gt;</description>
      <pubDate>Fri, 10 Nov 2017 15:57:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357738#M165346</guid>
      <dc:creator>jsinnott_</dc:creator>
      <dc:date>2017-11-10T15:57:20Z</dc:date>
    </item>
    <item>
      <title>Re: What causes unioned data sets to be truncated?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357739#M165347</link>
      <description>&lt;P&gt;Hello and thanks for this. I really appreciate you taking the time to answer.&lt;/P&gt;

&lt;P&gt;..j &lt;/P&gt;</description>
      <pubDate>Fri, 10 Nov 2017 15:59:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-causes-unioned-data-sets-to-be-truncated/m-p/357739#M165347</guid>
      <dc:creator>jsinnott_</dc:creator>
      <dc:date>2017-11-10T15:59:44Z</dc:date>
    </item>
  </channel>
</rss>

