<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Issue when joining results of large tstats queries over time. in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289331#M87541</link>
    <description>&lt;P&gt;Have you tried &lt;CODE&gt;tstats append=t&lt;/CODE&gt;?&lt;BR /&gt;
If that won't work for you (I am not sure about the limits there), then the only way to losslessly join the datasets is like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| eval DATASET="1"
| fields _time, entity, is_valid, DATASET
| appendpipe [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events &amp;gt; 0, 1, 0)
    | eval DATASET="2"
    | fields _time, entity, is_true, DATASET
]
| fillnull is_true
| stats values(*) AS * dc(DATASET) AS numDatasets BY _time entity
| rename COMMENT AS "put your join logic here: use '|search DATASET=1' for left join, '|search DATASET=2' for right join, '|search numDatasets=1' for disunion, '|search numDatasets=2' for intersection"
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 12 May 2017 03:34:54 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2017-05-12T03:34:54Z</dc:date>
    <item>
      <title>Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289330#M87540</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;

&lt;P&gt;I am currently struggling with Splunk limitations when joining two queries over very large datasets (the two sets are too large to fit within the limits.conf subsearch limits, so I only get partial results).&lt;/P&gt;

&lt;P&gt;For each entity, I am trying to determine whether a first property is true in each hour, and, for those valid hours, to retrieve the value of a second property so I can finally sum it per entity. Each property lives in a dedicated data model, and I need to join the two on entity id plus time (to get the hourly result). Unfortunately, the query gives me only partial results, since the first data model contains more than 2 million events while the second contains approximately 35 million. The computation is done over a week.&lt;/P&gt;

&lt;P&gt;The results of these two queries are then summed to provide, per entity, the count of hours that contain the second property.&lt;/P&gt;

&lt;P&gt;Below is what I am trying to achieve:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| fields _time, entity, is_valid
| join type=left _time, entity [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events &amp;gt; 0, 1, 0)
    | fields _time, entity, is_true
]
| fillnull is_true
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I might not be using the best possible approach or might be missing something. I would be extremely glad if you could provide me with suggestions to achieve what I am trying to do.&lt;/P&gt;

&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2017 23:28:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289330#M87540</guid>
      <dc:creator>moystard</dc:creator>
      <dc:date>2017-05-11T23:28:58Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289331#M87541</link>
      <description>&lt;P&gt;Have you tried &lt;CODE&gt;tstats append=t&lt;/CODE&gt;?&lt;BR /&gt;
If that won't work for you (I am not sure about the limits there), then the only way to losslessly join the datasets is like this:&lt;/P&gt;
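&lt;P&gt;(An untested sketch of the &lt;CODE&gt;append=t&lt;/CODE&gt; route first, reusing the same data models and field names as below; note that &lt;CODE&gt;prestats=t&lt;/CODE&gt; is required on both &lt;CODE&gt;tstats&lt;/CODE&gt; calls when using &lt;CODE&gt;append=t&lt;/CODE&gt;, and a final &lt;CODE&gt;stats&lt;/CODE&gt; must resolve the prestats output:)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats prestats=t values(main.is_valid) from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| tstats prestats=t append=t count from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
| stats values(main.is_valid) AS is_valid count AS second_events BY _time, main.entity
| rename main.entity AS entity
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And here is the lossless join:&lt;/P&gt;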

&lt;PRE&gt;&lt;CODE&gt;| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| eval DATASET="1"
| fields _time, entity, is_valid, DATASET
| appendpipe [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events &amp;gt; 0, 1, 0)
    | eval DATASET="2"
    | fields _time, entity, is_true, DATASET
]
| fillnull is_true
| stats values(*) AS * dc(DATASET) AS numDatasets BY _time entity
| rename COMMENT AS "put your join logic here: use '|search DATASET=1' for left join, '|search DATASET=2' for right join, '|search numDatasets=1' for disunion, '|search numDatasets=2' for intersection"
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 12 May 2017 03:34:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289331#M87541</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-05-12T03:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289332#M87542</link>
      <description>&lt;P&gt;Thank you very much for your answer.&lt;/P&gt;

&lt;P&gt;Unfortunately, your query does not work: when I search DATASET=2, it produces no results (it looks like only the first dataset is kept). I did try running that query separately, and it outputs results properly; it is only when appendpipe is used that it fails.&lt;/P&gt;

&lt;P&gt;I have also tried to use prestats as below, but it does not output any properties other than _time and mount_id:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats prestats=t values(main.valid_event) as is_valid_hour from datamodel=hour_validity where nodename=main AND main.is_valid_hour=true groupby _time span=1h, main.entity 
| tstats prestats=t append=t count as true_events from datamodel=true_events where nodename=main groupby _time span=1h, main.entity 
| table _time, entity, is_valid_hour, true_events
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 14:02:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289332#M87542</guid>
      <dc:creator>moystard</dc:creator>
      <dc:date>2020-09-29T14:02:49Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289333#M87543</link>
      <description>&lt;P&gt;Your situation makes no sense to me. Are you sure that you are describing everything accurately?&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jul 2017 21:17:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289333#M87543</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-07-07T21:17:08Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289334#M87544</link>
      <description>&lt;P&gt;You might want to remove the prestats=t from the second tstats.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 02:35:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289334#M87544</guid>
      <dc:creator>prats84</dc:creator>
      <dc:date>2017-07-18T02:35:17Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289335#M87545</link>
      <description>&lt;P&gt;It is required when using &lt;CODE&gt;append=t&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Dec 2017 05:55:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289335#M87545</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-12-05T05:55:54Z</dc:date>
    </item>
  </channel>
</rss>

