<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Join truncating the result. how to avoid it? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143600#M39899</link>
    <description>&lt;P&gt;Try this; it is &lt;EM&gt;vastly&lt;/EM&gt; less efficient but distributes the &lt;CODE&gt;subsearch&lt;/CODE&gt; limit out by dividing the results over the number of &lt;CODE&gt;correlation_id&lt;/CODE&gt; values:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index="index" tag=tag1 sourcetype=access_combined "def"|rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)"| bucket span=1d _time |stats count by FIELDNAME _time|where count&amp;lt;30 AND count&amp;gt;20 |fields correlation_id | map search="search correlation_id=$correlation_id$ index=\"index\" tag=tag2 sourcetype=access_combined \"hello\"" | rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)" | bucket span=1d _time |stats count by client FIELDNAME _time | sort - _time count|dedup 3 client
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 05 Jun 2015 17:37:35 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2015-06-05T17:37:35Z</dc:date>
    <item>
      <title>Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143599#M39898</link>
      <description>&lt;P&gt;I have the following query,&lt;/P&gt;

&lt;P&gt;index="index" tag=tag1 sourcetype=access_combined "def"|fields correlation_id|join correlation_id[search index="index" tag=tag2 sourcetype=access_combined "hello"]|rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)" | rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)"   | bucket span=1d _time |stats count by client FIELDNAME _time | sort - _time count|dedup 3 _time&lt;/P&gt;

&lt;P&gt;I am expecting a table that would have top 3 values for each client ,&lt;/P&gt;

&lt;P&gt;client            FIELDNAME               _time                  count&lt;/P&gt;

&lt;P&gt;This search is truncating a lot of results and I'm not able to get stats for more than a day or two.I tried,&lt;/P&gt;

&lt;P&gt;index="index" tag=tag1 sourcetype=access_combined "def"|rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)"| bucket span=1d _time |stats count by FIELDNAME _time|where count&amp;lt;30 AND count&amp;gt;20 |fields correlation_id|join correlation_id[search index="index" tag=tag2 sourcetype=access_combined "hello"] | rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)"   | bucket span=1d _time |stats count by client FIELDNAME _time | sort - _time count|dedup 3 client&lt;/P&gt;

&lt;P&gt;This does not work.&lt;/P&gt;

&lt;P&gt;How do I optimize the join so that I can get stats for a longer period? any idea?&lt;/P&gt;

&lt;P&gt;P.S I apologize for the formatting. It eats up the query if I try doing it.&lt;/P&gt;

&lt;P&gt;Thanks in advance&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 20:11:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143599#M39898</guid>
      <dc:creator>sp1711</dc:creator>
      <dc:date>2020-09-28T20:11:10Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143600#M39899</link>
      <description>&lt;P&gt;Try this; it is &lt;EM&gt;vastly&lt;/EM&gt; less efficient but distributes the &lt;CODE&gt;subsearch&lt;/CODE&gt; limit out by dividing the results over the number of &lt;CODE&gt;correlation_id&lt;/CODE&gt; values:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index="index" tag=tag1 sourcetype=access_combined "def"|rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)"| bucket span=1d _time |stats count by FIELDNAME _time|where count&amp;lt;30 AND count&amp;gt;20 |fields correlation_id | map search="search correlation_id=$correlation_id$ index=\"index\" tag=tag2 sourcetype=access_combined \"hello\"" | rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)" | bucket span=1d _time |stats count by client FIELDNAME _time | sort - _time count|dedup 3 client
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 05 Jun 2015 17:37:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143600#M39899</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-06-05T17:37:35Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143601#M39900</link>
      <description>&lt;P&gt;Try restructuring your search to not use &lt;CODE&gt;join&lt;/CODE&gt; at all.  I think you can get what you need using &lt;CODE&gt;stats&lt;/CODE&gt; instead. Would something like this work?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined (index="index" tag=tag1 "def") OR (index="index" tag=tag2 "hello")
| rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)" 
| rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)" 
| bucket span=1d _time 
| stats count by client FIELDNAME _time 
| sort - _time count 
| dedup 3 _time
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 05 Jun 2015 17:45:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143601#M39900</guid>
      <dc:creator>wpreston</dc:creator>
      <dc:date>2015-06-05T17:45:45Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143602#M39901</link>
      <description>&lt;P&gt;the regex for client doesnt work with this. It gives me a wrong result on that. And still the data is only for a day. So I get 3 rows in the table, all realted to one day.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 17:59:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143602#M39901</guid>
      <dc:creator>sp1711</dc:creator>
      <dc:date>2015-06-05T17:59:01Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143603#M39902</link>
      <description>&lt;P&gt;I get Error in 'map': Did not find value for required attribute 'correlation_id'.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 18:03:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143603#M39902</guid>
      <dc:creator>sp1711</dc:creator>
      <dc:date>2015-06-05T18:03:59Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143604#M39903</link>
      <description>&lt;P&gt;I had a typo but I fixed it in the original answer; try again.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 18:08:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143604#M39903</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-06-05T18:08:21Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143605#M39904</link>
      <description>&lt;P&gt;I get the same error!&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 18:13:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143605#M39904</guid>
      <dc:creator>sp1711</dc:creator>
      <dc:date>2015-06-05T18:13:01Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143606#M39905</link>
      <description>&lt;P&gt;Then you do not have a field called &lt;CODE&gt;correlation_id&lt;/CODE&gt; as you clearly imply in your original search.  Fix that and it will work.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 18:43:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143606#M39905</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-06-05T18:43:14Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143607#M39906</link>
      <description>&lt;P&gt;I ran just the first part of the query and checked if it has a &lt;CODE&gt;correlation_id&lt;/CODE&gt; field. It has got it. I dont understand why the map wont work.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 19:04:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143607#M39906</guid>
      <dc:creator>sp1711</dc:creator>
      <dc:date>2015-06-05T19:04:14Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143608#M39907</link>
      <description>&lt;P&gt;Try the same search but get rid of the string &lt;CODE&gt;|fields correlation_id&lt;/CODE&gt; and see if that helps.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2015 19:13:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143608#M39907</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-06-05T19:13:34Z</dc:date>
    </item>
    <item>
      <title>Re: Join truncating the result. how to avoid it?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143609#M39908</link>
      <description>&lt;P&gt;One big problem is that right before your join you are doing &lt;CODE&gt;fields correlation_id&lt;/CODE&gt; and thus throwing away all fields except for correlation_id.  This will allow your join to work but all other subsequent search commands will lose the fields they expect (eg req_host, client, FIELDNAME) .  Take that fields clause out.    In general you don't have to worry about Splunk extracting and preserving too many fields.  In fact the search parser does the opposite and optimizes the fields extracted and preserved, down to just fields named in the search string. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index" tag=tag1 sourcetype=access_combined "def"|rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)"| bucket span=1d _time |stats count by FIELDNAME _time|where count&amp;lt;30 AND count&amp;gt;20 |join correlation_id[search index="index" tag=tag2 sourcetype=access_combined "hello"] | rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)" | bucket span=1d _time |stats count by client FIELDNAME _time | sort - _time count|dedup 3 client
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;However I also strongly recommend using stats instead of join.  You should absolutely avoid join for this,  and it's a bread and butter case for the use of stats. &lt;/P&gt;

&lt;P&gt;I think the reason wpreston's search doesn't work for you is that your stats needs to group by correlation_id.  You want something more like this: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index" sourcetype=access_combined  (tag=tag1 "def") OR (tag=tag2 "hello") 
| rex "(?i)/user/(?P&amp;lt;FIELDNAME&amp;gt;[^/]+)"
| rex field=req_host "^(?&amp;lt;client&amp;gt;[^.]*)" 
| stats values(FIELDNAME) as FIELDNAME values(req_host) as req_host) values(client) as client last(_time) as _time by correlation_id
| bucket span=1d _time 
| stats count by FIELDNAME _time
|where count&amp;lt;30 AND count&amp;gt;20 
|stats count by client FIELDNAME _time 
| sort - _time count
|dedup 3 client
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;One other note - the rows coming into your second stats command already have a &lt;CODE&gt;count&lt;/CODE&gt; field, so you may want &lt;CODE&gt;sum(count) as count&lt;/CODE&gt; instead of just another &lt;CODE&gt;count&lt;/CODE&gt;.  Depends what you want to count of course, but I'd make sure you're clear on what you're getting. &lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 20:08:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Join-truncating-the-result-how-to-avoid-it/m-p/143609#M39908</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2020-09-28T20:08:41Z</dc:date>
    </item>
  </channel>
</rss>

