<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to Join entries for a summary index in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368159#M108511</link>
    <description>&lt;P&gt;Thanks for the response.  You inspired me to look into using stats more instead of a join and that proved very effective and fast.&lt;/P&gt;</description>
    <pubDate>Fri, 29 Sep 2017 22:53:35 GMT</pubDate>
    <dc:creator>wayn23</dc:creator>
    <dc:date>2017-09-29T22:53:35Z</dc:date>
    <item>
      <title>How to Join entries for a summary index</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368157#M108509</link>
      <description>&lt;P&gt;I have two indexes that I want to create a summary from every hour.  &lt;/P&gt;

&lt;P&gt;Index1 &lt;BR /&gt;
request_type, request_guid,  request_timestamp, meta_field1, meta_field2, ...&lt;/P&gt;

&lt;P&gt;Index1 contains log entries from each processing step in each service request.  Each service request is assigned a unique request_guid, and all ~10 logs for the processing of a request carry that id.  The time the request was made is stored in request_timestamp and also remains the same through all logs for a request.  &lt;/P&gt;

&lt;P&gt;index2&lt;BR /&gt;
request_guid, meta_fieldA, meta_fieldB, ...&lt;/P&gt;

&lt;P&gt;index2 contains more data for the logs, but it is kept in a separate index so that it can be secured differently from index1.  The request_guid is the same value as in index1.&lt;/P&gt;

&lt;P&gt;I want to summarize by collecting stats for each request type by hour.&lt;/P&gt;

&lt;P&gt;The approach I have taken is to select all the logs from index1 where the request_timestamp falls within the hour. I cannot use the log time directly, as a request's logs might span into the next hour (as in started at 9:59:59 and ended at 10:00:01).&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=index1 earliest=0 
       | addinfo | eval timemillis=strftime(strptime(request_timestamp,"%Y-%m-%dT%H:%M:%S.%3N%z"),"%s") 
       | where timemillis&amp;gt;=info_min_time AND timemillis&amp;lt;info_max_time
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This gives me a list of all the logs from index1 that I want to summarize.  I then join in the logs from index2.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| join  usetime=false type=left overwrite=false request_guid  max=0
  [search index=index2]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I then create a summary with the following&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| sort _time desc
| transaction request_guid  maxspan=10m mvlist=true
| stats 
    count as requests_in_period
    by request_type
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;What results is elevated counts per period.  They are all multiples of 2, which tells me that a cross product seems to be happening.  I think the problem is in the join command.  There are multiple entries selected from index1 that have duplicate values for request_guid, which I think are over-selecting from index2, yet I want all the matching entries from index2 with the request_guid.&lt;/P&gt;
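
&lt;P&gt;(For reference, one join-free pattern is to search both indexes in a single query and roll the index2 fields onto the index1 events with eventstats keyed by request_guid, then keep only the index1 events; the meta_fieldA/meta_fieldB names below are just illustrative placeholders for the index2 fields:)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=index1 OR index=index2
| eventstats values(meta_fieldA) as meta_fieldA values(meta_fieldB) as meta_fieldB by request_guid
| where index="index1"
&lt;/CODE&gt;&lt;/PRE&gt;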

&lt;P&gt;When I view the events right after the join, all the logs are from index1 and none are from index2.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Background&lt;/STRONG&gt;&lt;BR /&gt;
I started with a subselect of the request_guids that should be summarized, but a subselect is limited to 10500 entries, which will not be enough for an hour of transaction logs.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 16:02:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368157#M108509</guid>
      <dc:creator>wayn23</dc:creator>
      <dc:date>2020-09-29T16:02:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to Join entries for a summary index</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368158#M108510</link>
      <description>&lt;P&gt;Try something like this. It would run every 30 minutes, at x:15 and x:45, summarizing at the 15-minute level for two periods each run: -45m to -30m and -30m to -15m in the past.  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;earliest=-45m@m  latest=@m index=index1 OR index=index2 

| rename COMMENT as "get rid of all unneeded fields"
| fields index request_guid request_timestamp, request_type

| rename COMMENT as "calculate desired timestamp from the index1 record, roll it and the request type from the index2 record onto the index1 records for each guid"
| eval mytime=strptime(request_timestamp,"%Y-%m-%dT%H:%M:%S.%3N%z")
| eventstats min(mytime) as mytime min(request_type) as mytype by request_guid 

| rename COMMENT as "get rid of unneeded index2 records, and records that are too old or too young to have settled"
| addinfo
| where (mytime&amp;gt;=info_min_time) AND (mytime&amp;lt;info_max_time-900) AND (index="index1") 

| rename COMMENT as "chunk up the records into 15 minute chunks, stats together all records for each guid, then find out how many unique guids there were in each period."
| eval _time=900*floor(mytime/900) 
| stats  min(mytype) as request_type by _time request_guid
| stats dc(request_guid) as requests_in_period by _time request_type
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 28 Sep 2017 05:16:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368158#M108510</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-09-28T05:16:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to Join entries for a summary index</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368159#M108511</link>
      <description>&lt;P&gt;Thanks for the response.  You inspired me to look into using stats more instead of a join and that proved very effective and fast.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Sep 2017 22:53:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-Join-entries-for-a-summary-index/m-p/368159#M108511</guid>
      <dc:creator>wayn23</dc:creator>
      <dc:date>2017-09-29T22:53:35Z</dc:date>
    </item>
  </channel>
</rss>