<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to join multiple sourcetypes with additional data from a CSV using a common field (userId), then calculate the average for 2 possible fields by userId? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253501#M48686</link>
    <description>&lt;P&gt;Hi, &lt;/P&gt;

&lt;P&gt;We have 3 sourcetypes with similar data (column names are different e.g. RATE_DOWN in two of them and ACTUAL_DATA_RATE in one of them)&lt;/P&gt;

&lt;P&gt;Events are collected at the same time, but from different devices (one sourcetype per device type). And every event contains a user ID.&lt;/P&gt;

&lt;P&gt;We have an additional CSV file with list of user IDs and some additional data (rate limits etc.).&lt;/P&gt;

&lt;P&gt;I want to calculate avg &lt;STRONG&gt;RATE_DOWN&lt;/STRONG&gt; or &lt;STRONG&gt;ACTUAL_DATA_RATE_DOWN&lt;/STRONG&gt; fields per user from the CSV with additional data. I tried this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=index (sourcetype=sourcetype1 OR sourcetype=sourcetype2 OR sourcetype=sourcetype3)
| join type=inner userId [ |inputcsv additional_data]
| stats avg(RATE_DOWN) avg(ACTUAL_DATA_RATE_DOWN) by userId
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I've expected to get a list of all users from the CSV file and their stats, &lt;CODE&gt;avg(RATE_DOWN)&lt;/CODE&gt; if that field for that user exists, or else &lt;CODE&gt;avg(ACTUAL_DATA_RATE_DOWN)&lt;/CODE&gt; and the first field should be empty. But instead, I got only stats for users that have rate in ACTUAL_DATA_RATE_DOWN field ( &lt;CODE&gt;avg(ACTUAL_DATA_RATE_DOWN)&lt;/CODE&gt;).&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;avg(RATE_DOWN)&lt;/CODE&gt; is always empty, and if I remove sourcetypes leaving just one with RATE_DOWN field, stats are calculated.&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 08:35:08 GMT</pubDate>
    <dc:creator>ivanlesk</dc:creator>
    <dc:date>2020-09-29T08:35:08Z</dc:date>
    <item>
      <title>How to join multiple sourcetypes with additional data from a CSV using a common field (userId), then calculate the average for 2 possible fields by userId?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253501#M48686</link>
      <description>&lt;P&gt;Hi, &lt;/P&gt;

&lt;P&gt;We have 3 sourcetypes with similar data (column names are different e.g. RATE_DOWN in two of them and ACTUAL_DATA_RATE in one of them)&lt;/P&gt;

&lt;P&gt;Events are collected at the same time, but from different devices (one sourcetype per device type). And every event contains a user ID.&lt;/P&gt;

&lt;P&gt;We have an additional CSV file with list of user IDs and some additional data (rate limits etc.).&lt;/P&gt;

&lt;P&gt;I want to calculate avg &lt;STRONG&gt;RATE_DOWN&lt;/STRONG&gt; or &lt;STRONG&gt;ACTUAL_DATA_RATE_DOWN&lt;/STRONG&gt; fields per user from the CSV with additional data. I tried this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=index (sourcetype=sourcetype1 OR sourcetype=sourcetype2 OR sourcetype=sourcetype3)
| join type=inner userId [ |inputcsv additional_data]
| stats avg(RATE_DOWN) avg(ACTUAL_DATA_RATE_DOWN) by userId
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I've expected to get a list of all users from the CSV file and their stats, &lt;CODE&gt;avg(RATE_DOWN)&lt;/CODE&gt; if that field for that user exists, or else &lt;CODE&gt;avg(ACTUAL_DATA_RATE_DOWN)&lt;/CODE&gt; and the first field should be empty. But instead, I got only stats for users that have rate in ACTUAL_DATA_RATE_DOWN field ( &lt;CODE&gt;avg(ACTUAL_DATA_RATE_DOWN)&lt;/CODE&gt;).&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;avg(RATE_DOWN)&lt;/CODE&gt; is always empty, and if I remove sourcetypes leaving just one with RATE_DOWN field, stats are calculated.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 08:35:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253501#M48686</guid>
      <dc:creator>ivanlesk</dc:creator>
      <dc:date>2020-09-29T08:35:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to join multiple sourcetypes with additional data from a CSV using a common field (userId), then calculate the average for 2 possible fields by userId?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253502#M48687</link>
      <description>&lt;P&gt;Try this.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=index (sourcetype=sourcetype1 OR sourcetype=sourcetype2 OR sourcetype=sourcetype3)
| join type=inner userId [ |inputcsv additional_data]
| eval rate_down = coalesce(RATE_DOWN, ACTUAL_DATA_RATE_DOWN)
| stats avg(rate_down) by userId
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 01 Feb 2016 15:51:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253502#M48687</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2016-02-01T15:51:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to join multiple sourcetypes with additional data from a CSV using a common field (userId), then calculate the average for 2 possible fields by userId?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253503#M48688</link>
      <description>&lt;P&gt;thx, but no help.&lt;/P&gt;

&lt;P&gt;again only stats  from sourcetype3 are shown (IDs for other users are shown but no stats). If I remove sourcetype3 from first line than results for those users are shown.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Feb 2016 15:59:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-join-multiple-sourcetypes-with-additional-data-from-a-CSV/m-p/253503#M48688</guid>
      <dc:creator>ivanlesk</dc:creator>
      <dc:date>2016-02-01T15:59:52Z</dc:date>
    </item>
  </channel>
</rss>

