<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long? in Monitoring Splunk</title>
    <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201028#M2089</link>
    <description>&lt;P&gt;"visitor-hours" and "visitors" are different. 24 visitor-hours != 24 visitors&lt;/P&gt;</description>
    <pubDate>Fri, 30 Oct 2015 14:27:49 GMT</pubDate>
    <dc:creator>hylam</dc:creator>
    <dc:date>2015-10-30T14:27:49Z</dc:date>
    <item>
      <title>Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201016#M2077</link>
      <description>&lt;P&gt;We have 17 GB of IIS log files and a 2.5 GB data model that is 100% accelerated. The server has 16 cores and 8 GB RAM, with 2 GB free. The pivot is CPU-bound on a single core, and disk activity is minimal. Any ideas? Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2015 06:51:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201016#M2077</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-27T06:51:19Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201017#M2078</link>
      <description>&lt;P&gt;I'm guessing your cookies are extremely high-cardinality, i.e. they have a large number of distinct values. Computing a &lt;CODE&gt;dc()&lt;/CODE&gt; over that is a very high-load task for a data model. It has to keep (probably) millions of different values around, and for each new value it has to check whether it has seen that value before. That's a nightmare to compute accurately.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2015 19:36:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201017#M2078</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-27T19:36:34Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201018#M2079</link>
      <description>&lt;P&gt;In a shell script I could probably do a parallel sort-merge in O(n log n) time, with linear speedup from adding CPU cores. Can I parallelize this query in Splunk? A hash of the cookie should work as the key for a parallel MapReduce operation.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2015 23:51:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201018#M2079</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-27T23:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201019#M2080</link>
      <description>&lt;P&gt;Does this scale out as you said 3 days ago?&lt;BR /&gt;
&lt;A href="https://answers.splunk.com/answers/320179/data-model-split-row-1m-limit.html#comment-320182"&gt;https://answers.splunk.com/answers/320179/data-model-split-row-1m-limit.html#comment-320182&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2015 23:58:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201019#M2080</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-27T23:58:10Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201020#M2081</link>
      <description>&lt;P&gt;Yeah, this should scale out to multiple indexers and probably (didn't check myself yet) also to multiple search pipelines on one indexer (6.3 feature).&lt;/P&gt;

&lt;P&gt;Before adding tons of hardware you should spend some time (and maybe money) on figuring out if your current data model / search is the best way to answer your core question - determining the best approach, one step before figuring out if you can make the chosen approach faster.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 00:04:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201020#M2081</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-28T00:04:04Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201021#M2082</link>
      <description>&lt;P&gt;It is running CPU-bound on a single core on Splunk 6.3. I am using the auto (750 MB) bucket size, not the 10 GB auto_high_volume size, so I think inter-bucket parallelization should be possible. Simply adding up the hourly dc(cookie) values would count a 24-hour session 24 times in the worst case. How can I run a parallel sort-merge uniq in Splunk?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 07:43:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201021#M2082</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2020-09-29T07:43:29Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201022#M2083</link>
      <description>&lt;P&gt;This is your starting point for parallelizing things on a single box: &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.3.0/Capacity/Parallelization"&gt;http://docs.splunk.com/Documentation/Splunk/6.3.0/Capacity/Parallelization&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The hourly distinct count you describe would be something like &lt;CODE&gt;... | timechart span=1h dc(cookie)&lt;/CODE&gt;, or a row split by time in pivot/data model terms.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 00:21:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201022#M2083</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-28T00:21:36Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201023#M2084</link>
      <description>&lt;P&gt;The cookies have a one-year lifetime. If I set the query time range to one year, it would saturate a parallel sort-merge uniq on any cluster that I can afford. Even C++ may not be fast enough.&lt;/P&gt;

&lt;P&gt;I have seen some Splunk apps that append new distinct keys to a CSV file every 5 to 10 minutes. How can I write that?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 00:31:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201023#M2084</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-28T00:31:38Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201024#M2085</link>
      <description>&lt;P&gt;Yeah, doing one year of cookies &lt;EM&gt;accurately&lt;/EM&gt; in one go is nigh-on impossible; even O(n log n) becomes less than fun.&lt;/P&gt;

&lt;P&gt;Here's a traditional approach of keeping state in CSV files: &lt;A href="http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/"&gt;http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Depending on your actual use case, you may be better off with precomputing chunks of distinct counts and storing them in a summary. Summing that up will give you higher numbers than reality, but if you're only looking for trends then that'd be fine. If you need more accurate numbers you could compare the chunked numbers with real numbers computed for a longer but still manageable time range, and apply that correction to your chunked data from then on.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 18:00:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201024#M2085</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-28T18:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201025#M2086</link>
      <description>&lt;UL&gt;
&lt;LI&gt;&lt;P&gt;Using the accelerated data model directly:&lt;BR /&gt;
dc(cookie) on 1 sec of data - instant&lt;BR /&gt;
dc(cookie) on 1 hr of data - 3 min&lt;BR /&gt;
dc(cookie) on 1.5 hr of data - I ran out of patience&lt;BR /&gt;
dc(cookie) on all time - didn't even attempt&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;Computing hour,cookie,count, then dedup:&lt;BR /&gt;
dc(cookie) on all time - I ran out of patience&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;Computing minute,cookie,count, then dedup consecutive=true:&lt;BR /&gt;
dc(cookie) on all time - 7 minutes&lt;/P&gt;&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;all time = 14 days&lt;/P&gt;
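
&lt;P&gt;A rough Python sketch (not SPL) of the two-stage idea from the list above, assuming events arrive as (epoch seconds, cookie) pairs. Stage one collapses raw events into a minute,cookie,count table; stage two counts distinct cookies over that smaller table. The function names and the sample data are made up for illustration, and this is not necessarily how Splunk evaluates the search internally.&lt;/P&gt;

&lt;PRE&gt;
```python
from collections import Counter

def minute_cookie_counts(events):
    """Stage 1: collapse raw (epoch_seconds, cookie) events into per-minute counts."""
    counts = Counter()
    for epoch_seconds, cookie in events:
        minute = epoch_seconds // 60  # bucket the timestamp by minute
        counts[(minute, cookie)] += 1
    return counts

def distinct_cookies(counts):
    """Stage 2: exact distinct count over the smaller stage-1 table."""
    return len({cookie for _minute, cookie in counts})

def summed_per_minute_dc(counts):
    """For contrast: summing per-minute distinct counts overcounts repeat visitors."""
    per_minute = Counter(minute for minute, _cookie in counts)
    return sum(per_minute.values())

# Hypothetical sample: cookie "a" shows up in three different minutes.
events = [(0, "a"), (5, "a"), (61, "a"), (62, "b"), (130, "a"), (131, "c")]
counts = minute_cookie_counts(events)
print(distinct_cookies(counts))      # exact number of distinct cookies
print(summed_per_minute_dc(counts))  # "visitor-minutes", never smaller than the line above
```
&lt;/PRE&gt;

&lt;P&gt;The dedup over the intermediate table stays exact, while adding up the per-minute distinct counts counts a cookie once for every minute it appears in.&lt;/P&gt;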

&lt;P&gt;What kind of data structures are these?&lt;BR /&gt;
accelerated data model&lt;BR /&gt;
tsidx&lt;/P&gt;

&lt;P&gt;How can I do a quick estimate?&lt;BR /&gt;
&lt;A href="https://www.google.com/search?q=distinct+value+estimation"&gt;https://www.google.com/search?q=distinct+value+estimation&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2015 15:01:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201025#M2086</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-29T15:01:19Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201026#M2087</link>
      <description>&lt;P&gt;Will it be any faster if I use a minimal number of fields in the data model?&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 02:52:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201026#M2087</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-30T02:52:38Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201027#M2088</link>
      <description>&lt;P&gt;Yes, in principle smaller will always be faster.&lt;/P&gt;

&lt;P&gt;However, in this case you won't see large gains - the issue is cardinality, not data model size.&lt;BR /&gt;
This should imho be solved through dropping the accuracy requirement and computing chunks, e.g. build a distinct count each hour and store that in a summary, letting your reports read the summaries only. You can attempt to improve accuracy by estimating how many duplicate cookies you can expect.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 12:54:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201027#M2088</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-30T12:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201028#M2089</link>
      <description>&lt;P&gt;"visitor-hours" and "visitors" are different. 24 visitor-hours != 24 visitors&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 14:27:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201028#M2089</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-30T14:27:49Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201029#M2090</link>
      <description>&lt;P&gt;dc(cookie) from indexed data also took 7 min.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 15:16:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201029#M2090</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-30T15:16:16Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201030#M2091</link>
      <description>&lt;P&gt;I know, that's why you'd need to calculate a rough conversion factor - and chunked data would be most useful for trends, not for precise absolute numbers. In return it'd be much much cheaper to compute.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:05:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201030#M2091</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-30T16:05:14Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201031#M2092</link>
      <description>&lt;P&gt;Instead of chunking, you could sample - e.g. drop the last few chars of your cookie and calculate a conversion factor from this lower number to the real number. With every hex char dropped, the space of possible values shrinks by a factor of 16.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:06:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201031#M2092</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-10-30T16:06:17Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201032#M2093</link>
      <description>&lt;P&gt;Will a summary index holding these work? I would then roll them up:&lt;BR /&gt;
minute sum, hour sum, day sum, week sum, month sum, year sum&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:16:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201032#M2093</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-30T16:16:18Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201033#M2094</link>
      <description>&lt;P&gt;How about keeping a slowly changing dimension csv?&lt;/P&gt;

&lt;P&gt;firstSeenTime,lastSeenTime,cookie&lt;/P&gt;
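
&lt;P&gt;A minimal Python sketch of what maintaining such a state table could look like, using the three columns above. It works on in-memory CSV text instead of a real lookup file, and the helper names and sample values are hypothetical. In Splunk itself the same read, merge, write loop would typically be a scheduled search built around &lt;CODE&gt;inputlookup&lt;/CODE&gt; and &lt;CODE&gt;outputlookup&lt;/CODE&gt;, which this sketch does not attempt to reproduce.&lt;/P&gt;

&lt;PRE&gt;
```python
import csv
import io

FIELDS = ["firstSeenTime", "lastSeenTime", "cookie"]

def load_state(csv_text):
    """Parse the state CSV into {cookie: [firstSeenTime, lastSeenTime]}."""
    state = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        state[row["cookie"]] = [int(row["firstSeenTime"]), int(row["lastSeenTime"])]
    return state

def merge_batch(state, batch):
    """Fold a new batch of (epoch_seconds, cookie) events into the state."""
    for seen, cookie in batch:
        if cookie in state:
            first, last = state[cookie]
            state[cookie] = [min(first, seen), max(last, seen)]
        else:
            state[cookie] = [seen, seen]  # first sighting of this cookie
    return state

def dump_state(state):
    """Serialize the state back to CSV text, sorted by cookie for stable output."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for cookie in sorted(state):
        first, last = state[cookie]
        writer.writerow({"firstSeenTime": first, "lastSeenTime": last, "cookie": cookie})
    return out.getvalue()

# Hypothetical previous state plus one new batch of events.
previous = "firstSeenTime,lastSeenTime,cookie\n100,200,a\n"
state = merge_batch(load_state(previous), [(250, "a"), (260, "b")])
print(dump_state(state))
print(len(state))  # distinct cookies seen so far
```
&lt;/PRE&gt;

&lt;P&gt;The number of rows is the all-time distinct count, and filtering on the two time columns gives the cookies active in a window. The table still grows with cardinality, so this only moves the cost from query time to maintenance time.&lt;/P&gt;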

&lt;P&gt;&lt;A href="http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/"&gt;http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Oct 2015 10:32:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201033#M2094</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-10-31T10:32:00Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201034#M2095</link>
      <description>&lt;P&gt;It'll work, it just depends on what your actual requirements are. All we're doing here is throwing technical solutions around; it's impossible to tell what works best for your use case without knowing your use case.&lt;/P&gt;</description>
      <pubDate>Sun, 01 Nov 2015 17:17:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201034#M2095</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2015-11-01T17:17:01Z</dc:date>
    </item>
    <item>
      <title>Re: Why does our data model "distinct count(cookie) AS UniqueVisitor" take very long?</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201035#M2096</link>
      <description>&lt;P&gt;The slowly changing dimension approach would not work in a lot of cases.&lt;/P&gt;</description>
      <pubDate>Sun, 01 Nov 2015 17:22:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Why-does-our-data-model-quot-distinct-count-cookie-AS/m-p/201035#M2096</guid>
      <dc:creator>hylam</dc:creator>
      <dc:date>2015-11-01T17:22:10Z</dc:date>
    </item>
  </channel>
</rss>

