<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255021#M76375</link>
    <description>&lt;P&gt;Martin,&lt;/P&gt;

&lt;P&gt;I knew the KV Store was the right answer, but how?&lt;/P&gt;

&lt;P&gt;Here's the schema I wrote down, but it's not 100% working in the update part. I ran some tests using the oidemo index from the oidemo app, using the mdn field as the SIM id.&lt;/P&gt;

&lt;P&gt;Here's my collection:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[SIM-cathalogue]
field.SIM = string
field.last_connect = time
field.status = string
accelerated_fields.SIMaccelerated = {"SIM":1, "last_connect":1,"status":1}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;with the following lookup defined:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[SIM-lookup]
collection = SIM-cathalogue
external_type = kvstore
fields_list = SIM,last_connect,status
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here are the steps I tried:&lt;/P&gt;

&lt;P&gt;1) create the master SIM repository:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=oidemo mdn=*
| dedup mdn
| fields mdn, _time
| rename mdn as SIM
| eval status="WARN"
| eval last_connect=_time
| table SIM, last_connect, status
| outputlookup SIM-lookup
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;2) every 5 minutes, update the KV Store with the SIMs (mdn) that sent data in the last 5 minutes:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=oidemo mdn=*
| dedup mdn
| fields mdn, _time
| rename mdn as SIM
| lookup SIM-lookup SIM
| eval previous_connect=last_connect
| eval last_connect=_time
| eval oldstatus=status
| eval status="OK"
| table SIM, SIMKEY, _key, previous_connect, last_connect, status
| outputlookup SIM-lookup append=True
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The problem is that the second search overwrites the whole KV Store instead of just updating the changed entries.&lt;/P&gt;

&lt;P&gt;Where's the error?!&lt;/P&gt;

&lt;P&gt;Marco&lt;/P&gt;</description>
    <pubDate>Sun, 04 Dec 2016 20:46:18 GMT</pubDate>
    <dc:creator>marcoscala</dc:creator>
    <dc:date>2016-12-04T20:46:18Z</dc:date>
    <item>
      <title>What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255019#M76373</link>
      <description>&lt;P&gt;Hi! &lt;/P&gt;

&lt;P&gt;Our customer needs to check data coming from 4-5 million unique SIMs and detect SIMs that have not sent data recently.&lt;/P&gt;

&lt;P&gt;What is the best approach? I can get the SIM catalogue with a scheduled dbxquery, but is it better to use a CSV lookup or the KV Store?&lt;/P&gt;

&lt;P&gt;Thanks for suggestions!&lt;/P&gt;

&lt;P&gt;Marco&lt;/P&gt;</description>
      <pubDate>Sun, 04 Dec 2016 18:35:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255019#M76373</guid>
      <dc:creator>marcoscala</dc:creator>
      <dc:date>2016-12-04T18:35:17Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255020#M76374</link>
      <description>&lt;P&gt;For large datasets you should be better off with the KV Store. CSV files get rewritten entirely on every update, the KV Store allows targeted updates.&lt;/P&gt;</description>
      <pubDate>Sun, 04 Dec 2016 19:50:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255020#M76374</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-12-04T19:50:22Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255021#M76375</link>
      <description>&lt;P&gt;Martin,&lt;/P&gt;

&lt;P&gt;I knew the KV Store was the right answer, but how?&lt;/P&gt;

&lt;P&gt;Here's the schema I wrote down, but it's not 100% working in the update part. I ran some tests using the oidemo index from the oidemo app, using the mdn field as the SIM id.&lt;/P&gt;

&lt;P&gt;Here's my collection:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[SIM-cathalogue]
field.SIM = string
field.last_connect = time
field.status = string
accelerated_fields.SIMaccelerated = {"SIM":1, "last_connect":1,"status":1}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;with the following lookup defined:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[SIM-lookup]
collection = SIM-cathalogue
external_type = kvstore
fields_list = SIM,last_connect,status
&lt;/CODE&gt;&lt;/PRE&gt;
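
&lt;P&gt;(For anyone reproducing this: the collection stanza goes in collections.conf and the lookup stanza in transforms.conf of the app; the paths below are illustrative and "yourapp" is a placeholder.)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# collection stanza [SIM-cathalogue]:
$SPLUNK_HOME/etc/apps/yourapp/local/collections.conf
# lookup definition stanza [SIM-lookup]:
$SPLUNK_HOME/etc/apps/yourapp/local/transforms.conf
&lt;/CODE&gt;&lt;/PRE&gt;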

&lt;P&gt;Here are the steps I tried:&lt;/P&gt;

&lt;P&gt;1) create the master SIM repository:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=oidemo mdn=*
| dedup mdn
| fields mdn, _time
| rename mdn as SIM
| eval status="WARN"
| eval last_connect=_time
| table SIM, last_connect, status
| outputlookup SIM-lookup
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;2) every 5 minutes, update the KV Store with the SIMs (mdn) that sent data in the last 5 minutes:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=oidemo mdn=*
| dedup mdn
| fields mdn, _time
| rename mdn as SIM
| lookup SIM-lookup SIM
| eval previous_connect=last_connect
| eval last_connect=_time
| eval oldstatus=status
| eval status="OK"
| table SIM, SIMKEY, _key, previous_connect, last_connect, status
| outputlookup SIM-lookup append=True
&lt;/CODE&gt;&lt;/PRE&gt;
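
&lt;P&gt;(For context, once last_connect is being maintained, the stale-SIM check the customer needs could be sketched roughly like this - the 1h threshold is just an example:)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup SIM-lookup
| where last_connect &amp;lt; relative_time(now(), "-1h")
| table SIM, last_connect, status
&lt;/CODE&gt;&lt;/PRE&gt;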

&lt;P&gt;The problem is that the second search overwrites the whole KV Store instead of just updating the changed entries.&lt;/P&gt;

&lt;P&gt;Where's the error?!&lt;/P&gt;

&lt;P&gt;Marco&lt;/P&gt;</description>
      <pubDate>Sun, 04 Dec 2016 20:46:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255021#M76375</guid>
      <dc:creator>marcoscala</dc:creator>
      <dc:date>2016-12-04T20:46:18Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255022#M76376</link>
      <description>&lt;P&gt;That's rewriting the entire collection because you're telling splunk "here's a search result, now write that into this lookup". From the search language, there is no targeted insert/update/delete - you'll need to descend into the REST API for that.&lt;/P&gt;

&lt;P&gt;From the search language, you can only fall back to loading the entire collection and writing out the entire collection, hoping that it'll be smart enough to not actually update unchanged entries:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;base search
| stats latest(_time) as last_connect latest(status) as status by SIM
| inputlookup append=t SIM-lookup
| stats first(_*) as _* first(*) as * by SIM
| outputlookup SIM-lookup
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 04 Dec 2016 22:33:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255022#M76376</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-12-04T22:33:20Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255023#M76377</link>
      <description>&lt;P&gt;Martin,&lt;BR /&gt;
thanks for clarification. I was confused by this example in docs:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup csvcoll_lookup
| search _key=544948df3ec32d7a4c1d9755
| eval CustName="Marge Simpson"
| eval CustCity="Springfield"
| outputlookup csvcoll_lookup append=True
&lt;/CODE&gt;&lt;/PRE&gt;
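
&lt;P&gt;(Applied to the SIM collection, that docs pattern would look something like this - the SIM value is a placeholder, and the _key returned by inputlookup is what lets append=True update the existing record:)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup SIM-lookup where SIM="393351234567"
| eval status="OK"
| outputlookup SIM-lookup append=True
&lt;/CODE&gt;&lt;/PRE&gt;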

&lt;P&gt;I hope this will have decent performance with a global SIM catalogue of 4.5 million SIMs and growing! This is for a big company managing auto insurance satellite data!&lt;/P&gt;

&lt;P&gt;Marco &lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 06:48:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255023#M76377</guid>
      <dc:creator>marcoscala</dc:creator>
      <dc:date>2016-12-05T06:48:37Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255024#M76378</link>
      <description>&lt;P&gt;That might be a new feature &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 07:16:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-is-the-best-way-to-check-data-coming-from-4-million-unique/m-p/255024#M76378</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-12-05T07:16:24Z</dc:date>
    </item>
  </channel>
</rss>

