Splunk Search

What is the best way to check data coming from 4 million unique SIMs? CSV lookup or KV Store?

Builder

Hi!

Our customer needs to check data coming from 4-5 million unique SIMs and detect SIMs that have not sent data recently.

Which is the best approach? I can get the SIM catalogue with a scheduled dbxquery, but is it better to use a CSV lookup or the KV Store?

Thanks for suggestions!

Marco

1 Solution

SplunkTrust

For large datasets you should be better off with the KV Store: CSV files are rewritten entirely on every update, while the KV Store allows targeted updates.



SplunkTrust

That's rewriting the entire collection because you're telling Splunk "here's a search result, now write that into this lookup". From the search language there is no targeted insert/update/delete - you'll need to descend into the REST API for that.

From the search language, you can only fall back to loading the entire collection and writing out the entire collection, hoping that it'll be smart enough to not actually update unchanged entries:

base search
| stats latest(_time) as last_connect latest(status) as status by SIM
| inputlookup append=t SIM-lookup
| stats first(_*) as _* first(*) as * by SIM
| outputlookup SIM-lookup
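For reference, the targeted update mentioned above goes through the KV Store REST endpoint (`/servicesNS/<owner>/<app>/storage/collections/data/<collection>/<key>`). A minimal sketch, assuming a local instance and the `search` app as hypothetical placement; it only builds the request here rather than sending it, since sending also needs an Authorization header and a live splunkd:

```python
import json

def build_kvstore_update(host, port, app, collection, key, record):
    """Build the URL and JSON body for a targeted KV Store record update.

    POSTing this body to the URL (authenticated) replaces only the record
    whose _key matches, leaving the rest of the collection untouched.
    """
    url = (f"https://{host}:{port}/servicesNS/nobody/{app}"
           f"/storage/collections/data/{collection}/{key}")
    body = json.dumps(record)
    return url, body

# Example: mark one SIM as having just connected (values are illustrative)
url, body = build_kvstore_update(
    "localhost", 8089, "search", "SIM-cathalogue",
    "544948df3ec32d7a4c1d9755",
    {"SIM": "393351234567", "last_connect": 1700000000, "status": "OK"},
)
```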

Builder

Martin,
thanks for the clarification. I was confused by this example in the docs:

| inputlookup csvcoll_lookup | search _key=544948df3ec32d7a4c1d9755 | eval CustName="Marge Simpson" | eval CustCity="Springfield" | outputlookup csvcoll_lookup append=True
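The docs example is a read-modify-write keyed on `_key`: the record's own fields change, but `_key` is preserved so the write-back updates the existing record instead of inserting a new one. A small sketch of that field-update step (record contents are illustrative):

```python
def update_record_fields(record, **changes):
    """Return a copy of a KV Store record with some fields replaced,
    keeping _key so a write-back updates in place instead of inserting."""
    updated = dict(record)
    updated.update(changes)
    return updated

# Mirrors the docs example: change two fields, keep the same _key
rec = {"_key": "544948df3ec32d7a4c1d9755",
       "CustName": "Homer Simpson", "CustCity": "Shelbyville"}
new = update_record_fields(rec, CustName="Marge Simpson",
                           CustCity="Springfield")
```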

Hope this will have decent performance with a global SIM catalogue of 4.5 million SIMs and growing! This is for a big company managing auto-insurance satellite data!

Marco


SplunkTrust

That might be a new feature 🙂


Builder

Martin,

I knew the KV Store was the right answer, but how?

Here's a schema I wrote down, but it's not 100% working in the update part. I made some tests using the oidemo index from the oidemo app, using the mdn field as the SIM id.

Here's my collection

[SIM-cathalogue]
field.SIM = string
field.last_connect = time
field.status = string
accelerated_fields.SIMaccelerated = {"SIM":1, "last_connect":1,"status":1}

with the following lookup defined:

[SIM-lookup]
collection = SIM-cathalogue
external_type = kvstore
fields_list = SIM,last_connect,status

Here are the steps I tried:

1) create master SIM repository

index=oidemo mdn=* | dedup mdn | fields mdn, _time | rename mdn as SIM | eval status="WARN" | eval last_connect=_time | table SIM, last_connect, status | outputlookup SIM-lookup

2) update the KV Store every 5 minutes with the SIMs (mdn) that sent data in the last 5 minutes:

index=oidemo mdn=* | dedup mdn | fields mdn, _time | rename mdn as SIM | lookup SIM-lookup SIM | eval previous_connect=last_connect | eval last_connect=_time | eval oldstatus=status | eval status="OK" | table SIM, SIMKEY, _key, previous_connect, last_connect, status | outputlookup SIM-lookup append=True

The problem is that the second search completely overwrites the whole KV Store instead of just updating the changed entries.

Where's the error?!

Marco
