Knowledge Management

Can you give me some KV Schema storage advice?

kashz
Explorer

I have data extracted from a third-party API, which is a JSON object that looks something like this:

{
    "key1": "value1",
    "key2": [
        { "key2.1": "value2.1",
          "key2.2": "value2.2",
          ... (and so on)
        }
    ],
    "key_x": "value_x",
    ... (and so on)
}

I want to feed this data into a KV store in a format where I can extract the key2 values (a list of dictionaries) and show them in visualizations using SPL queries.

My question is: what would be the best way to store the data in the KV store?
1. Just one KV store collection?
2. Two different collections with a foreign-key reference field? (Rough sketches of both options below.)
3. Any other suggestions / best practices?
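
To make those sketches concrete, here is roughly what I have in mind. The collection, lookup, and field names are placeholders, and I'm assuming each element of key2 becomes its own record, with a lookup definition pointing at each collection.

Option 1, one flattened collection with the parent fields repeated on every record:

    | inputlookup api_data_lookup where key1="value1"
    | stats count by key2_1

Option 2, a separate child collection whose records carry a parent_id reference back to the parent collection:

    | inputlookup child_lookup where parent_id="value1"
    | stats count by key2_1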


skoelpin
SplunkTrust

What's the purpose of wanting to feed this into the KV store? What's wrong with a lookup?


kashz
Explorer

Umm, the KV store is a way to perform lookups too (CSV is the file-based lookup approach), right?

So I'm adding contextualization to data already in Splunk.
Say, for example, firewall logs in Splunk that contain an IP plus other fields.

What I'm getting from the third-party API is more information about that IP, information that is not being recorded within the log fields.
The reason it's a KV store and not just a static lookup is that the third-party API fetch is meant to always return the latest data.
So I have a Python script running every day pulling in new data,
and KV store operations (http://dev.splunk.com/view/webframework-developapps/SP-CAAAEZG) are easier than modifying CSV files.
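
For example, the enrichment search would look roughly like this (the lookup and field names are made up, assuming a lookup definition ip_context_lookup pointing at the KV collection):

    index=firewall
    | lookup ip_context_lookup ip AS src_ip OUTPUTNEW threat_category asn last_seen
    | stats count by threat_category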


skoelpin
SplunkTrust

Have you ever used a KV store before? They are VERY fragile.

You're updating it daily, so why not use a lookup?


kashz
Explorer

"very fragile" - please elaborate?

Could you be more specific when you say "lookup"?


skoelpin
SplunkTrust

Easy to corrupt or lose. By "lookup" I mean using an external file to enrich data in Splunk. I have daily-updating lookups of 1M rows that mirror other applications' data.


kashz
Explorer

So let me see if I'm understanding this correctly.

When you use an "external file" as a lookup, it is a CSV file, correct?

Any Splunk docs link you can provide so I'm sure I'm looking at the right thing?

And when you have these lookups performed daily, I presume you have configured "automatic lookups" for that operation, correct?


skoelpin
SplunkTrust

Yes. The Lookup Editor TA is an excellent app to get started with. As for the daily update: we have an external application that keeps track of subcodes and customer names; this changes daily, and subcodes can be recycled. I wrote a scheduled search which grabs the data from that app and does an outputlookup to overwrite the existing lookup table so it mirrors the other app's data. This new data is then added via a lookup to views, which identify the subcode and match it with a customer name.
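
Roughly, the scheduled search looks like this (index, sourcetype, and field names changed):

    index=app_data sourcetype=subcode_feed earliest=-1d@d latest=@d
    | stats latest(customer_name) AS customer_name by subcode
    | outputlookup subcode_customer.csv

and the views then enrich with something like | lookup subcode_customer subcode OUTPUT customer_name, assuming a lookup definition named subcode_customer on top of that file.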

https://docs.splunk.com/Documentation/Splunk/7.2.4/Knowledge/Aboutlookupsandfieldactions


kashz
Explorer

Following up on our conversation about lookups,

External lookups are the same thing, at least in the sense that they use a Python script to generate the events/data (pulled from the third-party server).
My question is: where is that data stored?
Or, if I'm understanding correctly, the script resides within Splunk, and any request to it fetches the data on demand and fulfills that request, correct? Please do correct me if I'm wrong.

In your case, when your scheduled search runs and performs an outputlookup, what kind of lookup storage does it feed into: CSV, KV, or other? So you are basically running a scheduled search that runs the script, pulls the data, and then feeds it into lookup storage from within Splunk?


skoelpin
SplunkTrust

I'm referring to CSV lookups, which are considered knowledge objects and stored on the search head(s). There are a few ways of going about this: one method would be to ingest the data via a forwarder, use SPL in a scheduled search to format it into a table, and then use outputlookup to write it to a new lookup file. Another method is to use the REST endpoint to send the data to Splunk, which avoids license cost.

https://docs.splunk.com/Documentation/Splunk/7.2.4/RESTREF/RESTknowledge#data.2Flookup-table-files.2...
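
Once the scheduled search has written the file, you can sanity-check the knowledge object on the search head with something like this (using the same made-up lookup name as above):

    | inputlookup subcode_customer.csv
    | stats count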

So you are basically running a scheduled search that runs the script, pulls the data, and then feeds it into lookup storage from within Splunk?

This is exactly what I'm doing.


kashz
Explorer

Ah, I understand what you're doing now. But I feel that will NOT be the best solution for my application.
My workflow needs to be implemented on client sites, and the best way to do that would be to package it as an add-on.
Hence the Python script within the add-on's data collection that fetches the data and saves it to the KV store.

The reason I chose the KV store over a CSV lookup is based on the Splunk docs (http://dev.splunk.com/view/SP-CAAAEY7), which clearly state:
1. The KV Store is designed for large collections, and is the easiest way to develop an application that uses key-value data.
2. The KV Store is a good solution when data requires user interaction using the REST interface and when you have a frequently-changing data set.
3. A CSV-based lookup is a good solution when the data set is small or changes infrequently, and when distributed search is required.

So:
1. I do have JSON being returned from the API, so it works with a KV store implementation.
2. Frequently changing -> TRUE.
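
And either way, once the collection is exposed through a lookup definition, the search-time usage should look the same as with a CSV lookup, e.g. (collection/lookup and field names are placeholders):

    | inputlookup ip_context_lookup where threat_category="malware"
    | table ip threat_category last_seen

so to me the decision is really about how the data gets updated.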

Do give me your input on what you think. Suggestions are welcome.
