<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Load csv from GCP into a KVStore lookup using the Python SDK in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491252#M8848</link>
    <description>Loading a 45 MB CSV from GCP into a Splunk KV Store lookup using the Python SDK.</description>
    <pubDate>Sun, 07 Jun 2020 18:31:53 GMT</pubDate>
    <dc:creator>cdhippen</dc:creator>
    <dc:date>2020-06-07T18:31:53Z</dc:date>
    <item>
      <title>Load csv from GCP into a KVStore lookup using the Python SDK</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491252#M8848</link>
      <description>&lt;P&gt;We currently have a 45 MB CSV file that we're going to load into a Splunk KV Store collection. I want to accomplish this via the Python SDK, but I'm running into trouble loading the records.&lt;/P&gt;
&lt;P&gt;The only way I can find to update a KV Store collection is the collection.data.insert() function, which as far as I can tell accepts only one row at a time. Since this file has 250k rows, I can't afford to wait for every line to upload individually each day.&lt;/P&gt;
&lt;P&gt;This is what I have so far:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from splunklib import client, binding
import json
import numpy as np
import pandas as pd

data_file = '/path/to/file.csv'

username = 'user'
password = 'splunk_pass'
connectionHandler = binding.handler(timeout=12400)
connect_kwargs = {
    'host': 'splunk-host.com',
    'port': 8089,
    'username': username,
    'password': password,
    'scheme': 'https',
    'autologin': True,
    'handler': connectionHandler
}

# Retry until the connection succeeds; the host intermittently returns 504s.
connected = False
while not connected:
    try:
        service = client.connect(**connect_kwargs)
        service.namespace['owner'] = 'nobody'
        connected = True
    except binding.HTTPError:
        print('Splunk 504 Error')

kv = service.kvstore
kv['learning_center'].delete()

df = pd.read_csv(data_file)
# replace() returns a new DataFrame, so the result must be assigned back;
# otherwise the NaN values survive into the records below.
df = df.replace(np.nan, '')
df['_key'] = df['key_field']
result = df.to_dict(orient='records')

# Derive KV Store field types from the first record's Python types.
type_map = {'str': 'string', 'int': 'number', 'float': 'number', 'bool': 'bool'}
fields = {name: type_map.get(type(value).__name__, 'string')
          for name, value in result[0].items()}
kv.create(name='learning_center', fields=fields, owner='nobody', sharing='system')

collection = kv['learning_center']
for row in result:
    collection.data.insert(json.dumps(row))

transforms = service.confs['transforms']
transforms.create(name='learning_center_lookup', **{'external_type': 'kvstore', 'collection': 'learning_center', 'fields_list': '_key, userGuid', 'owner': 'nobody'})
# transforms['learning_center_lookup'].delete()
print(collection.data.query())
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In addition to the problem of taking forever to load a quarter-million records, it keeps failing on rows with nan as the value, and no matter what I do to try to deal with the nan, it persists in the dictionary values.&lt;/P&gt;
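&lt;P&gt;For reference, a minimal pandas sketch of where the nan can survive: replace() and fillna() return new DataFrames rather than mutating in place (unless inplace=True is used), so the cleaned result has to be assigned back; the userGuid column below is just illustrative:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import numpy as np
import pandas as pd

df = pd.DataFrame({'userGuid': ['abc', np.nan]})

df.replace(np.nan, '')         # returns a cleaned copy; df itself is unchanged
print(df.isna().any().any())   # True -- the NaN is still in df

df = df.fillna('')             # assign the cleaned copy back
print(df.isna().any().any())   # False -- the NaN is gone
&lt;/CODE&gt;&lt;/PRE&gt;</description>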
      <pubDate>Sun, 07 Jun 2020 18:31:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491252#M8848</guid>
      <dc:creator>cdhippen</dc:creator>
      <dc:date>2020-06-07T18:31:53Z</dc:date>
    </item>
    <item>
      <title>Re: Load csv from GCP into a KVStore lookup using the Python SDK</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491253#M8849</link>
      <description>&lt;P&gt;Better to batch-post the data:&lt;BR /&gt;
&lt;A href="https://github.com/georgestarcher/TA-SyncKVStore/blob/master/bin/input_module_kvstore_to_kvstore.py"&gt;https://github.com/georgestarcher/TA-SyncKVStore/blob/master/bin/input_module_kvstore_to_kvstore.py&lt;/A&gt;&lt;/P&gt;
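&lt;P&gt;For example, the SDK's KVStoreCollectionData exposes batch_save(), which POSTs a JSON array to storage/collections/data/{collection}/batch_save. A minimal sketch, reusing the collection name from the question and chunking to stay under the server's default cap of 1,000 documents per batch (limits.conf max_documents_per_batch_save):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import pandas as pd
from splunklib import client

service = client.connect(host='splunk-host.com', port=8089,
                         username='user', password='splunk_pass')
collection = service.kvstore['learning_center']

df = pd.read_csv('/path/to/file.csv').fillna('')
df['_key'] = df['key_field']
records = df.to_dict(orient='records')

# One POST per chunk instead of one HTTP round trip per row.
batch_size = 1000
for start in range(0, len(records), batch_size):
    collection.data.batch_save(*records[start:start + batch_size])
&lt;/CODE&gt;&lt;/PRE&gt;</description>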
      <pubDate>Mon, 27 Jan 2020 19:11:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491253#M8849</guid>
      <dc:creator>starcher</dc:creator>
      <dc:date>2020-01-27T19:11:55Z</dc:date>
    </item>
    <item>
      <title>Re: Load csv from GCP into a KVStore lookup using the Python SDK</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491254#M8850</link>
      <description>&lt;P&gt;Is there no way to do this with the Splunk Python SDK?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jan 2020 19:15:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Load-csv-from-GCP-into-a-KVStore-lookup-using-the-Python-SDK/m-p/491254#M8850</guid>
      <dc:creator>cdhippen</dc:creator>
      <dc:date>2020-01-27T19:15:51Z</dc:date>
    </item>
  </channel>
</rss>

