Hello,
I have written a Python script that performs an API query against a system. The script is executed as a scripted input at regular intervals (hourly).
Is there a way to store the output of the script in a Splunk KV Store?
So far I have only managed to save the output of the scripted input in an index. However, since this is data from a database that is updated daily, I think it would make sense to use the Splunk KV Store.
Thanks in advance.
It can be done (half of ES works this way) but it's ugly. An input is what should work as... well, an input. Not as a vessel to run something that does something completely different.
So you have two options (apart from ingesting the data into an index): run a completely external tool, for example with cron, which fiddles with Splunk via the REST API, or indeed run a modular input. Neither solution is very pretty.
Hi @MrLR_02
As @gcusello said - you'd normally put daily data like this in an index, however if you really want to write to KV store then please see below 🙂
Are you currently using the smi.EventWriter to send data to your index with a stream_events method?
If you have the session_key within your writer function, then you should be able to use the built-in Splunk Python SDK to communicate with the KV Store. You'll need to initiate a new client using the session_key if you haven't already got one within your method.
I haven't got an example to hand, but I will see if I can find one. In the meantime, this pseudo code may help you towards working code:
import sys
import json

import splunklib.client as client
import splunklib.modularinput as smi

# Define your modular input class
class MyModularInput(smi.Script):
    def get_scheme(self):
        scheme = smi.Scheme("My Modular Input")
        scheme.description = "Streams data to a Splunk KV Store"
        scheme.use_external_validation = False
        scheme.streaming_mode = smi.Scheme.streaming_mode_simple
        return scheme

    def stream_events(self, inputs, ew):
        # Retrieve the session key and connect to Splunk with it
        session_key = inputs.metadata["session_key"]
        service = client.connect(token=session_key)

        # Define the KV Store collection name
        collection_name = "your_kv_collection"

        # Iterate over each input stanza
        for input_name, input_item in inputs.inputs.items():
            # Data to be written to the KV Store
            data = {
                "key1": "value1",
                "key2": "value2"
            }

            # Access the KV Store collection
            collection = service.kvstore[collection_name]

            # Insert data into the KV Store (serialised to JSON for
            # compatibility with older versions of the SDK)
            try:
                collection.data.insert(json.dumps(data))
                ew.log("INFO", "Data successfully written to the KV Store.")
            except Exception as e:
                ew.log("ERROR", f"Failed to write data to the KV Store: {e}")

# Run the modular input script
if __name__ == "__main__":
    sys.exit(MyModularInput().run(sys.argv))
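One thing to be aware of with the above: insert() adds a new record on every run, so an hourly input will accumulate duplicates in the collection. If you want each device to appear only once, you can set your own _key on each record and use batch_save, which replaces existing records that have the same _key. A rough sketch of that idea to drop into stream_events (api_results and the device_id field are assumptions about what your API returns):

# Inside stream_events, reusing the service object created above.
# api_results stands in for the list of dicts returned by your API query.
api_results = [
    {"device_id": 1, "title": "switch01"},
    {"device_id": 2, "title": "router01"},
]

collection = service.kvstore["your_kv_collection"]

documents = []
for device in api_results:
    # Use a stable identifier as the KV Store _key so repeated runs
    # overwrite the existing record instead of adding a duplicate
    device["_key"] = str(device["device_id"])
    documents.append(device)

# batch_save inserts new records and replaces records whose _key already exists
collection.data.batch_save(*documents)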
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will
Hi @MrLR_02 ,
I use the KV Store only if I have to manage records (e.g. case management); in other situations I prefer using indexes.
Anyway, you can store data in a KV Store by running a scheduled search with the outputlookup command at the end.
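For example, something like this (a rough sketch that creates the scheduled search via the Python SDK, to stay consistent with the code earlier in this thread; the index, field and collection names are placeholders, and the KV Store collection needs a lookup definition in collections.conf/transforms.conf so that outputlookup can address it):

import splunklib.client as client

# Connect to Splunk (credentials are placeholders)
service = client.connect(host="localhost", port=8089, username="admin", password="changeme")

# Take the latest hour of scripted-input data, keep one event per device
# and write the result into the KV Store lookup.
spl = "index=i_doit earliest=-1h | dedup device_id | outputlookup your_kv_collection"

# Create the search and schedule it to run every hour.
service.saved_searches.create(
    "write_devices_to_kvstore",
    spl,
    is_scheduled=1,
    cron_schedule="0 * * * *",
)

You can of course create the same scheduled search directly in Splunk Web instead.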
Ciao.
Giuseppe
Thanks for the answer.
Currently I also store the output in an index, but since the scripted input is executed every hour and I don't want the same data stored in the index several times, I empty the index completely every hour.
But now I have the problem that the API query can occasionally fail, which would mean that no data from the system queried via the API is available in Splunk at all.
What solution can you recommend for this problem?
Thanks in advance.
Hi @MrLR_02 ,
what's the issue with having the same data stored in the index with different timestamps?
Forget the database approach: Splunk isn't a database.
An index isn't a database table where you store only the data you're currently using; you can read just the last hour of data from your index to get the latest situation, which gives the same result as your approach of deleting events.
In addition, if you need it, you can see the situation at any given point in time by changing the time picker of your search.
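For example, a search like this reads only the last hour and keeps one event per device (shown here through the Python SDK for consistency with the rest of the thread; the index name i_doit and the field device_id are placeholders):

import splunklib.client as client

# Connect to Splunk (credentials are placeholders)
service = client.connect(host="localhost", port=8089, username="admin", password="changeme")

# Read only the last hour of indexed data and keep one event per device
query = "search index=i_doit earliest=-1h | dedup device_id"
response = service.jobs.oneshot(query, output_mode="json")

# oneshot returns the raw search results, here as a JSON stream
print(response.read())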
Ciao.
Giuseppe
Hi Giuseppe,
yes I understand how Splunk stores its data in the indexes.
But when I run the scripted input every hour, it creates 24 events per day for a single device entry from the target system. And the scripted input doesn't return just one entry; it could return 200 entries from the target system.
24 events a day for each of 200 device entries adds up, and over a long period of time it takes a lot of space on the indexer.
So I want to find a way to store the data from the scripted input on the indexer, but without storing too many duplicates of the same device entries.
FYI: the script for the scripted input queries the API of an i-doit system, which is software for IT documentation, and asks it to return all of its stored device entries.
Thanks in advance.
Hi @MrLR_02 ,
if your problem is disk space, you cannot use the delete command, because it only deletes events logically, not physically.
You would have to use the clean command, which removes all events from the index and deletes them physically as well.
You can solve the space issue by applying a very short retention policy to that index (e.g. 12 hours or less); this way the buckets are physically deleted.
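If you prefer to set it with the same SDK used elsewhere in this thread, a rough sketch could look like the following (the index name i_doit is a placeholder and 43200 seconds corresponds to the 12 hours mentioned above; you can just as well set frozenTimePeriodInSecs for the index in indexes.conf):

import splunklib.client as client

# Connect to Splunk (credentials are placeholders)
service = client.connect(host="localhost", port=8089, username="admin", password="changeme")

# Apply a 12-hour retention policy; frozen buckets are deleted by default
service.indexes["i_doit"].update(frozenTimePeriodInSecs=43200)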
Ciao.
Giuseppe
Hi Giuseppe,
yep, I currently keep the index empty by using a very short retention time on it, and I think it will stay that way.
Other solutions like the KV Store don't really make much sense here.
Thanks for the nice exchange.
Bye.
Hi @MrLR_02 ,
good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉