Getting Data In

Splunk does not store data from a Data input script

Ciccius
Explorer

Hi all,
I have configured a new script in 'Data inputs' to feed my index with data from a REST API.
The script is written in Python 3: it makes a simple request to the endpoint, gathers the data, manipulates it a little, and writes it to stdout with the print() function.
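
For reference, a scripted input of this kind can be sketched as below. This is a minimal illustration, not the original script: the format_event helper, the field names, and the sample records are hypothetical, and the REST call is replaced by a hard-coded list so the sketch runs offline. It writes one JSON event per line to stdout and puts an explicit UTC offset in each timestamp.

```python
import json
import sys
from datetime import datetime, timezone


def format_event(record, now=None):
    """Return one JSON line with an explicit, timezone-aware timestamp."""
    now = now or datetime.now(timezone.utc)
    # isoformat() on an aware datetime includes the offset, e.g. +00:00.
    event = {"time": now.isoformat(timespec="seconds")}
    event.update(record)
    return json.dumps(event)


def main():
    # Hypothetical records standing in for the REST API response.
    records = [{"campaign": "demo", "impressions": 10}]
    for record in records:
        print(format_event(record))
    # Flush so buffered output is written before the process exits.
    sys.stdout.flush()


if __name__ == "__main__":
    main()
```

Writing the offset into the timestamp is a design choice that avoids relying on the timezone the indexer assumes for the source.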

The script is placed in the 'bin' folder of my app, and using the web UI I configured it without any issue to run every hour. I tested it by running it manually from the command line, and the output is what I expected.

In splunkd.log I can see that Splunk ran it, as follows:
02-19-2025 10:49:00.001 +0100 INFO ExecProcessor [3193396 ExecProcessor] - setting reschedule_ms=86399999, for command=/opt/splunk/bin/python3.7 /opt/splunk/etc/apps/adsmart_summary/bin/getCampaignData.py

... and nothing more is logged, neither errors nor anything else.

But in the index I chose in the web UI there is no data coming from the script.

Where can I start to check what is going on?

Thanks!

1 Solution

Ciccius
Explorer
I don't understand why, but after removing everything from the Web UI and manually configuring the script in inputs.conf, it works: data flows into the index like a charm.
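
For readers landing here, a scripted-input stanza in inputs.conf generally looks like the sketch below. This is not the poster's actual configuration: the index and sourcetype values are placeholders, and the interval of 3600 seconds simply mirrors the "every hour" schedule mentioned in the question.

```ini
# Sketch only: index and sourcetype are placeholder names.
[script:///opt/splunk/etc/apps/adsmart_summary/bin/getCampaignData.py]
# Run once per hour (value in seconds).
interval = 3600
index = my_index
sourcetype = my_sourcetype
disabled = 0
```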

livehybrid
SplunkTrust

Hi @Ciccius 

I feel your frustration - I've written multiple inputs and had issues like this, and they can be a pain to resolve. I've always found the best place to start is with the following:

$SPLUNK_HOME/bin/splunk cmd splunkd print-modinput-config yourSchema yourStanza

If you've created a simple input that runs as a single instance, then yourSchema might be the same as yourStanza. If not, you might have multiple stanzas for a single input (e.g. yourInput://stanza1 and yourInput://stanza2).

If you run the above then it should spit out the schema for your stanza. If you get any errors then you should investigate! If you get an XML output then you can try running:

$SPLUNK_HOME/bin/splunk cmd splunkd print-modinput-config yourSchema yourStanza | $SPLUNK_HOME/bin/splunk cmd python3 /opt/splunk/etc/apps/adsmart_summary/bin/getCampaignData.py

In this scenario it is invoking the modular input as it would from within Splunk as a scheduled ExecProcess. This might give you more insight into the goings-on within your input. 

I use this all the time to test inputs so I don't need to wait for the interval to pass!

Please let me know how you get on and consider accepting this answer or adding karma to this answer if it has helped.
Regards

Will

 


Ciccius
Explorer

Hi Will,
thanks for the hints. I didn't create a modular input, just a simple Data Inputs > Script in the Web UI, so when I try to run the command you suggested, Splunk says that "Scheme 'script' is not inizialized" (I used 'script' as scheme and script:///opt/splunk/etc/apps/adsmart_summary/bin/getCampaignData.py as stanza name as written in inputs.conf). I think it's the normal behaviour.

In metrics.log I found that at some point Splunk got some events from my script, but nothing has been written to the index. As I wrote in the other post, my suspicion falls on avg_age and max_age, which have negative values:

02-19-2025 10:49:29.584 +0100 INFO Metrics - group=per_source_thruput, series="/opt/splunk/etc/apps/adsmart_summary/bin/getcampaigndata.py", kbps=0.436, eps=0.677, kb=13.525, ev=21, avg_age=-3600.000, max_age=-3600

host = splunkidx01
source = /opt/splunk/var/log/splunk/metrics.log
sourcetype = splunkd

 

Maybe there is something wrong with the timestamp of the events; I am still trying to figure it out.
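
One way a value of exactly -3600 could arise is sketched below, under two assumptions that the thread does not confirm: that avg_age is roughly index time minus parsed event time, and that the script prints local wall-clock time (UTC+1, matching the +0100 in the log lines) without an offset while the timestamp is then read as UTC. The function and variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def event_age_seconds(index_time, printed_timestamp, assumed_tz):
    """Age = index time minus event time, as parsed under assumed_tz."""
    naive = datetime.strptime(printed_timestamp, "%Y-%m-%d %H:%M:%S")
    event_time = naive.replace(tzinfo=assumed_tz)
    return (index_time - event_time).total_seconds()


local_tz = timezone(timedelta(hours=1))  # offset seen in the logs (+0100)
index_time = datetime(2025, 2, 19, 10, 49, 0, tzinfo=local_tz)
printed = "2025-02-19 10:49:00"  # local wall-clock time, no offset

# Parsed with the correct offset, the event is current.
print(event_age_seconds(index_time, printed, local_tz))      # 0.0
# Parsed as UTC, the event lands one hour in the future.
print(event_age_seconds(index_time, printed, timezone.utc))  # -3600.0
```

If this is what happens, the events would carry timestamps one hour ahead of the time they were indexed.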

Thanks!


kiran_panchavat
SplunkTrust

@Ciccius 

You need to configure a Data Input similar to how you would set up a File Monitor, Performance Monitor, etc. Splunk needs to know what to read, where to read it from, how frequently to read it, which index to write to, and which source/sourcetype to assign. You configure these in inputs.conf, either through Splunk Web or the CLI. Refer to the documentation:

Get data from APIs and other remote data interfaces through scripted inputs - Splunk Documentation

Also read the Writing Reliable Scripts documentation, since most scripted inputs end up using a wrapper script and have to maintain their own state (last indexed data, recovery, protection against parallel execution, etc.): https://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ScriptSetup
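
As a rough illustration of the state handling mentioned above, the sketch below keeps a checkpoint file and a simple lock file. It is a minimal example under stated assumptions, not the approach from the linked documentation: the file names, the last_id field, and all three helper functions are hypothetical, and a lock file left behind by a crashed run would need manual cleanup.

```python
import json
import os
import tempfile


def load_checkpoint(path, default=None):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def save_checkpoint(path, state):
    """Write to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def acquire_lock(lock_path):
    """Return True if this run created the lock, False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


# Usage: skip the run if another one is active, otherwise resume from state.
workdir = tempfile.mkdtemp()
checkpoint = os.path.join(workdir, "checkpoint.json")
lock = os.path.join(workdir, "run.lock")
if acquire_lock(lock):
    try:
        state = load_checkpoint(checkpoint, default={"last_id": 0})
        # ... fetch and print only records newer than state["last_id"] ...
        save_checkpoint(checkpoint, {"last_id": state["last_id"] + 1})
    finally:
        os.remove(lock)
```

Renaming a finished temp file over the checkpoint means a crash mid-write leaves the previous checkpoint in place rather than a truncated one.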

Once you have fully tested your scripted input and made it robust for your scenario, you may be able to build an add-on using Splunk Add-on Builder or move towards creating your own Modular Input for Splunk.

https://dev.splunk.com/enterprise/docs/developapps/manageknowledge/custominputs/ 

Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

Ciccius
Explorer

I am pretty sure that I left a reply to you yesterday, but I can't find it. Anyway, I already tried what you suggested, thanks. Looking at metrics.log I found a line that I think demonstrates that Splunk does receive the data in some way (ev=21) but discards it:

02-19-2025 10:49:29.584 +0100 INFO Metrics - group=per_source_thruput, series="/opt/splunk/etc/apps/adsmart_summary/bin/getcampaigndata.py", kbps=0.436, eps=0.677, kb=13.525, ev=21, avg_age=-3600.000, max_age=-3600

host = splunkidx01
source = /opt/splunk/var/log/splunk/metrics.log
sourcetype = splunkd


Maybe the issue is that avg_age and max_age are negative, so it is something about the timestamp that the script produces for the events.
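
To check these values across many metrics.log lines, the key=value pairs can be pulled out with a small parser. The sketch below is illustrative only: parse_metrics_fields is a hypothetical helper, and reading a negative age as "timestamped in the future" is an interpretation rather than something this thread confirms.

```python
import re


def parse_metrics_fields(line):
    """Extract key=value pairs from a metrics.log line into a dict."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', line)
    return {key: value.strip('"') for key, value in pairs}


sample = (
    '02-19-2025 10:49:29.584 +0100 INFO Metrics - group=per_source_thruput, '
    'series="/opt/splunk/etc/apps/adsmart_summary/bin/getcampaigndata.py", '
    'kbps=0.436, eps=0.677, kb=13.525, ev=21, avg_age=-3600.000, max_age=-3600'
)

fields = parse_metrics_fields(sample)
# Flag lines whose average event age is negative.
future_events = float(fields["avg_age"]) < 0
print(fields["ev"], fields["avg_age"], future_events)  # 21 -3600.000 True
```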
