Getting Data In

Scheduled data input script, no data found

jbmitchell
Loves-to-Learn Lots

I have created a data input that runs a wrapper script, which in turn executes a Python script and gathers its output. It was working as expected during my initial tests, but it seems to have failed when I moved to the full desired workload.

My test script output one JSON record and worked correctly, showing the event in Splunk as expected. Once that was working, I changed the command in the wrapper script to run the real Python script instead of the test script. On Friday, I scheduled it through the Splunk data input dashboard to run the next day, Saturday at 8 AM, using the cron expression "0 8 * * 6".
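
For reference, the input I configured should correspond to a scripted-input stanza in inputs.conf roughly like the one below; the app name, script path, and index are placeholders rather than my exact values.

[script://$SPLUNK_HOME/etc/apps/my_app/bin/wrapper.sh]
# Cron-style schedule: 8:00 AM every Saturday
interval = 0 8 * * 6
index = my_index
sourcetype = _json
disabled = 0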

When I checked the index this Monday morning, there was no new data, only what my initial test script had output. My first concern was whether the script actually executed. Is there a way I can verify this, or a way to check for data input script errors in general?

Another concern of mine is the volume of data: the test script output only one record, while the real script should output well over 1 million records, sometimes 10x that amount. I should also add that the script makes a large number of API calls, and I expect it to take several hours to complete. Are there any limitations in Splunk, or on the server in general, that could cause a failure due to the sheer volume of records, or due to the time the script takes to finish?

 

Thank you.


jbmitchell
Loves-to-Learn Lots

Thank you for the response @gcusello 

So the script is running. I was storing all of the records in a list, which I believe may have caused a problem due to the number of records. I changed it so that the records are printed to stdout after each API call, which produced output that Splunk picked up and indexed into the right index. However, there were far fewer events in Splunk than records from the API.
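
For reference, the per-call streaming approach described above looks roughly like this; fetch_page() and its contents are placeholders for the real API logic.

import json
import sys


def fetch_page(page):
    # Placeholder for the real API call; returns a list of dicts, empty when done.
    return [{"page": page, "id": i} for i in range(3)] if page < 2 else []


def main():
    page = 0
    while True:
        records = fetch_page(page)
        if not records:
            break
        for record in records:
            # One JSON object per API record, written straight to stdout
            # instead of being accumulated in a list.
            sys.stdout.write(json.dumps(record) + "\n")
        sys.stdout.flush()
        page += 1


if __name__ == "__main__":
    main()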

At first glance, there appear to be many records grouped together into a single event that we want separated. There also appear to be incomplete records that are cut off.

For the ones that are grouped together, I have added a newline between each record so that Splunk can hopefully identify the separation. Are there any other methods to get one JSON record per Splunk event?
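
If it helps anyone reading this, my understanding is that one JSON object per event can also be enforced with event-breaking settings on the sourcetype in props.conf; this is only a sketch assuming newline-delimited JSON output, and the sourcetype name is made up.

[my_api_json]
# Treat every line as its own event instead of merging lines.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Extract fields from the JSON at search time.
KV_MODE = json
# Default is 10000 bytes; raise it if individual records are longer.
TRUNCATE = 100000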

For the records that are cut off, I am not sure how to approach the problem; I am hoping that separating the events will solve it. Are there any output size limits or time limits on Splunk scripted data inputs, or any other possible issues that could have caused this?


gcusello
SplunkTrust

Hi @jbmitchell,

about the approach to pre-parse your data, I'm not able to help you because I don't have much experience with scripting.

About the limits, see limits.conf (https://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf).
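
As an illustration only (not a recommendation of specific values), one limits.conf stanza that is sometimes checked when a scripted input produces a large volume of data is the thruput limit:

[thruput]
# Maximum KB/s processed by the indexing pipeline; 0 means unlimited.
maxKBps = 0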

Ciao.

Giuseppe


jbmitchell
Loves-to-Learn Lots

Thanks @gcusello.

I was able to separate the records into single events and got much better results. However, only about 800k events were ingested into the Splunk index, and I was expecting around 5 million.

I went through the limits.conf documentation you linked a couple of times, but I did not see any settings specific to data inputs. Are there any other possibilities or options that could explain the missing events?
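
A search along these lines against Splunk's internal logs should show whether truncation, line-merging, or scripted-input errors are being logged (a sketch only; I have not confirmed these are the components involved in my case):

index=_internal sourcetype=splunkd (log_level=WARN OR log_level=ERROR)
    (component=LineBreakingProcessor OR component=AggregatorProcessor OR component=ExecProcessor)
| stats count BY component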


gcusello
SplunkTrust

Hi @jbmitchell,

limits.conf, as you can understand from its name, is where all the Splunk parameters that limit jobs are located.

If the issue of having fewer events than expected is consistent (it affects all events), maybe you could think about pre-parsing your data before ingesting it into Splunk.

We do this sometimes.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @jbmitchell,

First, you should make sure that the script is actually executed: you could modify your script to create a log file.
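
A minimal sketch of that idea, with a made-up log path, could be:

import logging

# Hypothetical log path; use any location writable by the user that runs Splunk.
logging.basicConfig(
    filename="/tmp/wrapper_debug.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("script started")

record_count = 0
# ... make the API calls here, printing records to stdout and counting them ...

logging.info("script finished, %d records written", record_count)

Scripted input errors also end up in Splunk's internal logs, so a search like index=_internal sourcetype=splunkd component=ExecProcessor is another way to check whether the script was launched.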

Anyway, maybe you tested the script as a different user than the one that runs Splunk.

So try giving execute permission to others as well.

About the second problem, you should debug it in your real environment. 

Ciao.

Giuseppe
