Hi experts,
I am early in my experimentation with the Splunk App for DSDL (formerly DLTK), pulling events into a Jupyter notebook via Option 2, i.e.:
<SPL search> | fit MLTKContainer mode=stage algo=my_test * into app:my_test_data
where my_test is simply cloned from barebone_template, and I want the input data file to be created with the name "my_test_data".
I ran into the following issue, as the SPL search returns 500+ events:
Upon checking mlspl.conf, max_inputs is, fair enough, set to its default of 100,000. However, the resulting my_test_data.csv contains only 1,153 lines, i.e. only 1,152 events of interest excluding the header row.
Why don't I get 100,000 events in the CSV file? It is not a disk space issue; I have verified that.
More importantly, how can I get the full 100,000 events into my CSV file?
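For reference, this is roughly how I counted the staged rows. The helper below is a minimal sketch; the tiny inline CSV stands in for the real my_test_data.csv, whose location inside the container may differ in your setup:

```python
import csv
import io

def count_events(csv_text: str) -> int:
    """Count data rows in a staged DSDL CSV, excluding the header."""
    rows = sum(1 for _ in csv.reader(io.StringIO(csv_text)))
    return max(rows - 1, 0)

# Tiny demo in place of the real my_test_data.csv:
sample = "_time,host,message\n1,web01,ok\n2,web02,ok\n"
print(count_events(sample))  # -> 2
```

In my case the same count against the staged file yields 1,152 events.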
Any advice is greatly appreciated.
Thanks,
MCW
Hi @MCW
1. How many events are returned by your <SPL search>?
2. Can you share the output of your <SPL search> (e.g. as CSV)? I'd like to replicate your situation on my server.
3. Do you have access to the server Splunk is running on? If yes, can you provide the output of the following two commands?
./splunk show config mlspl | grep max_inputs
./splunk btool mlspl list --debug | grep max_inputs
Without knowing any more details, my guess is that your <SPL search> returned more events than your max_inputs setting allows (e.g. your search returns 200'000 events while max_inputs=100'000). Consequently, the events are downsampled by DSDL/MLTK, and the my_test_data.csv with 1,153 lines that you see in the Jupyter notebook environment is exactly that sample.
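Conceptually, the downsampling behaves like the sketch below. This is not the actual MLTK implementation (which uses its own seeded sampling internally); it only illustrates why the staged CSV can hold fewer rows than the search returned:

```python
import random

def downsample(events, max_inputs=100_000, seed=42):
    """Illustrative sketch: if the search returns more events than
    max_inputs allows, keep only a random sample of max_inputs rows.
    NOT the real MLTK code, just the shape of the behavior."""
    events = list(events)
    if len(events) <= max_inputs:
        return events
    rng = random.Random(seed)
    return rng.sample(events, max_inputs)

# A 200'000-event search capped at max_inputs=100'000:
print(len(downsample(range(200_000), max_inputs=100_000)))  # -> 100000
```

If that is what is happening to you, raising max_inputs in mlspl.conf (or narrowing your search) is the way to get the full result set into the staged CSV.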
Regards,
Gabriel