How to prevent Splunk from streaming / chunking data to custom search commands?

pmunaret
Explorer

Hi all,

I've run into a problem in MLTK: when I use the apply command, the data from the search is passed to my custom classifier in multiple chunks. Interestingly, the fit command passes the entire dataframe to the apply method of the custom classifier in a single call, as the logs below show.

Search with apply
index=test
| apply model

PID 10489 2022-06-12 22:35:13,604 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 50
PID 10489 2022-06-12 22:35:13,730 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 205
PID 10489 2022-06-12 22:35:13,821 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 41

Search with fit command
index=test
| fit ECOD date_hour into model

PID 8345 2022-06-12 22:27:50,867 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 296

The second one is the behavior I want, since I need the data as a single batch. Setting "chunked=false" in commands.conf to use the legacy protocol does not work because MLTK is not compatible with v1. Setting "streaming=false" also has no effect. Does anyone know how I can prevent Splunk from splitting the data into multiple chunks?
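
To make it concrete why the chunking matters, here is a minimal sketch in plain Python. It is not MLTK code: the `apply` function is a hypothetical stand-in for a custom algorithm's apply step, the ECDF-style score is a simplification loosely in the spirit of ECDF-based detectors such as ECOD, and the `date_hour` values are made up. Only the chunk sizes (50, 205, 41) are taken from the log above.

```python
import logging
import random

logging.basicConfig(format="%(levelname)s [%(name)s] %(message)s", level=logging.INFO)
logger = logging.getLogger("sketch.apply")  # hypothetical logger name


def apply(values):
    """Hypothetical stand-in for a custom algorithm's apply step.

    Scores each value by its empirical CDF *within the batch it receives*,
    so the result depends on which rows arrive together.
    """
    logger.info("Length of dataframe: %d", len(values))
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def split_into_chunks(values, sizes):
    """Split a list into consecutive chunks of the given sizes."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(values[start:start + size])
        start += size
    return chunks


random.seed(0)
events = [random.randint(0, 23) for _ in range(296)]  # made-up date_hour values

# Chunk sizes as seen in the apply log: 50 + 205 + 41 = 296 events.
chunk_sizes = [50, 205, 41]

# Chunked behaviour: apply is called once per chunk.
chunked_scores = []
for chunk in split_into_chunks(events, chunk_sizes):
    chunked_scores.extend(apply(chunk))

# Desired behaviour: apply is called once on the whole batch.
batch_scores = apply(events)

mismatches = sum(1 for a, b in zip(chunked_scores, batch_scores) if a != b)
print(f"{mismatches} of {len(events)} scores differ between chunked and single-batch apply")
```

Any score that depends on the rest of the batch (ranks, quantiles, empirical CDFs) changes when the same rows are split across several apply calls, which is why I need the data delivered in one piece.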

Any help is appreciated! Thanks. 

rlalle
Engager

Were you able to come up with a better solution for this?

pmunaret
Explorer
I just tested it with the collapse command and it kind of works. However, I'm not a fan of relying on an internal command that is designed for debugging / testing.

Search with apply
index=test
| collapse
| apply model

PID 10489 2022-06-13 19:00:02,152 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 296

If anyone has a better way to do this, let me know. 🙂