Hi all,
I've run into a problem with MLTK: when I run my custom classifier with the apply command, the search results are passed to it in multiple chunks. Interestingly enough, the fit command passes the entire dataframe to the classifier's apply method in a single call, as the log excerpts below show.
index=test | apply model

PID 10489 2022-06-12 22:35:13,604 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 50
PID 10489 2022-06-12 22:35:13,730 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 205
PID 10489 2022-06-12 22:35:13,821 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 41
index=test | fit ECOD date_hour into model

PID 8345 2022-06-12 22:27:50,867 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 296
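For context, these log lines come from a custom algorithm roughly like the sketch below. It assumes the standard MLTK custom-algorithm interface (BaseAlgo); the logger wiring and the actual ECOD training/scoring are abbreviated, so treat it as illustrative rather than the exact implementation.

# algos/ECOD.py -- simplified sketch of the algorithm producing the
# "Length of dataframe" lines above; register_codecs() (needed to save the
# model with "into") and the real ECOD logic are omitted for brevity.
import logging

from base import BaseAlgo

logger = logging.getLogger("mlspl.ECOD Logger")


class ECOD(BaseAlgo):
    def __init__(self, options):
        self.feature_variables = options.get("feature_variables", [])

    def fit(self, df, options):
        # The fit command hands over the complete dataframe here.
        # ... train the detector on df[self.feature_variables] ...
        return

    def apply(self, df, options):
        # "| apply model" calls this once per chunk (50, 205 and 41 rows
        # above), while "| fit ... into model" ends up calling it once
        # with the full 296 rows.
        logger.info("[apply] Length of dataframe: %d", len(df))
        # ... score the chunk and append the prediction column ...
        return df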
The fit behavior is what I need, since my algorithm has to see the data as a single batch. Setting chunked=false in commands.conf to fall back to the legacy protocol does not work because MLTK is not compatible with protocol v1, and setting streaming=false has no effect either; the overrides I tried are shown below. Does anyone know how I can prevent Splunk from splitting the data into multiple chunks?
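For reference, these are the overrides I experimented with in a local/commands.conf of the MLTK app (shown only to illustrate where the settings live; the rest of the stanza is untouched):

[apply]
# fall back to the legacy protocol -- fails, since MLTK's apply needs protocol v2
chunked = false
# legacy streaming attribute -- no effect on how the data is chunked
streaming = false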
Any help is appreciated! Thanks.
Were you able to come up with a better solution for this?
I ended up working around it with a small custom collapse command that merges the chunks before they reach apply:

index=test | collapse | apply model

PID 10489 2022-06-13 19:00:02,152 INFO [mlspl.ECOD Logger] [apply] Length of dataframe: 296
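The gist of collapse is a custom chunked search command that buffers every incoming record and only emits the whole batch once Splunk marks the input as finished, so the downstream apply sees a single chunk. Below is a simplified sketch of that idea using splunklib, not the exact implementation; in particular, detecting the final chunk via the private _finished attribute relies on splunklib internals and may need adjusting for your splunklib version.

# collapse.py -- simplified sketch of a chunk-collapsing command (splunklib)
import sys

from splunklib.searchcommands import Configuration, EventingCommand, dispatch


@Configuration()
class CollapseCommand(EventingCommand):
    """Buffers all events and forwards them downstream as one batch."""

    def __init__(self):
        super(CollapseCommand, self).__init__()
        self._buffer = []

    def transform(self, records):
        # Under protocol v2, transform() is invoked once per input chunk.
        self._buffer.extend(records)

        # Only flush once the input is marked finished (assumption: the
        # private _finished flag reflects the chunk metadata), so apply
        # receives one large batch instead of several chunks.
        if getattr(self, "_finished", False):
            for record in self._buffer:
                yield record
            self._buffer = []


dispatch(CollapseCommand, sys.argv, sys.stdin, sys.stdout, __name__)

The command also needs a [collapse] stanza in the app's commands.conf with chunked = true and filename = collapse.py so it runs under protocol v2.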