FieldSelector - p-values higher than .05 in selected fields

jpawloski
Path Finder

I've recently begun exploring the FieldSelector command to better understand which fields are the best predictors for an ML model. During my research, I've gained what I think is a decent understanding of what constitutes a good predictor field, based largely on its p-value (anything below .05) and its score (the higher the better).

I've been running some tests and noticed that the fields selected by FieldSelector don't match what I would expect to be the optimal selection. I've pasted the fit command I'm using below:

| fit FieldSelector num from PC_* value_hashed_* type=numeric mode=k_best param=10 into combined_field_selector

 

Once this is run, I compare the output to the summary of the combined_field_selector model, which provides score and p-values for all the fields:

| summary combined_field_selector

 

One of the ten fields selected by FieldSelector was PC_2, with a score of .3293 and a p-value of .5661. Of the 132 fields passed to this fit command, PC_2 ranked 115th in score and had the 15th-highest p-value. This suggests it is not a good predictor for the model. In addition, more than ten other fields had better score/p-value combinations.
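
For reference, here is a minimal sketch of what I would expect k_best selection to do. It assumes that FieldSelector with type=numeric scores each field with a univariate regression F-test against the target and keeps the k highest scores, which is my reading of the scikit-learn selector that the MLTK documentation says it wraps; I have not confirmed this against the MLTK source. The function names and sample data are hypothetical, and this is plain Python rather than anything Splunk runs. Under that assumption, every field shares the same degrees of freedom, so a higher score implies a lower p-value, and the k best scores should also be the k lowest p-values.

```python
from math import sqrt


def f_score(x, y):
    """Univariate regression F-statistic of one feature against the target.

    Assumption: this mirrors a correlation-based F-test,
    F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    r_squared = (sxy * sxy) / (sxx * syy)
    if r_squared >= 1.0:
        return float("inf")  # perfectly correlated feature
    return r_squared / (1.0 - r_squared) * (n - 2)


def k_best(features, target, k):
    """Return the k feature names with the highest F-scores, plus all scores."""
    scores = {name: f_score(col, target) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    # Hypothetical data: one strong, one moderate, one uncorrelated field.
    target = [1, 2, 3, 4, 5, 6, 7, 8]
    features = {
        "strong": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1],
        "weak": [3, 1, 4, 1, 5, 9, 2, 6],
        "noise": [5, 5, 4, 6, 5, 5, 6, 4],
    }
    selected, scores = k_best(features, target, k=2)
    print(selected)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 4))
```

If FieldSelector behaves like this sketch, a field ranked 115th of 132 by score should not appear in a k_best selection with param=10, which is why the PC_2 result confuses me.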

I know this type of question falls in a no man's land between the underlying Python, the statistical algorithms, and Splunk, but Splunk is really my only means of applying ML to this data and troubleshooting the results. I'm hoping someone has a better understanding of what's going on and can explain why these fields are being selected.
