
#machinelearning Why is the mean of the Beta distribution negative in DensityFunction?

jasantor
Engager


Hello,

I am using dist=auto in my DensityFunction and I am getting negative results for the Beta distribution. I feel like this is wrong, but keep me honest: I would like to understand how the Beta distribution is fitted and why the mean is negative when my input is a success rate between 0 and 100%. I am happy with the other distributions (e.g. Gaussian KDE and Normal).

|fit DensityFunction MyModelSuccessRate by "HourOfDay,Object" into MyModel2 dist="auto"

Thanks,

Joseph


jasantor
Engager

Thank you so much for the response, @tscroggins. I will validate using your math; at first glance it suggests the mean should not be a negative number, but I will definitely double-check. I will also reach out to our support. Thank you so much.


tscroggins
Champion

Hi @jasantor,

The implementation is in $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos_support/density_function/beta_distribution.py:

1. Sample min(data.shape[0], 10000) elements from field using numpy.random.choice.
2. Normalize sample to [0..1] using (data - data.min()) / (data.max() - data.min()).
3. Fit normalized sample to Beta using scipy.stats.beta.fit.
4. If either alpha <= 0 or beta <= 0, estimate parameters using normalized sample mean and variance.
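To make steps 2 and 4 concrete, here is a minimal sketch in plain Python. It is not the MLTK code: the function names and sample values are made up for illustration, the scipy.stats.beta.fit call from step 3 is left out, and I am assuming the fallback in step 4 uses the standard method-of-moments formulas.

```python
import statistics

def normalize(sample):
    """Step 2: rescale values to [0, 1] using the sample min and max."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        raise ValueError("cannot normalize a constant sample")
    return [(x - lo) / (hi - lo) for x in sample]

def beta_method_of_moments(normalized):
    """Step 4 fallback (assumed form): derive alpha and beta from mean and variance."""
    mean = statistics.fmean(normalized)
    var = statistics.variance(normalized)  # sample variance
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

# Hypothetical success rates in percent, not taken from the original post.
success_rates = [12.0, 35.0, 50.0, 64.0, 80.0, 95.0, 100.0, 0.0]
normalized = normalize(success_rates)
alpha, beta = beta_method_of_moments(normalized)
print(f"alpha={alpha:.4f}, beta={beta:.4f}")
```

Note that everything after step 2 lives on the [0, 1] scale, so none of the fitted values are in percent any more.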

The return values for scipy.stats.beta.fit are alpha, beta, loc, and scale. MLTK's implementation of dist=beta either misinterprets or mislabels loc and scale as mean and standard deviation, respectively.
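To illustrate the difference, here is a small sketch of how the mean and standard deviation of a Beta distribution follow from all four parameters. The parameter values are hypothetical, chosen only to show that a negative loc does not imply a negative mean; they are not output from MLTK or from a real fit.

```python
import math

def beta_mean_std(alpha, beta, loc=0.0, scale=1.0):
    """Mean and standard deviation of a Beta(alpha, beta) shifted by loc and stretched by scale."""
    base_mean = alpha / (alpha + beta)
    base_var = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return loc + scale * base_mean, scale * math.sqrt(base_var)

# Hypothetical fit results: loc is negative, yet the mean is not.
alpha, beta, loc, scale = 2.0, 3.0, -0.05, 1.1
mean, std = beta_mean_std(alpha, beta, loc, scale)
print(f"loc={loc}, scale={scale}, mean={mean:.2f}, std={std:.2f}")
```

If the summary reports loc under the label "mean", a slightly negative value there would be consistent with what you are seeing.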

You could compute the values yourself:

| summary MyModel2
| rex field=other "Alpha: (?<alpha>[^,]+), Beta: (?<beta>.+)"
| eval mean=alpha/(alpha+beta), std=sqrt((alpha*beta)/(pow(alpha+beta,2)*(alpha+beta+1)))

However, this will give you the approximate mean and standard deviation of the normalized sample, not the original data.
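If you want values in the original units, you can undo the min-max normalization. This is a rough sketch with made-up numbers. It assumes you obtain the minimum and maximum yourself (for example with a separate stats search); as far as I can tell they are not part of the model summary. Because MLTK normalizes a random sample, the sample min and max may also differ from those of the full data set.

```python
def to_original_units(norm_mean, norm_std, data_min, data_max):
    """Map a mean and standard deviation from the [0, 1] scale back to the original scale."""
    span = data_max - data_min
    return data_min + span * norm_mean, span * norm_std

# Hypothetical values: normalized mean 0.4 and std 0.2, data ranging from 20% to 100%.
orig_mean, orig_std = to_original_units(0.4, 0.2, data_min=20.0, data_max=100.0)
print(f"mean={orig_mean:.1f}%, std={orig_std:.1f}%")
```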

The dist=beta implementation is a little over four years old now, and something tells me no one has validated it. At the risk of being overly critical, the code looks suspiciously like it was copied from Stack Overflow.

I don't have a personal Splunk support account, so I can't report the issue. If you have support, I recommend opening a support case.
