Which of the following is a good way to predict a field?

nasrinmulani
New Member

Hi All,

I am working on predicting the start time of a job, and I have the scheduled time as an independent variable.

  • Approach 1:
    I am thinking of converting the H:M:S part of the start time and the scheduled time into seconds, then predicting the start time in seconds using the scheduled time in seconds and the hour of the scheduled time as independent variables. I would then convert the prediction back into H:M:S and append it to the respective date.

  • Approach 2:
    Another approach is to convert the start time and the scheduled time into epoch time, take the difference between them, and predict that difference using the scheduled time in epoch, the hour of the scheduled time, and the type of the job as independent variables.
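To make the two targets concrete, here is a minimal Python sketch of both conversions. The timestamp format, the UTC assumption, the helper names, and the sample job are all made up for illustration and are not from the original data.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def hms_to_seconds(hms: str) -> int:
    """Approach 1: seconds since midnight for an H:M:S string."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Inverse of hms_to_seconds; wraps values past 24 hours."""
    total %= 86400
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def to_epoch(timestamp: str) -> int:
    """Epoch seconds, treating the timestamp as UTC (an assumption)."""
    parsed = datetime.strptime(timestamp, FMT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def delay_seconds(scheduled: str, started: str) -> int:
    """Approach 2: the target is start minus scheduled, in seconds."""
    return to_epoch(started) - to_epoch(scheduled)


# Made-up job scheduled just before midnight that starts just after it.
scheduled, started = "2024-03-01 23:59:50", "2024-03-02 00:00:20"

target_1 = hms_to_seconds(started.split(" ")[1])  # approach 1 target
target_2 = delay_seconds(scheduled, started)      # approach 2 target
```

For a job like this one, the approach 1 target wraps around at midnight (the scheduled time is near the end of the day, the start time near the beginning), while the approach 2 target stays a small number of seconds. Whether that matters depends on how many jobs in the data cross midnight.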

Please let me know which approach is better, and whether the RandomForestRegressor algorithm is feasible here.

Thanks in advance!

aoliner_splunk
Splunk Employee

This question is impossible to answer well without knowing more about the data, but here are a few suggestions based on what you've provided:

  1. Predict delay (the difference between scheduled and start time) rather than start time.
  2. Use derived features of the scheduled time (like hour or day of the week) in addition to the epoch time.
  3. Try different algorithms, including random forest, and see which works best. If you stick with the defaults in the Toolkit, you only need to run the assistant a handful of times.
  4. Think about what you want from this model. If minimal RMSE is your goal, #3 is sufficient. If you instead want an interpretable model, for example one that tells you which features are important, some models are better choices than others (models that support the summary command will be automatically summarized at the bottom of the assistant).
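Suggestions 1 to 3 can be sketched outside the Toolkit in plain Python. This is only an illustration under stated assumptions: the data is made up, the function names are hypothetical, and the two "models" are trivial stand-ins (a global mean and a per-hour mean) for real algorithms such as random forest. The point is the workflow: derive features from the scheduled epoch, predict the delay, and compare candidates by RMSE on held-out rows.

```python
from collections import defaultdict
from datetime import datetime, timezone
from math import sqrt


def scheduled_features(epoch):
    """Derived features of the scheduled time, computed in UTC (an assumption)."""
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return {"epoch": epoch, "hour": dt.hour, "day_of_week": dt.weekday()}


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_global_mean(rows):
    """Stand-in model 1: always predict the mean training delay."""
    mean = sum(delay for _, delay in rows) / len(rows)
    return lambda features: mean


def fit_hourly_mean(rows):
    """Stand-in model 2: predict the mean training delay for the scheduled hour."""
    by_hour = defaultdict(list)
    for features, delay in rows:
        by_hour[features["hour"]].append(delay)
    means = {hour: sum(values) / len(values) for hour, values in by_hour.items()}
    fallback = sum(delay for _, delay in rows) / len(rows)
    return lambda features: means.get(features["hour"], fallback)


# Made-up (scheduled epoch, delay in seconds) pairs.
DAY = 86400
raw_train = [(3600, 10), (3600 + DAY, 14), (46800, 60), (46800 + DAY, 64)]
raw_test = [(3600 + 2 * DAY, 12), (46800 + 2 * DAY, 62)]

train = [(scheduled_features(epoch), delay) for epoch, delay in raw_train]
test = [(scheduled_features(epoch), delay) for epoch, delay in raw_test]

# Fit each candidate on the training rows and score it on the held-out rows.
scores = {}
for name, fit in [("global_mean", fit_global_mean), ("hourly_mean", fit_hourly_mean)]:
    model = fit(train)
    predictions = [model(features) for features, _ in test]
    scores[name] = rmse([delay for _, delay in test], predictions)

# A predicted delay converts back to a predicted start time by adding the scheduled epoch.
model = fit_hourly_mean(train)
predicted_start = raw_test[0][0] + model(test[0][0])
```

In this toy data the delay depends on the hour, so the per-hour model scores a lower RMSE than the global mean. With real data the ranking could differ, which is why comparing several algorithms is worthwhile.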

nasrinmulani
New Member

Thanks Aoliner,

I have worked on both approaches, but I got good results with approach 1, converting the start time into seconds.
Random forest is working fine for me, but I have some outliers, and because of them my RMSE is high even though the R-squared value is 0.99.

I have one question: should we remove the outliers (deviations in the data), or should they stay in?

aoliner_splunk
Splunk Employee

Do you consider the outliers to be noise (e.g., measurement error, external interference, etc.) or a phenomenon you want to model?

Also, perfect prediction isn't always possible, especially in the presence of random noise or factors missing from your dataset. You may find it difficult to do better than R^2=0.99.
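As a rough illustration of how a few outliers can produce a large RMSE alongside a high R^2, here is a small Python sketch. The numbers are made up, and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used 3.5 cutoff) is one possible choice, not something the Toolkit is known to apply. Flagging the rows is only a way to inspect them; whether to drop them still depends on whether they are noise or something you want to model.

```python
from math import sqrt
from statistics import median


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def flag_outliers(residuals, threshold=3.5):
    """Flag residuals whose modified z-score (MAD-based) exceeds the threshold."""
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    if mad == 0:
        return [False] * len(residuals)
    return [abs(0.6745 * (r - center) / mad) > threshold for r in residuals]


# Made-up start times in seconds since midnight; the last job started far off prediction.
actual = [1000, 20000, 40000, 60000, 80000, 30000]
predicted = [1010, 19990, 40005, 59995, 80010, 33000]

residuals = [a - p for a, p in zip(actual, predicted)]
flags = flag_outliers(residuals)

rmse_all = rmse(actual, predicted)
r2_all = r_squared(actual, predicted)

# Recompute RMSE on the rows that were not flagged, for comparison only.
kept = [(a, p) for a, p, flagged in zip(actual, predicted, flags) if not flagged]
rmse_kept = rmse([a for a, _ in kept], [p for _, p in kept])
```

Because the start times span most of a day, the total variance is large, so R^2 stays above 0.99 even though one bad row pushes the RMSE to over a thousand seconds. That is why the two metrics can seem to disagree.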
