Hello,
I have this search query:
sourcetype="device"
| bucket span=1d _time | makecontinuous _time
| stats count by _time, user | fillnull count
I was expecting that makecontinuous would also add the days where the count is 0 to the results. Instead, this query gives me:
_time user count
2017-08-18 user2 5
2017-08-21 user2 1
2017-08-25 user2 4
2017-08-27 user2 1
2017-08-30 user2 6
I was expecting this result:
_time user count
2017-08-18 user2 5
2017-08-19 user2 0
2017-08-20 user2 0
2017-08-21 user2 1
2017-08-22 user2 0
.....and so on
2017-08-25 user2 4
2017-08-26 user2 0
2017-08-27 user2 1
2017-08-30 user2 6
I know that this would work well with timechart but I really need to use stats, so that I can then use the results in Machine Learning Toolkit, and timechart would not work there.
Perhaps this could help if you wanted it in another format?
| timechart limit=0 span=5m count by user
| fillnull
| untable _time, user, count
...
I've used that trick to fill in the missing time points before...let me know if that helps!
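Adapted to the daily buckets in the question, the complete search might look something like the sketch below. The sourcetype="device" filter and span=1d are taken from the original query; the rest follows the snippet above, and none of it has been run against the asker's data.

```
sourcetype="device"
| timechart span=1d limit=0 count by user
| fillnull value=0
| untable _time user count
```

timechart produces one column per user with a row for every day in the range, and untable turns that back into one row per _time/user pair, which is the same layout that stats count by _time, user returns.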
this works, I would mark this as answer, but it is a reply, so I cannot mark it.
Moved to answer!
You have some options here. Since the MLTK is appending stats on there, any command such as fillnull or makecontinuous will not solve your issue since it needs to be passed after timechart/stats.
You need to mock up some dummy data and set its values to zero then allow stats to fill in any non-null values.
An example would look like this
| makeresults | eval field1="" | eval field2=""
| append [| search index=... sourcetype=... | bin _time span=10m | stats count by _time | fillnull value=0]
So if your time range is a 60-minute span, the idea is for the makeresults part to provide 6 placeholder bins of 10 minutes each, so that any empty bin ends up with a zero. You could also take the approach of using a lookup table to populate your null values, or you could use the internal index to populate placeholders to prevent null values.
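The example above is only an outline. One way the placeholder rows could be generated is sketched below. It assumes a bounded search time range (so that addinfo returns numeric info_min_time and info_max_time values) and the same 10-minute span, written as 600 seconds. The start field and the final stats sum(count) step are additions for illustration and are not part of the original example.

```
index=... sourcetype=...
| bin _time span=10m
| stats count by _time
| append
    [| makeresults
     | addinfo
     | eval start=floor(info_min_time/600)*600
     | eval _time=mvrange(start, info_max_time, 600)
     | mvexpand _time
     | eval count=0
     | fields _time count]
| stats sum(count) as count by _time
```

The subsearch emits one zero-count row per 10-minute bin in the search window, and the last stats adds the real counts on top of those zeros, so bins without events stay at 0.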
Could you please elaborate on your solution a bit? I am facing a similar issue where _time is discontinuous and MLTK throws an error when I try to fit or apply a model. TIA. FYI, I am quite new to Splunk but learning fast.
I've come up with a much better solution since posting this reply. Ask a new question and I will give you the code
Thanks. I used timechart to fix my issue for now. Please let me know if your solution is different, and I will start a new thread.
Any update on if this helped?
Your answer helped me understand why it does not work.
Someone else suggested the solution in one of the replies:
| timechart limit=0 span=5m count by user
| fillnull
| untable _time, user, count
@jorjiana88, try the timechart command.
sourcetype="device" user="*"
| timechart count by user useother=f limit=0
It is not an option to use timechart because it changes how the result is displayed and I cannot later apply some machine learning algorithm after timechart. I really need to use stats.
@jorjiana88, is it one of the built-in Machine Learning Toolkit algorithms, or are you trying to create your own?
Can you please name the algorithm you are trying to use? Output-wise, the timechart command above generates the same fields as the stats command in your query, so I don't see how the two would be picked up differently by the algorithm.
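If timechart really has to be avoided, a stats-only variant should be possible in principle. The sketch below is not from this thread and is untested against the data in question. It assumes the daily span from the original query. In the original search, the rows that makecontinuous adds presumably have no user value, so stats count by _time, user drops them again; here makecontinuous runs after stats instead.

```
sourcetype="device"
| bin _time span=1d
| stats count by _time, user
| xyseries _time user count
| makecontinuous _time span=1d
| fillnull value=0
| untable _time user count
```

xyseries pivots the stats output to one column per user, makecontinuous adds a row for each missing day, fillnull sets the empty cells to 0, and untable restores the _time/user/count layout. Note that makecontinuous only fills gaps between the earliest and latest bucket present, so empty days at the very start or end of the search window would still be missing.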