This is my code
index="google_apis" source="https://www.googleapis.com/youtube"
| timechart span=1h avg(subCount) by channel
the results looks something like this
Now, my problem is I can't seem to find a way on how to fill the null values with this formula:
"average of the field" + ("stdev of the field" * random(-3, 3))
My intention is to fill the null values with psuedo values that is 3 sigmas away (below or above) from the mean of the fields. I can get the avg and stdev by the eventstats
command. I can also get random number between -3 to 3. But I can't put them to the right place.
Thanks in advance
Below code is working and possibly the most efficient way of doing it. I'm accepting this as the correct answer but it was only because of @VatsalJagani 's answer that I was able to work it out.
index="google_apis" source="https://www.googleapis.com/youtube"
| timechart span=1h avg(subCount) by channel
| fields - NULL
| eventstats avg as a_*, stdev as s_*
| eval rand = tostring("0." . random())
| eval rand =
case(
(rand > 0.00 AND rand < 0.16), -3,
(rand > 0.16 AND rand < 0.32), -2,
(rand > 0.32 AND rand < 0.48), -1,
(rand > 0.48 AND rand < 0.64), 1,
(rand > 0.64 AND rand < 0.80), 2,
(rand > 0.80 AND rand < 1.00), 3,
true(), 1)
| foreach *
[ eval "<<FIELD>>" = floor(ifnull('<<FIELD>>', 'a_<<FIELD>>' + ('s_<<FIELD>>' * rand), '<<FIELD>>'))]
| fields - a_*, s_*, rand
Below code is working and possibly the most efficient way of doing it. I'm accepting this as the correct answer but it was only because of @VatsalJagani 's answer that I was able to work it out.
index="google_apis" source="https://www.googleapis.com/youtube"
| timechart span=1h avg(subCount) by channel
| fields - NULL
| eventstats avg as a_*, stdev as s_*
| eval rand = tostring("0." . random())
| eval rand =
case(
(rand > 0.00 AND rand < 0.16), -3,
(rand > 0.16 AND rand < 0.32), -2,
(rand > 0.32 AND rand < 0.48), -1,
(rand > 0.48 AND rand < 0.64), 1,
(rand > 0.64 AND rand < 0.80), 2,
(rand > 0.80 AND rand < 1.00), 3,
true(), 1)
| foreach *
[ eval "<<FIELD>>" = floor(ifnull('<<FIELD>>', 'a_<<FIELD>>' + ('s_<<FIELD>>' * rand), '<<FIELD>>'))]
| fields - a_*, s_*, rand
Hi @morethanyell,
Check below query, it is really easy to understand whats happening.
| makeresults | eval a="12", b="13" | append [| makeresults | eval a="13"] | append [| makeresults | eval b="3"] | append [| makeresults | eval a="2", b="1"]
| appendpipe [| stats stdev(*) as stdev_*, avg(*) as avg_*
| foreach avg_*
[ eval r_<<MATCHSTR>>=random() % 2
| eval r_<<MATCHSTR>>=if(r_<<MATCHSTR>>==0,1,-1)
| eval r_<<MATCHSTR>>=(random() % 4) * r_<<MATCHSTR>>
| eval replace_<<MATCHSTR>> = <<FIELD>> + (stdev_<<MATCHSTR>> * r_<<MATCHSTR>>)] | table replace_*]
| eventstats avg(replace_*) as replace_*
| foreach replace_* [eval <<MATCHSTR>>=coalesce(<<MATCHSTR>>, <<FIELD>>)]
| fields - replace_*
Above query will work, just replace first line with your query in question. Below is just understanding how the query is working, if you are interested.
| makeresults | eval a="12", b="13" | append [| makeresults | eval a="13"] | append [| makeresults | eval b="3"] | append [| makeresults | eval a="2", b="1"]
is just to make similar results that you are having.
| appendpipe [| stats stdev(*) as stdev_*, avg(*) as avg_*
Finds the stdev and avg.
| foreach avg_*
[ eval r_<<MATCHSTR>>=random() % 2
| eval r_<<MATCHSTR>>=if(r_<<MATCHSTR>>==0,1,-1)
| eval r_<<MATCHSTR>>=(random() % 4) * r_<<MATCHSTR>>
| eval replace_<<MATCHSTR>> = <<FIELD>> + (stdev_<<MATCHSTR>> * r_<<MATCHSTR>>)] | table replace_*]
Finds the random number between (-3,3) and find the value that can be replaced where NULL is present. (replace_* fields)
| eventstats avg(replace_*) as replace_*
| foreach replace_* [eval <<MATCHSTR>>=coalesce(<<MATCHSTR>>, <<FIELD>>)]
Replace NULL values with replace_* field values with coalesce command.
| fields - replace_*
Just to remove unnecessary field replace_*
Hope this helps!!! BTW @morethanyell this is really nice question.
Amazing work. But how can I capture the fields with spaces?
EDIT: i worked it out. thank you very much!
Brute force method I used:
index="google_apis" source="https://www.googleapis.com/youtube"
| timechart span=4h avg(subCount) by channel
| fields - NULL
| eval rand = tonumber("0." . random())
| eval rand =
case(
(rand > 0.00 AND rand < 0.16), -3,
(rand > 0.16 AND rand < 0.32), -2,
(rand > 0.32 AND rand < 0.48), -1,
(rand > 0.48 AND rand < 0.64), 1,
(rand > 0.64 AND rand < 0.80), 2,
(rand > 0.80 AND rand < 1.00), 3,
true(), 1)
| eventstats avg as a_*, stdev as s_*
| eval fnv = 'a_Making it happen Vlog' + (rand * 's_Making it happen Vlog')
| eval "Making it happen Vlog" = floor(coalesce('Making it happen Vlog', fnv))
| eval fnv = 'a_The Juicy Vlog' + (rand * 's_The Juicy Vlog')
| eval "The Juicy Vlog" = floor(coalesce('The Juicy Vlog', fnv))
| eval fnv = a_Morgz + (rand * s_Morgz)
| eval Morgz = floor(coalesce(Morgz, fnv))
| eval fnv = a_MrBeast + (rand * s_MrBeast)
| eval MrBeast = floor(coalesce(MrBeast, fnv))
| eval fnv = a_PewDiePie + (rand * s_PewDiePie)
| eval PewDiePie = floor(coalesce(PewDiePie, fnv))
| eval fnv = a_Music + (rand * s_Music)
| eval Music = floor(coalesce(Music, fnv))
| eval fnv = a_TSeries + (rand * s_TSeries)
| eval TSeries = floor(coalesce(TSeries, fnv))
| fields - a_*, s_*, rand, fnv
| fields _time, *