Splunk Search

How to group the daily active users by their activity during the last 2 weeks?

Motivator

Hi,

I would like to group the daily users by their number of active days during the last 2 weeks. My current search looks like this:

...
| bucket span=1d _time
| stats dc(user_id) AS count BY user_id, _time
| sort 0 + _time
| streamstats window=14 dc(_time) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days

But this calculation does not include days of inactivity (because of the stats command) and ends up incorrect results.

How can I achieve my goal?

BR
Heinz

0 Karma
1 Solution

Esteemed Legend

Try this:

...
| timechart span=1d count BY user_id
| streamstats window=14 count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days

View solution in original post

Motivator

I think I've found a workaround. But it's veeeery expensive so I'm not really happy with it.

| bucket span=1d _time
| stats dc(account_id) AS dc by user_id_1, _time
| eventstats values(account_id) AS user_id_2
| mvexpand user_id_2
| stats count(eval(if(user_id_1=user_id_2, user_id_1, null()))) AS active BY user_id_2, _time
| sort 0 +  user_id_2, _time
| streamstats window=14 global=f dc(eval(if(active=1, _time, null()))) AS active_days BY user_id_2
| search active=1
| timechart span=1d dc(user_id_2) BY active_days limit=14
0 Karma

Esteemed Legend

See my update (new comment at the bottom) that fixes the problem with the solution as originally presented.

0 Karma

Esteemed Legend

Try this:

...
| timechart span=1d count BY user_id
| streamstats window=14 count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days

View solution in original post

Motivator

hi,

thanks for your answer. But I think the streamstats doesn't like the "timechart BY" before. It is not adding the active_days field

0 Karma

Esteemed Legend

You are correct; I am working on a correction.

0 Karma

Esteemed Legend

Try this:

...
| timechart span=1d count BY user_id
| untable _time user_id count
| streamstats window=14 count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days

Motivator

Hi woodcock,

thanks a lot for coming back to my problem.
I think we have to do some small adjustments, then this approach shows complete & correct results:

...
| timechart span=1d count BY userid limit=10000000
| untable _time user
id count
| sort 0 + account_id, _time
| streamstats window=14 global=false count(eval(count>0)) AS activedays BY userid
| timechart span=1d dc(eval(if(count>0, accountid, null()))) BY activedays

The sort command should have the same effect on the results like global=false, but I hope it helps to improve the peformance of streamstats.

Is there an option that tells a timechart to set the limit=infinite ?

0 Karma

Esteemed Legend

The sort portion is destroying the function of the search. The results must remain sorted primarily by _time or the streamstats part will not work correctly so definitely remove the sort line. That is a very important catch on the global=false part though!!! Usually limit=0 disables the limit entirely but I have not tried it on timechart.

0 Karma

Motivator

There is no difference, when I remove the sort line. And I don't get how this could destroy the function.
In the end the streamstats command has to analyse a window of 14 days per user_id. After the sort line, the table is so well prepared, that I could even delete the global=false command.

0 Karma

Motivator

PS: limit=0 is fine in the timechart

0 Karma