topic Re: Summary / accelerate query counting disjoint indexed events in Splunk Search

Summary / accelerate query counting disjoint indexed events

shawnce — Mon, 24 Mar 2014 18:38:55 GMT

I have a relatively large number of events (millions a week) being indexed and routed into their own index based on source and source type. This stream of events contains information about user activity in one of our products. We want to summarize user activity on a daily basis and then build a dashboard that visualizes this summary in various ways (often over longer timescales). We will likely use an accelerated search (we prefer the simplicity) but may decide to use a summary search instead.

Note that we are currently still on Splunk 5.0.5.

The following is an example of a summary query that I am experimenting with and I am looking for any suggestions on how to improve it. It seems a little wrong to use if/match like I am.

index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| eval DrawEvent=if(match(event_type,"creating shape"),"1","0")
| eval ToolEvent=if(match(event_type,"Selecting tool"),"1","0")
| eval UndoEvent=if(match(event_type,"Undoing shape"),"1","0")
| eval RedoEvent=if(match(event_type,"Redoing shape"),"1","0")
| bucket _time span=1day
| stats sum(DrawEvent) AS UserDrawCount sum(ToolEvent) AS UserToolCount sum(UndoEvent) AS UserUndoCount sum(RedoEvent) AS UserRedoCount by _time,logged_user_id

...which produces a table like the following...

_time                    logged_user_id  UserDrawCount  UserToolCount  UserUndoCount  UserRedoCount
3/16/14 12:00:00.000 AM  AAAAA           59             7              0              0
3/16/14 12:00:00.000 AM  BBBBBB          135            35             42             2
3/16/14 12:00:00.000 AM  CCCCC           139            3              0              0
3/16/14 12:00:00.000 AM  DDDDD           895            65             54             1

Note that in a future version of the product we are reworking the event naming conventions so that a wildcard (instead of such specific text) can be used in the search to narrow the event stream down to the family of user actions we wish to summarize in a single query.

Re: Summary / accelerate query counting disjoint indexed events

martin_mueller — Mon, 24 Mar 2014 21:40:19 GMT

Maybe it's just me, but what is your question?

Re: Summary / accelerate query counting disjoint indexed events

shawnce — Mon, 28 Sep 2020 16:13:14 GMT

I am basically looking to see if what I am doing above is reasonable or if a better way exists.

I have a stream of events like the following coming in from users using our app...

logged_user_id="AAAAA" event_type="creating shape" ...
logged_user_id="BBBBBB" event_type="Selecting tool" ...
logged_user_id="AAAAA" event_type="creating shape" ...
logged_user_id="CCCCC" event_type="Redoing shape" ...

I want to summarize this into a daily tally of each type of event by user, turning multiple events into a single event for each user on each day. This will then be used to feed subsearches.

Re: Summary / accelerate query counting disjoint indexed events

martin_mueller — Mon, 24 Mar 2014 22:48:07 GMT

Yeah, feeding that into a summary index will give you great long-term statistics performance.
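For illustration only, here is a rough sketch of how the daily summary could be written into a summary index with the collect command. It assumes a scheduled search that runs once a day over the previous day, and a summary index named myproduct_summary, which is a made-up name (the index would have to exist already). The 1/0 flags are written as numbers here rather than strings.

```spl
index=myproduct build_type=prod earliest=-1d@d latest=@d (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| eval DrawEvent=if(match(event_type,"creating shape"),1,0)
| eval ToolEvent=if(match(event_type,"Selecting tool"),1,0)
| eval UndoEvent=if(match(event_type,"Undoing shape"),1,0)
| eval RedoEvent=if(match(event_type,"Redoing shape"),1,0)
| bucket _time span=1d
| stats sum(DrawEvent) AS UserDrawCount sum(ToolEvent) AS UserToolCount sum(UndoEvent) AS UserUndoCount sum(RedoEvent) AS UserRedoCount by _time, logged_user_id
| collect index=myproduct_summary
```

A dashboard panel could then read the much smaller summary instead of the raw events, for example (the 30-day window is just an example):

```spl
index=myproduct_summary earliest=-30d@d
| timechart span=1d sum(UserDrawCount) AS DrawCount sum(UserUndoCount) AS UndoCount
```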

Re: Summary / accelerate query counting disjoint indexed events

shawnce — Mon, 28 Sep 2020 16:13:17 GMT

Basically: does it make sense to search on event_type to narrow the number of events looked at, then use eval with if(match(...)) to tally each matched event_type, then bucket by day, then summarize with stats? Or does a better way exist to do the daily summary without the eval/if(match(...)) step, perhaps by using features of stats more directly?

Again, it needs to be grouped by day and logged-in user.

Re: Summary / accelerate query counting disjoint indexed events

martin_mueller — Mon, 24 Mar 2014 22:50:20 GMT

You could merge the match into the stats like this:

... | stats count(eval(match(event_type, "creating shape"))) as UserDrawCount ...

But that's not necessarily better to read and maintain. From a performance point of view it's not going to matter much.
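Spelled out for all four event types, the merged form might look roughly like the sketch below. The index, field names and output names are taken from the original query. Keep in mind that match() does a case-sensitive regular expression match, so the strings have to agree with the case used in the events.

```spl
index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| bucket _time span=1d
| stats count(eval(match(event_type,"creating shape"))) AS UserDrawCount
        count(eval(match(event_type,"Selecting tool"))) AS UserToolCount
        count(eval(match(event_type,"Undoing shape"))) AS UserUndoCount
        count(eval(match(event_type,"Redoing shape"))) AS UserRedoCount
        by _time, logged_user_id
```

Each count(eval(...)) only counts the events for which the expression is true, so the separate 1/0 fields are no longer needed. The AS rename is required when an eval expression is used inside stats.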

Re: Summary / accelerate query counting disjoint indexed events

martin_mueller — Mon, 24 Mar 2014 22:56:54 GMT

All in all - yeah, seems reasonable to me.

Consider moving the categorizing-eval-chain out into a macro for easy reuse and maintenance.
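As a rough sketch of the macro idea: assuming a search macro called shape_event_flags (a hypothetical name) is created with the eval chain from the original query as its definition, shown here wrapped over several lines for readability:

```spl
eval DrawEvent=if(match(event_type,"creating shape"),1,0)
| eval ToolEvent=if(match(event_type,"Selecting tool"),1,0)
| eval UndoEvent=if(match(event_type,"Undoing shape"),1,0)
| eval RedoEvent=if(match(event_type,"Redoing shape"),1,0)
```

the summarizing search could then call it in backticks:

```spl
index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| `shape_event_flags`
| bucket _time span=1d
| stats sum(DrawEvent) AS UserDrawCount sum(ToolEvent) AS UserToolCount sum(UndoEvent) AS UserUndoCount sum(RedoEvent) AS UserRedoCount by _time, logged_user_id
```

That way, when the event naming changes in a later product version, only the macro definition needs to be touched.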