Splunk Search

How can I plot bounce rate over time?

lukeh
Contributor

Hi 🙂

I have a search that calculates the Bounce Rate for a web site:

source="web" configuration.client.company=foo event.type=page_view 
| stats dc(event.id) as eid_dc, c(event.id) as eid_c by session.id
| addcoltotals 
| eval BounceRate=eid_dc/eid_c
| stats last(BounceRate) as BounceRate

At a high level, it calculates the BounceRate by dividing the total number of distinct urls (ie. eid_dc) by the total number of urls (ie. eid_c) by Visit (ie. session.id)

I would like the ability to plot the BounceRate over time, but I can't get timechart to do the needful... any help will be greatly appreciated 🙂

Luke.

Tags (2)
1 Solution

sideview
SplunkTrust
SplunkTrust

I think you need the stats command to calculate the two numbers for each distinct session.id and _time. Then the timechart command will effectively treat each session as equally important and it'll calculate various overall statistics about that bounce rate and how it changes over time.

source="web" configuration.client.company=foo event.type=page_view 
| bin _time span=1h
| stats dc(event.id) as eid_dc, c(event.id) as eid_c by session.id _time
| eval BounceRate=eid_dc/eid_c
| timechart span=1h max(BounceRate) perc95(BounceRate) avg(BounceRate) min(BounceRate) 

NOTES:

the bin command rounds everything down to nearest hour, and matches the span I'm using in the timechart. Keep these synced or it gets weird.

the stats command will give back rows that are unique combinations of session.id and _time. Because of the bin command, _time is really "hour". And it'll compute your two number for each row appropriately.

then the timechart at the end doesn't care or know about sessions. it just treats all rows as equal and calculates things about global Bounce rate over time.

UPDATE:

I think I see (at least after you and I emailing a bit behind the scenes ;). You were using addcoltotals to sort of add things up, but what you really need here is stats itself to do that. Try this:

source="web" configuration.client.company=foo event.type=page_view
| bin _time span=1d
| stats dc(event.id) as eid_dc, c(event.id) as eid_c by session.id _time
| stats sum(eid_dc) as distinct_urls, sum(eid_c) as total_urls by _time
| eval BounceRate=distinct_urls/total_urls
| timechart span=1d max(BounceRate) as BounceRate

View solution in original post

sideview
SplunkTrust
SplunkTrust

I think you need the stats command to calculate the two numbers for each distinct session.id and _time. Then the timechart command will effectively treat each session as equally important and it'll calculate various overall statistics about that bounce rate and how it changes over time.

source="web" configuration.client.company=foo event.type=page_view 
| bin _time span=1h
| stats dc(event.id) as eid_dc, c(event.id) as eid_c by session.id _time
| eval BounceRate=eid_dc/eid_c
| timechart span=1h max(BounceRate) perc95(BounceRate) avg(BounceRate) min(BounceRate) 

NOTES:

the bin command rounds everything down to nearest hour, and matches the span I'm using in the timechart. Keep these synced or it gets weird.

the stats command will give back rows that are unique combinations of session.id and _time. Because of the bin command, _time is really "hour". And it'll compute your two number for each row appropriately.

then the timechart at the end doesn't care or know about sessions. it just treats all rows as equal and calculates things about global Bounce rate over time.

UPDATE:

I think I see (at least after you and I emailing a bit behind the scenes ;). You were using addcoltotals to sort of add things up, but what you really need here is stats itself to do that. Try this:

source="web" configuration.client.company=foo event.type=page_view
| bin _time span=1d
| stats dc(event.id) as eid_dc, c(event.id) as eid_c by session.id _time
| stats sum(eid_dc) as distinct_urls, sum(eid_c) as total_urls by _time
| eval BounceRate=distinct_urls/total_urls
| timechart span=1d max(BounceRate) as BounceRate

lukeh
Contributor

Thank you again Nick!

0 Karma

lukeh
Contributor

Thx Nick, I changed span=1h to span=1d to fit our requirements but we still need addcoltotals or similar to add up each distinct url and the total number of urls for each day, then calculate the Bounce Rate by dividing those two values, then plot that value for each data point.

0 Karma

MuS
Legend

Hi Luke Harris,

you're 'loosing' the _time with your first stats, therefore you're not able to timechart your events afterwards. Either try to add any time information to your stats or use chart ... over _time by session.id to replace the first stats.

This is un-tested because I'm missing your events 😉 but I hope this helps to get you started ...

cheers, MuS

0 Karma

lukeh
Contributor

Thx for the suggestion MuS, however the problem is that once we:
chart ... over _time
-or-
stats ... by session.id, _time
-or-
timechart ... by session.id

we lose the ability to get a total Bounce Rate for a given span period.

I have tried a few different combinations using bucket and stats with addtotals or addcoltotals, but I just can't get it to add up the distinct urls (ie. eid_dc) or the total number of urls (ie. eid_c) by Visit (ie. session.id) when split by _time - as they are now split by "eid_dc:session.id" and "eid_c:session.id" by _time

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...