Splunk Search

Search for user day 1 retention

arnol229
Explorer

I'm taking a shot at providing metrics on day 1 retention numbers of users in our system (Create a profile and the next day use server again).

I can grab a list of users and the _time they created their profile easily:

sourcetype=Profile event=profile__create | table user _time

I have a general server_log response event that will capture user activity:

sourcetype=server_log event=RESPONSE | bucket _time span=1d | stats values(user) by _time

So, how can I take the profile creation events, look forward a day of the event and see if there was activity by that user?
I've been working in this direction with a subsearch, but have unable to complete the query...

sourcetype=server_log event=RESPONSE [search sourcetype=Profile event=profile__create | table user _time] | bucket _time span=1d | stats values(user) by _time

Any advice/direction is greatly appreciated.


EDIT

Sorry, I realize this was poorly worded.

My expected result is to get the % of users who come back to our site the next day after creating a profile. Daily/Weekly/Monthly granularity.
EX:
User comes to the site on Monday, Jan. 1st and creates a profile.
This fires a profile_create event with the user's id in the field __user_.
The next day, we see a response event in the server_logs, so this user was retained on day 1.
If one hundred people create profiles in a day/week/month, what is the % of users that come back on day 1? (timechart or stats)

Thanks again for your input!

0 Karma

arnol229
Explorer

Worked out a solution, with big thanks to somesoni2 and sundareshr for their help.

sourcetype=Profile event=profile__create | eval mytime=strftime(_time, "%Y%m%d") | eval _time= strptime(mytime, "%Y%m%d") | stats count by _time | rename count as total_count 
| join type=left _time [search sourcetype=Profile event=profile__create | eval mytime=strftime(_time, "%Y%m%d") | eval _time= strptime(mytime, "%Y%m%d") | fields _time user 

| join type=inner _time user [search sourcetype=server_log event=RESPONSE | eval mytime=strftime(_time, "%Y%m%d") | eval _time= strptime(mytime, "%Y%m%d")-86400 | dedup _time user | fields _time user] | stats count by _time | rename count AS "day1"] 

| eval perc=day1/total_count | bucket _time span=$granularity$ | fillnull day1 perc value=0 | stats sum(total_count) AS "Total Profiles Created" sum(day1) AS "User's that returned on day 1" avg(perc) AS "% of Users that returned on day 1" by _time

Couldn't find the best way to format it for readability, so I apologize for that mess. Basically,
1. I search and get all profile create events and reconfigure the _time attribute to snap to the basic date
2. I left join on _time with a big subsearch that will return only events that match (inner join) the user and _time of a profile creation AND a server response that gets booted back to the previous day (the day the profile was created).
3. Now i have a table of _time by dates, total_count of created profiles, and count of events that match activity the day after the user created a profile
4. I bucket this time on a span that can be configured (1d,7d,1mon) in a dashboard, fill null counts with 0, and table things out in stats

Probably unconventional and unoptimized. for a last 30 day search, it takes about 10 - 15 seconds for the query to crunch. I'm sure i'll revisit this at a later date to refactor, but right now it fits my needs.

0 Karma

somesoni2
Revered Legend

Give this a try as well (may perform better, but check if the results are same as your final query)

sourcetype=Profile event=profile__create | bucket span=1d _time | eventstats count as total_count by _time | fields _time user total_count 
| join type=inner _time user [search sourcetype=server_log event=RESPONSE | bucket span=1d _time | dedup _time user | fields _time user event | eval _time=relative_time(_time,"-1d") ] 
| eventstats count(eval(event="RESPONSE")) a day1 by _time | table _time user total_count day1 
| eval perc=day1/total_count | bucket _time span=$granularity$ | fillnull day1 perc value=0 | stats sum(total_count) AS "Total Profiles Created" sum(day1) AS "User's that returned on day 1" avg(perc) AS "% of Users that returned on day 1" by _time
0 Karma

somesoni2
Revered Legend

Not sure If I understood the requirement completely. Are you looking for

1) find users created in Day1
2) Go to Day2 and find if those users identified in step 1 have any activity.

If that is correct, try this,

(sourcetype=server_log event=RESPONSE) OR (sourcetype=Profile event=profile__create) |bucket _time span=1d  | chart count(eval(sourcetype="server_log")) as Activity count(eval(sourcetype="Profile")  as Created by user _time | streamstats current=f window=1 values(Created) as prev by user | eval Result=if(prev>0 AND Activity>0,"Created and activity next day","")

sundareshr
Legend

Try something like this

sourcetype=server_log event=RESPONSE [search sourcetype=Profile event=profile__create | head 1 |  eval earliest=_time+3600 | latest=now() | table earliest latest]
Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Dereferencing XML to Fixed-length events

This challenge was first posted on Slack #puzzles channelFor a previous puzzle, I needed a set of fixed-length ...

Stay Connected: Your Guide to December Tech Talks, Office Hours, and Webinars!

What are Community Office Hours? Community Office Hours is an interactive 60-minute Zoom series where ...

Data Management Digest – November 2025

  Welcome to the inaugural edition of Data Management Digest! As your trusted partner in data innovation, the ...