
Timeseries data - average age of user

Contributor

Hi all -

I have a dataset that tracks server access. Every time a server makes a request an event is generated. A very simple example:

01/01/01 00:00:00 server_id=server_1
01/01/01 00:00:00 server_id=server_2
01/04/01 00:00:01 server_id=server_1
01/04/01 00:00:00 server_id=server_2

In this case, server_1's and server_2's first and last logons were exactly 3 months apart, so the average age of the servers is 3 months. Age is calculated by taking the first time an instance is seen and the last time an instance is seen.
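To make the intended calculation concrete, here is a minimal Python sketch of the same first-seen/last-seen arithmetic (purely illustrative, not Splunk; it assumes the dd/mm/yy reading of the example dates, which is what makes the two logons 3 months apart):

```python
from datetime import datetime

# Illustrative events mirroring the example above (dd/mm/yyyy assumed)
events = [
    ("01/01/2001 00:00:00", "server_1"),
    ("01/01/2001 00:00:00", "server_2"),
    ("01/04/2001 00:00:01", "server_1"),
    ("01/04/2001 00:00:00", "server_2"),
]

first_seen, last_seen = {}, {}
for ts, server in events:
    t = datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")
    # Track the earliest and latest sighting per server
    first_seen[server] = min(first_seen.get(server, t), t)
    last_seen[server] = max(last_seen.get(server, t), t)

# Lifespan in whole days per server, then the average across servers
lifespans = {s: (last_seen[s] - first_seen[s]).days for s in first_seen}
avg_days = sum(lifespans.values()) / len(lifespans)
```

Both servers span 1 Jan to 1 Apr, i.e. 90 days, so the average comes out to 90 days (roughly 3 months).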

My problem is that some servers are seen daily, others days apart, and so on. These servers essentially represent customers, and I want to track the average age of all servers over time (to measure whether customer (server) lifespan is increasing or decreasing).

I initially thought this would be simple using first and last times:

| eventstats last(_time) as first_time, first(_time) as last_time by server_id
| eval lifespan_days=(last_time-first_time)/86400
| timechart avg(lifespan_days)

Given the results, I am now doubting the logic behind the query. My thinking is that when the search is set to "All time", eventstats will always find the overall most recent "last" time for each server, rather than the last time relative to each point on the chart (the day the event was generated).

Can anyone confirm this thinking or, better still, offer any advice on crafting a query?

-dave

1 Solution

Super Champion

I would try something more like this:

 | eventstats earliest(_time) as first_time, latest(_time) as last_time by server_id
 | eval lifespan_days=(last_time-first_time)/86400
 | timechart avg(lifespan_days)

OR

| sort 0 server_id _time
| streamstats global=t current=t range(_time) as lifespan by server_id
| eval lifespan_days=lifespan/86400
| timechart avg(lifespan_days)
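To illustrate what range(_time) computes as streamstats walks the sorted events, here is a minimal Python sketch of the same running-range logic (illustrative only; the function name is made up, not a Splunk API):

```python
# Mimics: streamstats global=t current=t range(_time) as lifespan by server_id
# applied to events pre-sorted like `sort 0 server_id _time`.
def running_lifespans(events):
    """events: list of (epoch_time, server_id) tuples, already sorted.

    For each event, emit (server_id, running range of _time seen so far
    for that server) -- i.e. max(_time) - min(_time) within the group.
    """
    seen_min, seen_max, out = {}, {}, []
    for t, sid in events:
        seen_min[sid] = min(seen_min.get(sid, t), t)
        seen_max[sid] = max(seen_max.get(sid, t), t)
        out.append((sid, seen_max[sid] - seen_min[sid]))
    return out
```

The final row per server carries that server's full lifespan in seconds, which the eval then converts to days.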



SplunkTrust

I have created mock data based on your example. However, I have modified the times a bit and added several daily start/end entries for server_2, as per your scenario description. Ideally your logs would have either IDs indicating unique start and end events, or simply the strings "Started" and "Ended" to mark them.

Currently, I see streamstats as suitable for your scenario; however, it requires some assumptions. streamstats should be applied to data sorted by time (either chronological or reverse chronological order). By default, Splunk returns events in reverse chronological order, which is what streamstats operates on here.

I have made one more assumption: for each server, the first entry is a start and the second is a stop. That is why I have grouped the streamstats by server_id.

The following is a run-anywhere search where you can play around with the data for your use case. If you already have field extractions in place, you will not need the commands up to and including rex (used only to mock up data), and the _time-related eval will also not be required.

Ideally you should run this for only one server at a time, since for a specific server there might be a start event but no matching stop event. In case you can't, the following search uses the map command to run the same streamstats for each server. You will need to replace the commands prior to map with your own base query that returns server_id.

| makeresults 
| eval raw="time=\"01/01/2017 00:00:00\" server_id=server_1;
            time=\"01/01/2017 00:00:00\" server_id=server_2;
            time=\"01/02/2017 23:55:00\" server_id=server_2;
            time=\"01/02/2017 00:00:00\" server_id=server_2;
            time=\"01/03/2017 00:00:00\" server_id=server_2;
            time=\"01/03/2017 22:45:00\" server_id=server_2; 
            time=\"01/04/2017 00:00:01\" server_id=server_1;
            time=\"01/04/2017 00:00:00\" server_id=server_2;"
| mvcombine delim=";" raw
| eval raw=split(raw,";")
| mvexpand raw
| rex field=raw "time=\"(?<_time>[^\"]+)\"\s+server_id=(?<server_id>.*)"
| dedup server_id
| table server_id
| map search="| makeresults 
    | eval raw=\"time=\\\"01/01/2017 00:00:00\\\" server_id=server_1;
                time=\\\"01/01/2017 00:00:00\\\" server_id=server_2;
                time=\\\"01/02/2017 23:55:00\\\" server_id=server_2;
                time=\\\"01/02/2017 00:00:00\\\" server_id=server_2;
                time=\\\"01/03/2017 00:00:00\\\" server_id=server_2;
                time=\\\"01/04/2017 00:00:01\\\" server_id=server_1;
                time=\\\"01/03/2017 22:45:00\\\" server_id=server_2;\"
    | mvcombine delim=\";\" raw
    | eval raw=split(raw,\";\")
    | mvexpand raw
    | rex field=raw \"time=\\\"(?<_time>[^\\\"]+)\\\"\s+server_id=(?<server_id>.*)\"
    | search server_id=\"$server_id$\"
    | sort server_id
    | streamstats last(_time) as prevTime by server_id window=1 current=f
    | eval counter=1
    | accum counter
    | eval counter=counter%2
    | search counter=0
    | eval _time=strptime(_time,\"%m/%d/%Y %H:%M:%S\")
    | eval prevTime=strptime(prevTime,\"%m/%d/%Y %H:%M:%S\")
    | eval duration=round((_time-prevTime)/(24*3600),2)"
| fieldformat prevTime=strftime(prevTime,"%m/%d/%Y %H:%M:%S")

If you can run the query for only one server at a time, you will not need map to perform the streamstats per server:

| makeresults 
| eval raw="time=\"01/01/2017 00:00:00\" server_id=server_1;
 time=\"01/01/2017 00:00:00\" server_id=server_2;
 time=\"01/02/2017 23:55:00\" server_id=server_2;
 time=\"01/02/2017 00:00:00\" server_id=server_2;
 time=\"01/03/2017 00:00:00\" server_id=server_2;
 time=\"01/03/2017 22:45:00\" server_id=server_2; 
 time=\"01/04/2017 00:00:01\" server_id=server_1;
 time=\"01/04/2017 00:00:00\" server_id=server_2;"
| mvcombine delim=";" raw
| eval raw=split(raw,";")
| mvexpand raw
| rex field=raw "time=\"(?<_time>[^\"]+)\"\s+server_id=(?<server_id>.*)"
| sort server_id
| streamstats last(_time) as prevTime by server_id window=1 current=f
| eval counter=1
| accum counter
| eval counter=counter%2
| search counter=0
| eval _time=strptime(_time,"%m/%d/%Y %H:%M:%S")
| eval prevTime=strptime(prevTime,"%m/%d/%Y %H:%M:%S")
| eval duration=round((_time-prevTime)/(24*3600),2)
| fieldformat prevTime=strftime(prevTime,"%m/%d/%Y %H:%M:%S")

streamstats takes the _time from each event and assigns it to the next event as prevTime, via the window=1 current=f parameters.
The modulo division by 2 marks alternate rows as 0 and 1. If there is an odd number of rows, i.e. the last event per server is a start with no matching stop, this also lets us ignore that trailing row.
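The counter trick above can be sketched in plain Python for clarity (illustrative only; the variable names are made up, not from the search):

```python
# Alternating start/stop times for one server, in chronological order.
# Odd positions are starts, even positions are stops.
times = [100, 250, 400, 900, 1000]

pairs = []
for i, t in enumerate(times, start=1):   # i plays the role of `accum counter`
    if i % 2 == 0:                       # `search counter=0` keeps stop rows only
        pairs.append((times[i - 2], t))  # times[i-2] is prevTime from window=1 current=f
# Duration of each start/stop pair; the trailing unmatched start (1000) is
# dropped automatically because its counter is odd.
durations = [stop - start for start, stop in pairs]
```

With the sample list above, the unmatched fifth event is silently ignored, exactly as the modulo filter does in the search.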

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"


Contributor

The streamstats example worked perfectly!

The global=t current=t for streamstats worked a treat. I should really get more familiar with this function 🙂

Thank you!


Super Champion

http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Streamstats

There is some documentation on it. It is a great command for computing running statistics over a window of events.
