Solved: Re: Why is time column different from date in tran...

RVDowning · ‎09-02-2015

If I run the following search over the previous month, the number of days reported for Sunday is 8. When I look at the data, I see only 5 different dates in the events, yet 8 different dates appear in the Time column. For example, an event with the time "20150830 21:47:17" shows "8/31/15 12:00:00.000 AM" in the Time column, whereas an event with the time "20150830 22:01:39" shows "8/30/15 12:00:00.000 AM" in the Time column.

source="c:\\logs\\aaaa"
| transaction bbbb startswith=("CCCC STARTED") endswith=("CCCC ENDED")
| bin span=1d _time
| stats count dc(_time) as days by date_wday
| eval average_count = count / days

lguinn2 · ‎09-02-2015

Remember that _time is normalized: it takes into account both the timezone of the host and the timezone that you have set in your user settings. date_wday is not normalized in any way; it is simply the day of the week taken from the raw datetime information in the event. Because the two can fall on different days for the same event, grouping by date_wday while counting distinct _time values can produce anomalies like the one you are seeing.
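
To see which events are affected, you could compare the two fields directly. This is only a sketch that I have not run against your data: day_from_time is a made-up field name, and it assumes date_wday holds lowercase day names such as "sunday".

```
source="c:\\logs\\aaaa"
| eval day_from_time = lower(strftime(_time,"%A"))
| where day_from_time != date_wday
| table _time date_wday day_from_time _raw
```

Any event this returns is one whose raw timestamp and normalized _time fall on different days of the week.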

The bin command sets the _time timestamp to the start of the day, so every _time value will show a time of "12:00:00.000 AM".
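
You can check this on a single made-up event without touching your data. This is a sketch, assuming the makeresults command is available on your version; raw_time and binned are just illustrative field names, and the timestamp is the one from your example.

```
| makeresults
| eval raw_time = "20150830 21:47:17"
| eval _time = strptime(raw_time,"%Y%m%d %H:%M:%S")
| bin span=1d _time
| eval binned = strftime(_time,"%m/%d/%y %I:%M:%S %p")
| table raw_time binned
```

The binned column shows which day bucket the event landed in under your own timezone settings.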

I think that a better way to do this would be:

source="c:\\logs\\aaaa" 
| transaction bbbb startswith=("CCCC STARTED") endswith=("CCCC ENDED") 
| eval day_of_week = strftime(_time,"%A")
| bin span=1d _time
| stats count dc(_time) as days by day_of_week
| eval average_count = count / days

Now all your calculations are based on the normalized timestamp.

RVDowning · ‎09-02-2015

Ah, thanks much. That works fine. My remaining problem is that I added the following line:

eventstats avg(average_count)

to get the average number of items per day, but the numbers are skewed by the inclusion of Saturdays and Sundays, which I have been asked to exclude. I guess I should post a separate question for this, though.

lguinn2 · ‎09-03-2015

That's pretty easy. After the first eval, add

| where day_of_week != "Saturday" AND day_of_week != "Sunday"

to eliminate those days.
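
Putting it together with your eventstats line, the whole search would look roughly like this. It is a sketch I have not run against your data, and overall_average is just a name I picked:

```
source="c:\\logs\\aaaa"
| transaction bbbb startswith=("CCCC STARTED") endswith=("CCCC ENDED")
| eval day_of_week = strftime(_time,"%A")
| where day_of_week != "Saturday" AND day_of_week != "Sunday"
| bin span=1d _time
| stats count dc(_time) as days by day_of_week
| eval average_count = count / days
| eventstats avg(average_count) as overall_average
```

Note that overall_average is the mean of the per-weekday averages, which matches the overall per-day average only if every weekday occurs the same number of times in the search range.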

Why is time column different from date in transaction?
