Dear All,
Kindly suggest how to sort the data in stats command output by event time.
Example:
Requirement: VPN login details per source user for the last one hour.
SPL Query:
index="network" sourcetype=vpn
| eval "Login_Time" = strftime(_time, "%m/%d/%Y %I:%M:%S %p")
| stats values(SourceUser) as User values(Login_Time) as VPN_Login_Time count by _time host
Date | Host | User | VPN Login Time | Count
1/7/2025 0:00 | 10.10.8.45 | Amar Rajesh Zainab | 01/07/2025 06:01:25 AM 01/07/2025 06:30:21 AM 01/07/2025 06:50:49 AM | 3
The challenge in the above example output: Amar logged in at 01/07/2025 06:30:21 AM and Zainab logged in at 01/07/2025 06:01:25 AM, but in the result the users are sorted alphabetically while the login times are sorted in ascending order, so the two multivalue fields no longer line up with each other. And the User field cannot simply be added as a third field in the by clause.
If you want to end up with several multivalue fields that are correlated with each other, you can't use stats values() as the output from a values() aggregation is always in sorted order.
There are a number of options:
1. Use stats list(), which records the item for EVERY event and preserves event order. Of course, if the same user has duplicate logins at the same _time, you will get multiple entries. Note that list() has a maximum list length of 100 items.
2. Make a combination field of the items you want to end up with, use stats values(new_field), and then split them out again, e.g. like this:
...
| eval _tmp=SourceUser."###".Login_Time
| stats values(_tmp) as _tmp count by _time host
| rex field=_tmp max_match=0 "(?<User>.*)###(?<VPN_Login_Time>.*)"
| fields - _tmp
3. Do this to handle potential duplicate logins at the same _time for the same user:
...
| stats values(Login_Time) as VPN_Login_Time count by _time host SourceUser
| stats list(*) as * sum(count) as count by _time host
i.e. include SourceUser in the by clause of the first stats, then collapse with stats list() in the second.
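For completeness, option 1 can be sketched against the original search like this (field names taken from the query above; this is a sketch, not a tested query):
...
| eval Login_Time = strftime(_time, "%m/%d/%Y %I:%M:%S %p")
| stats list(SourceUser) as User list(Login_Time) as VPN_Login_Time count by _time host
Because list() keeps values in event order rather than sorting them, User and VPN_Login_Time stay aligned row for row, at the cost of possible duplicates and the 100-item cap.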
Hope this helps