Looking for SPL that will give me the ID Cost by month, only grabbing the last event (_time) for that month. I have a system that updates cost daily for the same ID. Looking for guidance before I venture down a wrong path. Sample data below.
Thank you!
bill_date | ID | Cost | _time |
6/1/25 | 1 | 1.24 | 2025-06-16T12:42:41.282-04:00 |
6/1/25 | 1 | 1.4 | 2025-06-16T12:00:41.282-04:00 |
5/1/25 | 1 | 2.5 | 2025-06-15T12:42:41.282-04:00 |
5/1/25 | 1 | 2.2 | 2025-06-14T12:00:41.282-04:00 |
5/1/25 | 2 | 3.2 | 2025-06-14T12:42:41.282-04:00 |
5/1/25 | 2 | 3.3 | 2025-06-14T12:00:41.282-04:00 |
3/1/25 | 1 | 4.4 | 2025-06-13T12:42:41.282-04:00 |
3/1/25 | 1 | 5 | 2025-06-13T12:00:41.282-04:00 |
3/1/25 | 2 | 6 | 2025-06-13T12:42:41.282-04:00 |
3/1/25 | 2 | 6.3 | 2025-06-13T12:00:41.282-04:00 |
Hi @chrisboy68,
There are lots of options presented, but combining @yuanliu's response with a conversion from bill_date to year and month gives the output closest to "ID Cost by month":
| makeresults format=csv data="bill_date,ID,Cost,_time
6/1/25,1,1.24,2025-06-16T12:42:41.282-04:00
6/1/25,1,1.4,2025-06-16T12:00:41.282-04:00
5/1/25,1,2.5,2025-06-15T12:42:41.282-04:00
5/1/25,1,2.2,2025-06-14T12:00:41.282-04:00
5/1/25,2,3.2,2025-06-14T12:42:41.282-04:00
5/1/25,2,3.3,2025-06-14T12:00:41.282-04:00
3/1/25,1,4.4,2025-06-13T12:42:41.282-04:00
3/1/25,1,5,2025-06-13T12:00:41.282-04:00
3/1/25,2,6,2025-06-13T12:42:41.282-04:00
3/1/25,2,6.3,2025-06-13T12:00:41.282-04:00"
| eval _time=strptime(_time, "%FT%T.%N%z")
``` end test data ```
``` assuming month/day/year for bill_date ```
| eval Month=strftime(strptime(bill_date, "%m/%e/%y"), "%Y-%m")
| stats latest(Cost) as Cost by Month ID
Month ID Cost
----- -- ----
2025-03 1 4.4
2025-03 2 6
2025-05 1 2.5
2025-05 2 3.2
2025-06 1 1.24
You can alternatively use chart, xyseries, etc. to pivot the results:
| chart latest(Cost) over ID by Month
ID 2025-03 2025-05 2025-06
-- ------- ------- -------
1 4.4 2.5 1.24
2 6 3.2
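For instance, a sketch of the same pivot with xyseries, appended after the stats from the first search:
| stats latest(Cost) as Cost by Month ID
| xyseries ID Month Cost
``` produces the same table as the chart version above ```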
Thanks. I believe I got it. What tripped me up is that I didn't realize latest() could be used for non-time-based fields.
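A quick sketch to illustrate that point: latest() orders events by _time under the hood, but the field it returns can be anything (the status field here is made up):
| makeresults format=csv data="_time,status
2025-06-01T10:00:00,up
2025-06-01T12:00:00,down"
| eval _time=strptime(_time, "%FT%T")
| stats latest(status) as status
``` returns status=down - the value from the chronologically latest event ```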
Ok, let me try to get some better sample data. I believe I have it here. While this shows only one ID, the real data has multiple IDs and spans multiple months.
| makeresults count=1
| eval ID="10001", _time=strptime("2025-06-01 08:00:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=100.50, code="product1"
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-01 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=120.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-01 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=140.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-02 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=130.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-02 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=150.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-01 08:10:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=102.50, code="product1"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-01 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=125.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-01 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=145.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-02 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=135.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-06-02 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-06-01", cost=155.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-01 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=125.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-01 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=145.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-02 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=135.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-02 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=155.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-01 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=120.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-02 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=140.00, code="product3"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-02 10:15:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=130.75, code="product2"]
| append [| makeresults | eval ID="10001", _time=strptime("2025-05-02 13:30:00", "%Y-%m-%d %H:%M:%S"), billing_date="2025-05-01", cost=150.00, code="product3"]
Is this the result you are looking for?
ID | billing_date | code | latest(cost) | _time |
10001 | 2025-05-01 | product2 | 135.75 | 2025-05-02 10:15:00 |
10001 | 2025-05-01 | product3 | 155.00 | 2025-05-02 13:30:00 |
10001 | 2025-06-01 | product1 | 102.50 | 2025-06-01 08:10:00 |
10001 | 2025-06-01 | product2 | 130.75 | 2025-06-02 10:15:00 |
10001 | 2025-06-01 | product3 | 150.00 | 2025-06-02 13:30:00 |
dedup with perfect sort as @PickleRick suggests should work. Another way is to simply use stats as I originally suggested:
| stats latest(cost) max(_time) as _time by ID billing_date code
Stats should be way faster and more efficient, but it won't give you other fields. So whether to use stats or dedup depends on the desired results.
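If you do need an extra field or two from that same latest event, one sketch is to take latest() of each alongside the cost (region is a hypothetical extra field here):
| stats latest(cost) as cost latest(region) as region max(_time) as _time by ID billing_date code
``` region is made up - substitute whatever extra fields you actually have ```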
Oooompf. That's a rather inefficient way of creating mock data. 😉
I'd go with makeresults format=csv data=...
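For example, the same mock data could be built like this (trimmed to a few rows here; the full set follows the same pattern):
| makeresults format=csv data="ID,_time,billing_date,cost,code
10001,2025-06-01 08:00:00,2025-06-01,100.50,product1
10001,2025-06-01 10:15:00,2025-06-01,120.75,product2
10001,2025-05-02 13:30:00,2025-05-01,155.00,product3"
| eval _time=strptime(_time, "%Y-%m-%d %H:%M:%S")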
But to the point.
Assuming you want the first (or last - it's just a matter of proper sorting) cost value for each ID daily
| sort - _time
``` this way you'll get the latest value for each day because it will be the first one```
| bin _time span=1d
``` this will "group" your data by day ```
| dedup _time ID
``` and this will only leave first event for each combination of _time and ID```
You can of course sort the other way (reverse chronological order is actually the default; it's spelled out here to make the solution as explicit as possible) if you want the first values daily, not the last ones. And you can dedup over more fields (to get the values by code as well as by date and ID, for example).
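A quick sketch combining both tweaks - ascending sort for first values, and code added to the dedup:
| sort _time
``` ascending this time, so the earliest event per day comes first ```
| bin _time span=1d
| dedup _time ID code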
Yup. You're right. I keep forgetting about that option. For me it's clearer to do those two operations separately. I wonder though whether there is a performance difference.
You need to be a bit more precise about the requirements, but generally it does look like a case for properly sorting the data and using dedup so that it only "catches" the first result for any given combination of fields.
First, the mock data doesn't seem to agree with "update cost daily". Wouldn't the following make more sense?
bill_date | ID | Cost | _time |
6/1/25 | 1 | 1.24 | 2025-06-16 09:42:41.282 |
6/1/25 | 1 | 1.4 | 2025-06-06 09:00:41.282 |
5/1/25 | 1 | 2.5 | 2025-05-25 09:42:41.282 |
5/1/25 | 1 | 2.2 | 2025-05-15 09:00:41.282 |
5/1/25 | 2 | 3.2 | 2025-05-14 09:42:41.282 |
5/1/25 | 2 | 3.3 | 2025-05-04 09:00:41.282 |
3/1/25 | 1 | 4.4 | 2025-03-23 09:42:41.282 |
3/1/25 | 1 | 5 | 2025-03-18 09:00:41.282 |
3/1/25 | 2 | 6 | 2025-03-13 09:42:41.282 |
3/1/25 | 2 | 6.3 | 2025-03-03 08:00:41.282 |
Secondly, when you say latest event "of the month", I assume "month" can be represented by bill_date. Is this correct?
This is the search you need:
| stats latest(Cost) as Cost by bill_date ID
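With the revised mock data above (and since latest() follows _time order), this should give:
bill_date ID Cost
--------- -- ----
3/1/25    1  4.4
3/1/25    2  6
5/1/25    1  2.5
5/1/25    2  3.2
6/1/25    1  1.24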
If you want the latest cost for each ID per month, try this:
index=main
| bin _time as month span=1mon
| stats latest(Cost) as Cost latest(bill_date) as bill_date latest(_time) as _time by ID month
| table bill_date ID Cost _time
Regards,
Prewin
Hi @chrisboy68
How about using the bin command to bucket into 1-month blocks, then dedup on _time, or take first() of each field, such as:
index=main
| bin _time span=1month
| dedup _time
| table bill_date ID Cost _time
or
index=main
| bin _time span=1month
| stats first(bill_date) as bill_date, first(ID) as ID, first(Cost) as Cost by _time
Or you could even look at timechart, if that's useful.
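A timechart sketch of that idea (field names assumed from the sample data):
index=main
| timechart span=1mon latest(Cost) by ID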
Thanks, but that is still not working. It's only grabbing the very first ID. The data has many IDs per bill_date, each with multiple event times (_time).
Have you tried dedup with sortby? And of course you should use bin with a new column, like this:
index=main
| bin _time as time span=1month
| dedup time sortby _time
| table bill_date ID Cost _time
That way it should take only one event per month. By modifying the sort order, it will be the first or last event in the month.
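If you need one event per month per ID (as the earlier reply asks), a sketch that adds ID to the dedup and makes the sort explicit - sortby -_time keeps the latest event, sortby _time the first:
index=main
| bin _time as month span=1month
| dedup month ID sortby -_time
| table bill_date ID Cost _time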