We are connected via DBConnect to a database that contains ticket data. The database creates a new row each time a ticket's status is updated, so every previous status remains in the table, and therefore in our index, as older rows. Tens of thousands of tickets share the same database table. Querying the data without dedup returns roughly 150k records for the last 24 hours, for example; deduped on the ticket number, that drops to about 1,500.
I am looking for suggestions on alternatives to running " | dedup ticket_num" to surface the latest status of each ticket as it changes over time. The dedup search over just today's data took 94 seconds (the exact search is sketched below the example logs), so I am afraid that running analysis over the last month's data will be too slow to be usable.
Example logs (I want to return only the last row and ignore the earlier ones):
1.1.2020:08:00:00, ticket_status=new, subject="something is busted", assigned_to=, ticket_num=1234
1.1.2020:09:00:00, ticket_status=assigned, subject="something is busted", assigned_to="helpdesk", ticket_num=1234
1.1.2020:10:00:00, ticket_status=assigned, subject="something is busted", assigned_to="operations", ticket_num=1234
1.1.2020:11:00:00, ticket_status=closed, subject="something is busted", assigned_to="operations", ticket_num=1234
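For reference, the search I am timing is roughly the following; the index and sourcetype names are placeholders for our actual DB Connect input:

index=ticket_db sourcetype=ticket_updates earliest=-24h
| dedup ticket_num

Since results stream back newest-first by default, dedup keeps the first occurrence it sees, i.e. the latest row per ticket.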
Any recommendations are highly appreciated!
Similar to @richgalloway's answer. However, if you only need the ticket status, I would just get the latest value for that column:
stats latest(ticket_status) as ticket_status by ticket_num
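A minimal end-to-end sketch, assuming a hypothetical index ticket_db and sourcetype ticket_updates (substitute the names from your DB Connect input); the latest(_time) and convert steps are optional extras to show when each ticket last changed:

index=ticket_db sourcetype=ticket_updates earliest=-24h
| stats latest(_time) as last_updated latest(ticket_status) as ticket_status by ticket_num
| convert ctime(last_updated)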
You may find stats latest(*) as * by ticket_num faster than dedup.
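A sketch of the full-row variant, under the same assumed index and sourcetype names:

index=ticket_db sourcetype=ticket_updates earliest=-24h
| stats latest(*) as * by ticket_num

One caveat: latest(*) takes the most recent non-null value of each field, so a field that is blank in the newest row may still show an older value, which differs slightly from dedup's whole-row behavior. In my experience stats tends to be faster anyway, since it reduces events to one aggregated row per ticket early in the pipeline instead of carrying full raw events through the search.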