We have a csv input which is a daily dump from a ticket DB for the current month.
The DB output looks something like this:
TICKET, CREATE_TIME, MODIFIED_TIME, A, B, C, D, E
The daily input will continually re-introduce events that match existing TICKETs to Splunk every time it runs; however, other fields will be updated and filled in over time.
We need to only report on the most recent version of a given ticket event in Splunk.
What I have working now is this:
index="ticket" | top limit=1 MODIFIED_TIME by TICKET | stats list by TICKET
It is reasonably fast over small data groups, but I feel like there must be a better way to do this. Something like this would be better " | stats list by Case_ID max(Modified_Time) " if it were possible.
snip Log Sample
"Case ID","Create Date","Modified Time","Detection Type","Time of Detection","Severity","External Media","Policy Violation","Offline Scan","Wiped","chk_UserRunsWithAdminPriv","chk_SystemIsManagedDesktop","chk_Escalated","IP Address","Hostname","Faceplate Jack","Property Number","Local ID","Group"
"HD194292","7/26/2010 4:40:01 PM","7/27/2010 7:44:35 AM","IPS","","","None","None","None","None","","","","192.168.202.194","HOSTA","","","BOB6243","09-92"
"HD194292","7/26/2010 4:40:01 PM","7/27/2010 7:44:35 AM","IPS","","","None","None","None","None","","","","192.168.202.194","HOSTA","","","BOB6243","09-92"
"HD194292","7/26/2010 4:40:01 PM","7/28/2010 9:30:44 AM","IPS","","IN PROCESS","None","None","None","None","","","","192.168.202.194","HOSTA","","","BOB6243","09-92"
"HD194016","7/23/2010 12:01:56 AM","7/23/2010 1:46:12 PM","Anti-Virus","7/23/2010 1:19:30 AM","Brown","None","None","None","None","","","","192.168.162.209","HOSTB","","000771220","BOB4512","10-06"
"HD194016","7/23/2010 12:01:56 AM","7/23/2010 1:46:12 PM","Anti-Virus","7/27/2010 5:30:30 PM","Blue","None","None","None","None","","","","192.168.162.209","HOSTB","","000771220","BOB4512","10-06"
/snip
Sean
... View more