When I write searches in Splunk, 90% of them are based on data that is only available in the _raw field, not in one of the indexed fields like host or sourcetype. My goal is to run a query that dedups on this portion of the _raw field: 34e6a6-6d0-4626-a319ce-24e6a63.
May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]
I can write a regex for that value, but when I run the query below I still get duplicate values:
index=application sourcetype=web | regex _raw = "\w\w\w\w\w\w-\w\w\w\w-\w\w\w\w\w\w-\w\w\w\w\w\w\w\" | dedup _raw
Can someone please let me know if what I am trying to do is possible and point me in the right direction?
Thanks in advance!!!
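For context on why the query above still returns duplicates: `regex` only filters events (it keeps events whose text matches and does not create a field), and `dedup _raw` then compares the entire event text. Two events carrying the same ID still differ in their timestamp and client port, so each one counts as unique. A minimal Python sketch of that effect, assuming a hypothetical second event with the same ID; `dedup_on` is an illustrative helper, not Splunk code:

```python
# The sample event plus a hypothetical repeat of the same ID,
# logged a second later from a different client port.
events = [
    "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    "May 16 16:34:10 server1 16:34:10,102 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64530 order=[abcdefg]",
]


def dedup_on(rows, key):
    """Keep the first row seen for each distinct key value."""
    seen, kept = set(), []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# Keying on the whole line (like `dedup _raw`) removes nothing,
# because the timestamps and ports make every line different.
print(len(dedup_on(events, key=lambda raw: raw)))  # 2

# Keying on the ID token (position 9 in this fixed layout) collapses them.
print(len(dedup_on(events, key=lambda raw: raw.split()[9])))  # 1
```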
For the supplied log example, this should work:
Note: this assumes that the format of the hex ID is consistent across different log events.
index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id
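To sanity-check the `rex` + `dedup hex_id` approach outside Splunk, here is a rough Python equivalent, assuming the same fixed 6-3-4-6-7 layout. Python's `re` spells named groups `(?P<name>...)` rather than `(?<name>...)`. The second and third events are hypothetical, and `dedup_hex_id` only approximates `dedup` (keep the first event per value, drop events without the field):

```python
import re

# Python spelling of the rex pattern above (named group via (?P<...>)).
HEX_ID = re.compile(r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")

PREFIX = "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
events = [
    PREFIX + "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # hypothetical repeat of the same ID from another client port
    PREFIX + "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64530 order=[abcdefg]",
    # hypothetical event with a different ID of the same shape
    PREFIX + "9f1c2b-a11-7d3e-b45c0d-77aa0b1 63.216.54.213:64531 order=[abcdefg]",
]


def extract_hex_id(raw):
    """Like rex field=_raw: return the captured ID, or None when nothing matches."""
    match = HEX_ID.search(raw)
    return match.group("hex_id") if match else None


def dedup_hex_id(rows):
    """Like dedup hex_id: keep the first event per ID; events without an ID are dropped."""
    seen, kept = set(), []
    for raw in rows:
        hex_id = extract_hex_id(raw)
        if hex_id is None or hex_id in seen:
            continue
        seen.add(hex_id)
        kept.append(raw)
    return kept


for raw in dedup_hex_id(events):
    print(extract_hex_id(raw))
# 34e6a6-6d0-4626-a319ce-24e6a63
# 9f1c2b-a11-7d3e-b45c0d-77aa0b1
```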
Damien,
Yes, the missing escape character must be due to a formatting issue in Splunkbase, because I am definitely using it. Also, the rex statement on its own works perfectly fine, since the format is consistent across this particular log event. It's only when I add "| dedup hex_id" to the query that I get zero results. I removed "TIME\s" from the query and everything worked correctly.
Thank you very much for your help.
Jen
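One plausible explanation for the zero results, offered as a guess rather than a diagnosis: `rex` never removes events, so the search looks fine even when the pattern matches nothing and `hex_id` is never set. `dedup` then drops events that lack the field (unless `keepempty=true` is given), which empties the result set. If the backslash in `TIME\s` is lost somewhere, or the live events don't have exactly `TIME` plus whitespace before the ID, that is what you would see. The Python sketch below reproduces the effect on the sample line; the pattern labels are illustrative:

```python
import re

raw = ("May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
       "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]")
ID = r"(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"

patterns = {
    "with TIME\\s": r"TIME\s" + ID,
    "backslash lost": r"TIMEs" + ID,  # looks for the literal text "TIMEs"
    "prefix removed": ID,             # the variant that worked above
}


def dedup_hex_id(rows, pattern):
    """Extract hex_id with pattern, then keep one event per ID.

    Events where the pattern does not match have no hex_id and are dropped,
    mirroring dedup's default handling of events missing the field.
    """
    seen, kept = set(), []
    for row in rows:
        match = re.search(pattern, row)
        if match is None or match.group("hex_id") in seen:
            continue
        seen.add(match.group("hex_id"))
        kept.append(row)
    return kept


for name, pattern in patterns.items():
    print(f"{name}: {len(dedup_hex_id([raw], pattern))} event(s)")
# with TIME\s: 1 event(s)
# backslash lost: 0 event(s)
# prefix removed: 1 event(s)
```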
Damien,
Thank you for your response, but unfortunately it didn't work. When I run the query
index=application sourcetype=web | rex field=_raw "TIME\s(?
When I try and run index=application sourcetype=web | rex field=_raw "TIME\s(?
Thoughts?
Your copy/paste above doesn't match my post: the escape character before the "s" (right after "TIME") is missing. Maybe that's just a Splunkbase formatting quirk.
Furthermore, refer to the note in my original post: "assumes that the format of the hex ID is consistent across different log events." You only supplied one sample log event to work with, so if the pattern of the hex ID is variable, the regex pattern will need to be altered.
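If the ID layout does vary, one option is to stop hard-coding segment lengths. The sketch below tries a hypothetical looser pattern that accepts five hyphen-separated runs of hex digits of any length. It assumes the IDs really are hexadecimal (the sample is), and the UUID-shaped second event is invented for illustration. In the `rex` string the group would be written `(?<hex_id>...)` instead of Python's `(?P<hex_id>...)`:

```python
import re

# Pattern from the answer: fixed segment lengths 6-3-4-6-7.
STRICT = re.compile(r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")
# Hypothetical looser pattern: five hyphen-separated hex runs of any length.
LOOSE = re.compile(r"TIME\s(?P<hex_id>[0-9a-fA-F]+(?:-[0-9a-fA-F]+){4})")

PREFIX = "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
lines = [
    PREFIX + "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # hypothetical event whose ID uses different segment lengths
    PREFIX + "3f2504e0-4f89-11d3-9a0c-0305e82c3301 63.216.54.213:64524 order=[abcdefg]",
]


def extract(pattern, line):
    """Return the captured hex_id, or None if the pattern does not match."""
    match = pattern.search(line)
    return match.group("hex_id") if match else None


for line in lines:
    print(extract(STRICT, line), extract(LOOSE, line))
# 34e6a6-6d0-4626-a319ce-24e6a63 34e6a6-6d0-4626-a319ce-24e6a63
# None 3f2504e0-4f89-11d3-9a0c-0305e82c3301
```

The trade-off is that the looser pattern no longer accepts non-hex letters the way `\w` does, so it only fits if the IDs are genuinely hex.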