When I write searches in Splunk, 90% of them are based on data that is only available in the _raw field, not in one of the indexed fields like host or sourcetype. My goal is to run a query that dedups on this portion of the _raw field: 34e6a6-6d0-4626-a319ce-24e6a63.
May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 order=[abcdefg]
I can write a regex for that value, but when I run the query below I still get duplicate values:
index=application sourcetype=web | regex _raw = "\w\w\w\w\w\w-\w\w\w\w-\w\w\w\w\w\w-\w\w\w\w\w\w\w\" | dedup _raw
Can someone please let me know if what I am trying to do is possible and point me to the correct path?
Thanks in advance!!!
For the supplied log example, this would work:
Note: this assumes that the format of the hex ID is consistent across different log events.
index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id
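To try the rex and dedup steps without touching real data, a search along these lines can generate a few events in place. This is only a sketch: the three events are made up for illustration (the first is the sample from the question, the second repeats its ID with a different timestamp, and the third carries an invented ID in the same 6-3-4-6-7 layout), and the field name n is arbitrary.

```spl
| makeresults count=3
| streamstats count AS n
| eval _raw=case(
    n=1, "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 order=[abcdefg]",
    n=2, "May 16 16:34:10 server1 16:34:10,101 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 order=[abcdefg]",
    n=3, "May 16 16:34:11 server1 16:34:11,512 WARN Servlet - TIME 7b21c4-9f1-03aa-5d2e80-c41f9b7 order=[hijklmn]")
| rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"
| dedup hex_id
| table hex_id _raw
```

If the extraction behaves as intended, this should return two rows, one per distinct hex_id. It also shows why "dedup _raw" cannot help here: the first two events share an ID but differ in their timestamps, so their _raw values are never identical.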
Yes, the missing escape character must be due to a formatting issue in Splunkbase, because I am definitely using it. Also, the overall rex statement works perfectly fine, as the format is consistent across this particular log event; it's just that when I add "| dedup hex_id" to the query I get zero results. I removed the "TIME\s" from the query and everything worked correctly.
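Zero results after adding dedup are consistent with how dedup handles missing fields: by default (keepempty=false) it drops every event in which the named field is null. If the rex pattern does not match, hex_id is never created, so all events are discarded. The following is a rough way to check this, assuming the same index and sourcetype as above; total_events and events_with_hex_id are just illustrative names.

```spl
index=application sourcetype=web
| rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"
| stats count AS total_events, count(hex_id) AS events_with_hex_id
```

If events_with_hex_id comes back as 0 while total_events does not, the pattern is not matching and the anchor is a likely suspect. The variant that reportedly worked simply drops the "TIME\s" prefix:

```spl
index=application sourcetype=web
| rex field=_raw "(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"
| dedup hex_id
```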
Thank you very much for your help.
Thank you for your response but unfortunately it didn't work. When I run the query
index=application sourcetype=web | rex field=_raw "TIME\s(?
When I try and run index=application sourcetype=web | rex field=_raw "TIME\s(?
Your copy/paste above doesn't match my post. The escape character before the "s" and after "TIME" is missing; maybe that's just a Splunkbase formatting quirk.
Furthermore, refer to my original post: "Note: assumes that the format of the hex ID is consistent across different log events." You only supplied one sample log event to work with, so if the pattern of the hex ID is variable, the regex pattern will need to be altered.
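If the group lengths do turn out to vary, one possible adjustment is to match five hyphen-separated groups of any length instead of fixed widths. This is a sketch under assumptions that the single sample event cannot confirm: that the ID always has five groups, and that each group contains only lowercase hex characters.

```spl
index=application sourcetype=web
| rex field=_raw "TIME\s(?<hex_id>[0-9a-f]+(?:-[0-9a-f]+){4})"
| dedup hex_id
```

The looser pattern accepts more strings than the fixed-width version, so it is worth spot-checking the extracted hex_id values (for example with "| table hex_id _raw") before relying on the dedup.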