Splunk Search

Dedup in raw field

msmapper
Path Finder

When I write searches in Splunk, 90% of them are based on data that is only available in the _raw field, not in one of the indexed fields like host or sourcetype. My goal is to run a query that dedups on this portion of the _raw field, 34e6a6-6d0-4626-a319ce-24e6a63, in events like this one:

May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]

I can write a regex for that value, but when I run the query below I still get duplicate values:

index=application sourcetype=web | regex _raw = "\w\w\w\w\w\w-\w\w\w\w-\w\w\w\w\w\w-\w\w\w\w\w\w\w" | dedup _raw

Can someone please let me know whether what I am trying to do is possible and point me in the right direction?

Thanks in advance!!!

Damien_Dallimor
Ultra Champion

For the supplied log example, this would work:

Note: this assumes that the format of the hex ID is consistent across different log events.

index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id
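The answer works because `rex` extracts the ID into its own field (`hex_id`) that `dedup` can compare, whereas the `regex` command in the question only filters events and creates no field. As a rough check of the two patterns outside Splunk, here is a minimal Python sketch. It assumes Python's `re` module behaves like Splunk's PCRE engine for patterns this simple, and it spells the named group `(?P<hex_id>...)` in place of the PCRE `(?<hex_id>...)` form. `extract_hex_id` is a hypothetical helper. Written with quantifiers, the question's pattern expects four groups of 6-4-6-7 characters, while the sample ID has five groups of 6-3-4-6-7, so that pattern does not match the sample event at all.

```python
import re
from typing import Optional

# Sample event from the question.
SAMPLE = ("May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
          "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]")

# The question's pattern, rewritten with quantifiers: groups of 6-4-6-7 characters.
QUESTION_PATTERN = r"\w{6}-\w{4}-\w{6}-\w{7}"

# The answer's pattern, using Python's (?P<name>...) named-group spelling
# instead of the PCRE (?<name>...) form used in the Splunk search.
ANSWER_PATTERN = r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"


def extract_hex_id(event: str) -> Optional[str]:
    """Hypothetical helper: return the ID captured by the answer's pattern, or None."""
    match = re.search(ANSWER_PATTERN, event)
    return match.group("hex_id") if match else None


if __name__ == "__main__":
    print(re.search(QUESTION_PATTERN, SAMPLE))  # None: the group lengths differ
    print(extract_hex_id(SAMPLE))               # 34e6a6-6d0-4626-a319ce-24e6a63
```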

msmapper
Path Finder

Damien,

Yes, the missing escape character must be due to a formatting issue on Splunkbase, because I am definitely using it. Also, the rex statement on its own works perfectly fine, as the format is consistent across this particular log event; it's just that when I add "| dedup hex_id" to the query I get zero results. I removed "TIME\s" from the query and everything worked correctly.

Thank you very much for your help.

Jen

msmapper
Path Finder

Damien,

Thank you for your response, but unfortunately it didn't work. When I run the query

index=application sourcetype=web | rex field=_raw "TIME\s(?\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"

I get about 200,000 results.

When I run

index=application sourcetype=web | rex field=_raw "TIME\s(?\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id

I get 0 results back.

Thoughts?
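A likely explanation for the zero results: `rex` never removes events, so the 200,000 results say nothing about whether the pattern matched, while `dedup` by default (`keepempty=false`) drops events that lack the field. If the pattern does not match, or does not define a `hex_id` group, no event gets a `hex_id` field and nothing survives. The Python sketch below is a simplified, hypothetical emulation of that behavior, not Splunk's implementation; `rex` and `dedup` here are illustrative helpers, and the second event is invented. It also shows why `dedup _raw` left duplicates: the raw text includes timestamps, so two events with the same ID are still different strings.

```python
import re
from typing import Dict, Iterable, List

# The answer's pattern in Python named-group syntax (PCRE would use (?<hex_id>...)).
HEX_ID = re.compile(r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")

EVENTS = [
    # Sample event from the question.
    "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # Hypothetical later event carrying the same ID.
    "May 16 16:35:12 server1 16:35:12,001 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
]


def rex(events: Iterable[str], pattern: "re.Pattern[str]") -> List[Dict[str, str]]:
    """Illustrative stand-in for `rex`: keep every event, adding named groups as fields on a match."""
    results = []
    for raw in events:
        fields = {"_raw": raw}
        match = pattern.search(raw)
        if match:
            fields.update({k: v for k, v in match.groupdict().items() if v is not None})
        results.append(fields)
    return results


def dedup(events: Iterable[Dict[str, str]], field: str) -> List[Dict[str, str]]:
    """Illustrative stand-in for `dedup <field>` with keepempty=false:
    keep the first event per value and drop events that lack the field."""
    seen = set()
    kept = []
    for event in events:
        value = event.get(field)
        if value is None or value in seen:
            continue
        seen.add(value)
        kept.append(event)
    return kept


parsed = rex(EVENTS, HEX_ID)
print(len(dedup(parsed, "_raw")))    # 2: the timestamps make each _raw unique
print(len(dedup(parsed, "hex_id")))  # 1: both events share one ID
# A pattern that never matches adds no hex_id field, so dedup drops everything.
print(len(dedup(rex(EVENTS, re.compile(r"NOMATCH(?P<hex_id>\w+)")), "hex_id")))  # 0
```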

Damien_Dallimor
Ultra Champion

Your copy/paste above doesn't match my post: the escape character after "TIME" and before the "s" is missing. Maybe that's just a Splunkbase formatting quirk.

Furthermore, refer to the note in my original post: it "assumes that the format of the hex ID is consistent across different log events". You only supplied one sample log event to work with, so if the pattern of the hex ID varies, the regex pattern will need to be altered.
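If the group lengths do vary, one option is to loosen the quantifiers. The Python sketch below assumes, hypothetically, that every ID is still five hyphen-separated runs of word characters following "TIME"; it replaces the fixed counts with `+` and accepts one or more whitespace characters after `TIME` (`\s+`). The second event and the `extract_ids` helper are invented for illustration, and Python's `(?P<hex_id>...)` stands in for the PCRE `(?<hex_id>...)` group you would use inside `rex`.

```python
import re
from typing import List

# Hypothetical relaxed pattern: five hyphen-separated runs of word characters
# of any length, after "TIME" and one or more whitespace characters.
RELAXED = re.compile(r"TIME\s+(?P<hex_id>\w+(?:-\w+){4})")

EVENTS = [
    # Sample event from the question.
    "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # Invented event with different group lengths and two spaces after TIME.
    "May 16 16:40:01 server1 16:40:01,002 WARN Servlet - TIME  "
    "9f1c2b7a-41d-0a9-7c3e-11aa 63.216.54.213:64524 order=[hijklmn]",
]


def extract_ids(events: List[str]) -> List[str]:
    """Hypothetical helper: return the ID from each event that matches RELAXED."""
    ids = []
    for raw in events:
        match = RELAXED.search(raw)
        if match:
            ids.append(match.group("hex_id"))
    return ids


print(extract_ids(EVENTS))
# ['34e6a6-6d0-4626-a319ce-24e6a63', '9f1c2b7a-41d-0a9-7c3e-11aa']
```

The trade-off is precision: a looser pattern can match text you did not intend, and an ID with more than five groups would be captured only up to its fifth group, so it is worth checking against a sample of real events before relying on it.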
