Why is my field extraction not consistent across all events?

Path Finder

I want to extract a field that is in UUID format and name it instanceid.

props.conf settings

EXTRACT-fields_5 = \[[i]nstance:\s+(?P<instanceid>[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})

For logs like ...

2017-01-01 00:00:00.000 99999 INFO xxxxxxxxxxxx [-] [instance: 01234567-89ab-cdef-0123-456789abcdef] Instance destroyed successfully.

However, the extraction works for some events but not for others.
When I changed the field name in the regex to nstanceid or istanceid, it worked for all events. I don't know what is wrong with the field name instanceid.
On the other hand, the rex command with the same regex (field name instanceid) works well.

Could somebody explain why?

Esteemed Legend

The problem is one of two things: either some events do not contain what you think all of them do (non-conforming event data), or your RegEx is slightly off and does not accommodate all variations of the events (insufficient RegEx). In either case, here is how to figure it out. Deploy the version that works best; let's say you are using a field name of instance_id. Then run a search like this:

... NOT instance_id="*"

This will show you all events that do not have a field called instance_id. Adjust your RegEx, or ignore that type of event (by adding an exclusion for it to your base search), and keep repeating this cycle until that search returns no events.
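
The same triage loop can be rehearsed offline before touching props.conf. The sketch below is only a rough Python analogue, not Splunk's own behaviour: it assumes the events are available as plain text lines, it uses Python's `re` module rather than Splunk's regex engine, the helper name `events_missing_field` is hypothetical, and the second sample event is made up for illustration.

```python
import re

# Pattern adapted from the question, with the capture group renamed to instance_id.
INSTANCE_RE = re.compile(
    r"\[instance:\s+(?P<instance_id>"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def events_missing_field(events, pattern=INSTANCE_RE):
    """Return the events the pattern fails to match.

    Rough offline counterpart of the search: ... NOT instance_id="*"
    """
    return [event for event in events if pattern.search(event) is None]


sample_events = [
    # Sample event from the question.
    "2017-01-01 00:00:00.000 99999 INFO xxxxxxxxxxxx [-] "
    "[instance: 01234567-89ab-cdef-0123-456789abcdef] Instance destroyed successfully.",
    # Hypothetical non-conforming event, invented for this sketch.
    "2017-01-01 00:00:01.000 99999 INFO xxxxxxxxxxxx [-] no instance tag here",
]

# Print every event that still needs a regex adjustment or an exclusion.
for event in events_missing_field(sample_events):
    print(event)
```

Each pass of the cycle corresponds to editing `INSTANCE_RE` and rerunning until the loop prints nothing.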

Path Finder

Mmm... After I changed the extracted field name in the regex from instanceid to instance_id as a workaround, the extraction still fails for some events. It worked fine right after I made the change, but an hour later it no longer did.

SplunkTrust

Could you provide the exact _raw payload of an event that doesn't match this regex?

SplunkTrust

Hi diavolo,

My guess would be that some events actually already have a field called instanceid.
Try a completely new field name to test your field extraction; something like this should work for you:

 \[instance:\s+(?<ThisIsMyTestFieldName>[^\]]+)

cheers, MuS

Path Finder

Thanks MuS,
instanceid is not used anywhere. Changing the field name to something like instance_id works fine, but I was wondering why...

SplunkTrust

The problem may be the (?P at the beginning of the regex.

Also, I believe you can abbreviate hex digits as \h, so your regex may look a bit cleaner if you try this:

 EXTRACT-fields_5 = \[instance:\s+(?<instanceid>\h{8}\-\h{4}\-\h{4}\-\h{4}\-\h{12})

See this page for more details: http://www.regular-expressions.info/refext.html

Path Finder

Changing (?P< to (?< didn't fix the problem... Also, \h for hex didn't work.

SplunkTrust

1) When you say "change the field name", are you talking about the underlying data or the field name being extracted by the regex?
2) Can you post an example of an event for which the extraction did NOT work?

0 Karma

Path Finder

1) The latter. I changed the regex from (?P<instanceid>...) to (?P<nstanceid>...), and it worked.
2)
- Worked:
2017-01-06 03:08:35.416 21995 INFO nova.virt.libvirt.driver [-] [instance: 40624b9c-8179-4cb0-82ec-924ee5362cc0] Instance destroyed successfully.
- Did not work:
2017-01-06 03:07:25.932 21995 DEBUG nova.network.neutronv2.api [-] [instance: 6708c71b-0f49-4b0b-8040-fec13e3e2a4b] get_instance_nw_info() _get_instance_nw_info /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:602
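
As a sanity check outside Splunk, the two events above can be run through the same pattern with Python's `re` module. This is a minimal sketch under stated assumptions: Python's regex engine is not the one Splunk uses, and the helper name `extract_instanceid` is hypothetical. If both events yield a UUID here, the pattern itself is probably not what distinguishes them, which would point at something on the Splunk side rather than at the regex; that is an inference, not something confirmed in this thread.

```python
import re

# Regex copied from the EXTRACT-fields_5 setting in the question.
PATTERN = re.compile(
    r"\[[i]nstance:\s+(?P<instanceid>"
    r"[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})"
)

# The two events posted above, split across lines only for readability.
worked = (
    "2017-01-06 03:08:35.416 21995 INFO nova.virt.libvirt.driver [-] "
    "[instance: 40624b9c-8179-4cb0-82ec-924ee5362cc0] Instance destroyed successfully."
)
did_not_work = (
    "2017-01-06 03:07:25.932 21995 DEBUG nova.network.neutronv2.api [-] "
    "[instance: 6708c71b-0f49-4b0b-8040-fec13e3e2a4b] get_instance_nw_info() "
    "_get_instance_nw_info /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:602"
)


def extract_instanceid(event):
    """Return the captured instanceid, or None when the pattern does not match."""
    match = PATTERN.search(event)
    return match.group("instanceid") if match else None


for label, event in (("worked", worked), ("did not work", did_not_work)):
    print(label, "->", extract_instanceid(event))
```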

SplunkTrust

Hi diavolo,

Try the following:

(?:\[instance:\s+)(?P<instanceid>[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})(?:\])

Should work fine now.

Path Finder

Unfortunately, it doesn't work. The field still can't be extracted from some events.
