Why is my field extraction not consistent across all events?

diavolo
Path Finder

I want to extract a field in UUID format and name it instanceid.

props.conf settings

EXTRACT-fields_5 = \[[i]nstance:\s+(?P<instanceid>[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})

For logs like ...

2017-01-01 00:00:00.000 99999 INFO xxxxxxxxxxxx [-] [instance: 01234567-89ab-cdef-0123-456789abcdef] Instance destroyed successfully.

However, it works for some events but not for others.
When I change the field name in the regex to nstanceid or istanceid, it works for all events. I don't know what's wrong with the field name instanceid.
On the other hand, the rex command with the above regex (field name instanceid) works well.

Could somebody explain why?

0 Karma

woodcock
Esteemed Legend

The problem is one of two things: either some events do not contain what you think all of them do (non-conforming event data), or your RegEx is slightly off and does not accommodate all variations of the events (insufficient RegEx). In either case, here is how to figure it out. Deploy the version that works best; let's say you are using a field name of instance_id. Then run a search like this:

... NOT instance_id="*"

This will show you all events that do not have a field called instance_id. Adjust your RegEx, or ignore that type of event (by putting an exclusion for it in your base search), and repeat this cycle until that search returns no events.
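The same cycle can be rehearsed offline before touching props.conf. Below is a minimal sketch in Python, assuming a few raw events have been exported to a list. The sample events and the helper name find_unmatched are hypothetical, and Python's re module is only a stand-in for Splunk's regex engine (both accept the (?P<name>...) syntax used here).

```python
import re

# Equivalent to the EXTRACT regex in the question, minus the redundant escapes.
UUID_PATTERN = re.compile(
    r"\[instance:\s+(?P<instanceid>"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def find_unmatched(events, pattern=UUID_PATTERN):
    """Return the events from which the pattern extracts nothing."""
    return [event for event in events if pattern.search(event) is None]


# Hypothetical events: the second uses upper-case hex, which [0-9a-f] rejects.
sample_events = [
    "2017-01-01 00:00:00.000 99999 INFO xxxxxxxxxxxx [-] "
    "[instance: 01234567-89ab-cdef-0123-456789abcdef] Instance destroyed successfully.",
    "2017-01-01 00:00:01.000 99999 INFO xxxxxxxxxxxx [-] "
    "[instance: 01234567-89AB-CDEF-0123-456789ABCDEF] Instance destroyed successfully.",
]

unmatched = find_unmatched(sample_events)
for event in unmatched:
    print(event)
```

Any event this prints is a candidate for either a RegEx adjustment or an exclusion in the base search.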

0 Karma

diavolo
Path Finder

Hmm... After I changed the extracted field name in the regex from instanceid to instance_id as a workaround, it still fails for some events. It worked fine right after the change, but an hour later it no longer did.

0 Karma

horsefez
Motivator

Could you provide us with the exact _raw payload of an event that doesn't match this regex?

0 Karma

MuS
Legend

Hi diavolo,

My guess would be that some events actually already contain a field called instanceid.
Try a completely new/different field name to test your field extraction; something like this should work for you:

 \[instance:\s+(?<ThisIsMyTestFieldName>[^\]]+)

cheers, MuS

0 Karma

diavolo
Path Finder

Thanks MuS,
instanceid is not used anywhere else. Changing the field name to something like instance_id works fine, but I was wondering why...

0 Karma

DalJeanis
Legend

The problem may be the P in (?P<instanceid> at the start of the capture group; try (?<instanceid> instead.

Also, I believe you can shorthand hex digits as \h, so your regex can look a bit cleaner if you try this -

 EXTRACT-fields_5 = \[instance:\s+(?<instanceid>\h{8}\-\h{4}\-\h{4}\-\h{4}\-\h{12})

See this page for more details: http://www.regular-expressions.info/refext.html

0 Karma

diavolo
Path Finder

Using (?<instanceid>...) instead of (?P<instanceid>...) didn't fix the problem... Also, \h for hex didn't work.

0 Karma

DalJeanis
Legend

1) When you say "change the field name", are you talking about the underlying data, or the field name being extracted by the regex?
2) Can you post an example of an event that the extraction did NOT work for?

0 Karma

diavolo
Path Finder

1) The latter. I changed the regex from (?P<instanceid>...) to (?P<nstanceid>...), and it worked.
2)
- Worked:
2017-01-06 03:08:35.416 21995 INFO nova.virt.libvirt.driver [-] [instance: 40624b9c-8179-4cb0-82ec-924ee5362cc0] Instance destroyed successfully.
- Did not work:
2017-01-06 03:07:25.932 21995 DEBUG nova.network.neutronv2.api [-] [instance: 6708c71b-0f49-4b0b-8040-fec13e3e2a4b] get_instance_nw_info() _get_instance_nw_info /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:602
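
For reference, the posted regex matches both of these samples when run outside Splunk, which suggests the difference lies somewhere other than the pattern itself. A minimal sketch in Python follows. Using Python's re as a stand-in for Splunk's regex engine is an assumption, and extract_instanceid is just an illustrative helper, not anything Splunk provides.

```python
import re

# The EXTRACT regex exactly as posted in the question.
PATTERN = re.compile(
    r"\[[i]nstance:\s+(?P<instanceid>[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})"
)

# The two events posted above: one where the extraction worked in Splunk,
# and one where it did not.
worked = (
    "2017-01-06 03:08:35.416 21995 INFO nova.virt.libvirt.driver [-] "
    "[instance: 40624b9c-8179-4cb0-82ec-924ee5362cc0] Instance destroyed successfully."
)
not_worked = (
    "2017-01-06 03:07:25.932 21995 DEBUG nova.network.neutronv2.api [-] "
    "[instance: 6708c71b-0f49-4b0b-8040-fec13e3e2a4b] get_instance_nw_info() "
    "_get_instance_nw_info /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:602"
)


def extract_instanceid(event):
    """Return the captured UUID, or None when the pattern does not match."""
    match = PATTERN.search(event)
    return match.group("instanceid") if match else None


print(extract_instanceid(worked))      # 40624b9c-8179-4cb0-82ec-924ee5362cc0
print(extract_instanceid(not_worked))  # 6708c71b-0f49-4b0b-8040-fec13e3e2a4b
```

Both calls return the UUID, which is consistent with the rex command working at search time.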

0 Karma

horsefez
Motivator

Hi diavolo,

Try the following:

(?:\[instance:\s+)(?P<instanceid>[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})(?:\])

Should work fine now.

0 Karma

diavolo
Path Finder

Unfortunately, it doesn't work. The field still isn't extracted from some events.

0 Karma