Field Extraction matching not working but works only with wildcard

koshyk
Super Champion

This is a strange problem and I couldn't find the root cause. Has anyone come across something similar?

Sample data

2019-07-11T12:26:40+00:00 ABC-94 someproduct: ^1001/0^20190711122640087^MYDB^USER260^Sign-on^MYUSER1
2019-07-11T12:26:41+00:00 ABC-94 someproduct: ^1002/0^20190711122641087^MYDB^USER260^Sign-off^MYUSER2

I've configured props.conf as

EXTRACT-myDB_fields = \s+\^(?<message_id>\d+)\/(?<subtype>0)\^(?<datetime>[^\^]*)\^(?<userid>[^\^]*)\^(?<runid>[^\^]*)\^(?<description>[^\^]*)\^(?<user_id>.*)$

I've also tried it using props.conf and transforms.conf, but the issue is still there

#props.conf
REPORT-mydb_extract = mydb_extract_common, mydb_extract_specific  

#transforms.conf
[mydb_extract_common]
REGEX=^(?<mydb_syslog_metadata>[^\^]+)\s+\^(?<mydb_specific_fields>.+)$

[mydb_extract_specific]
SOURCE_KEY=mydb_specific_fields
DELIMS = "^"
FIELDS = "message_type","datetime","userid","runid","description","user_id"

The extractions work perfectly in regex101 and in the Splunk GUI, and the fields are shown correctly.
BUT...
When I query in SPL

index=* sourcetype=mycustom message_id=1001   => This fails. Fails on ALL fields not just message_id
index=* sourcetype=mycustom message_id=*1001*   => This succeeds once the value is wrapped in wildcards.

When I do a

index=* sourcetype=mycustom | stats count by message_id  => This works perfectly and yields 1001 and 1002 etc.

I'm not sure why. When the extraction is done in props.conf, does it insert some character between the fields? Is there something magical about ^ in field extraction?

Also, I've checked the length of the extracted value. It is the same as the length of the expected string, so this is not related to any hidden characters or spaces.

... | eval length_message_id=len(message_id)

If I reproduce it via makeresults, there is no issue. Is it something related to props/transforms?

1 Solution

woodcock
Esteemed Legend

You have to tell the Search Head that these fields are not indexed values (they do not fall between two major/minor breakers) by adding this to fields.conf:

[message_id]
INDEXED_VALUE = false

See details here:
https://www.splunk.com/blog/2011/10/07/cannot-search-based-on-an-extracted-field.html

koshyk
Super Champion

Thanks again Gregg.

Just to add some flesh to the above comment: it is all about major and minor breakers. The list of breakers is available in segmenters.conf, and ^ is NOT among them. This means that any field extracted this way needs INDEXED_VALUE = false if it is to be used as a search term.
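
As a rough sketch of what that could look like for the inline EXTRACT in the question, fields.conf would carry one stanza per extracted field. This assumes the field names from that regex are the ones being searched on, and that the file is deployed on the search head:

#fields.conf
[message_id]
INDEXED_VALUE = false

[subtype]
INDEXED_VALUE = false

[datetime]
INDEXED_VALUE = false

[userid]
INDEXED_VALUE = false

[runid]
INDEXED_VALUE = false

[description]
INDEXED_VALUE = false

[user_id]
INDEXED_VALUE = false

The trade-off is that with INDEXED_VALUE = false, Splunk no longer uses the searched value to narrow down events through the index for these fields, so searches that filter only on them may run slower.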

tiagofbmm
Influencer

I've been through that. Change your special characters to their Unicode hex escapes, like \x5C for \\, in the props extract and it may solve your problem. Escaping works differently between rex in SPL and an inline props extract.

https://www.utf8-chartable.de/unicode-utf8-table.pl?unicodeinhtml=hex

oscar84x
Contributor

Is this only happening with the message_id field? Does the number of digits for this value vary between events or logs? If not, you could try to specify \d{4} or \d{1,6} and see if it makes a difference. Just an idea.

koshyk
Super Champion

This happens on all fields unfortunately 😞
