Solved: Field Extraction matching not working but works only with wildcard

koshyk · ‎07-11-2019

Strange problem, and I couldn't find the root cause. Just checking whether any of you have come across something similar?

Sample data

2019-07-11T12:26:40+00:00 ABC-94 someproduct: ^1001/0^20190711122640087^MYDB^USER260^Sign-on^MYUSER1
2019-07-11T12:26:41+00:00 ABC-94 someproduct: ^1002/0^20190711122641087^MYDB^USER260^Sign-off^MYUSER2

I've configured props.conf as

EXTRACT-myDB_fields = \s+\^(?<message_id>\d+)\/(?<subtype>0)\^(?<datetime>[^\^]*)\^(?<userid>[^\^]*)\^(?<runid>[^\^]*)\^(?<description>[^\^]*)\^(?<user_id>.*)$

I've also tried it using props.conf & transforms.conf, but the issue is still there.

#props.conf
REPORT-mydb_extract = mydb_extract_common, mydb_extract_specific  

#transforms.conf
[mydb_extract_common]
REGEX=^(?<mydb_syslog_metadata>[^\^]+)\s+\^(?<mydb_specific_fields>.+)$

[mydb_extract_specific]
SOURCE_KEY=mydb_specific_fields
DELIMS = "^"
FIELDS = "message_type","datetime","userid","runid","description","user_id"

The extractions work perfectly in regex101 and in the Splunk GUI, and the fields are displayed correctly.
BUT...
when I query in SPL:

index=* sourcetype=mycustom message_id=1001   => This fails. It fails on ALL fields, not just message_id
index=* sourcetype=mycustom message_id=*1001*   => This succeeds once the value is wrapped in wildcards.

When I do a

index=* sourcetype=mycustom | stats count by message_id  => This works perfectly and yields 1001 and 1002 etc.

I'm not sure why. When the extraction is done in props.conf, does it insert some character between the fields? Is there something magical about ^ in field extraction?

I've also checked the length of the extracted value, and it matches the length of the visible string, so this is not related to hidden characters or spaces.

... | eval length_message_id=len(message_id)

If I reproduce this with makeresults, there is no issue. Is it something related to props/transforms?
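
For readers who want to reproduce the regex101 check outside Splunk, here is a minimal sketch in Python. It assumes the EXTRACT pattern from the post, with each (?<name>...) group rewritten as (?P<name>...) because Python's re module requires that spelling. The names PATTERN, SAMPLE and extract_fields are hypothetical helpers, not anything Splunk provides. It only confirms that the extracted value is exactly "1001" with length 4, which is consistent with the len() check described above.

```python
import re

# The EXTRACT pattern from the post, translated to Python's named-group syntax.
PATTERN = re.compile(
    r"\s+\^(?P<message_id>\d+)/(?P<subtype>0)"
    r"\^(?P<datetime>[^\^]*)\^(?P<userid>[^\^]*)\^(?P<runid>[^\^]*)"
    r"\^(?P<description>[^\^]*)\^(?P<user_id>.*)$"
)

# First sample event from the post.
SAMPLE = (
    "2019-07-11T12:26:40+00:00 ABC-94 someproduct: "
    "^1001/0^20190711122640087^MYDB^USER260^Sign-on^MYUSER1"
)


def extract_fields(raw):
    """Return the named groups as a dict, or {} when the line does not match."""
    match = PATTERN.search(raw)
    return match.groupdict() if match else {}


fields = extract_fields(SAMPLE)
print(fields["message_id"], len(fields["message_id"]))  # prints: 1001 4
```

So the extraction itself is clean, and the failing search has to come from somewhere other than the regex.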

woodcock · ‎07-11-2019

You have to tell the Search Head that these fields are not indexed values (they do not fall between two major/minor breakers) by adding this to fields.conf:

[message_id]
INDEXED_VALUE = false

See details here:
https://www.splunk.com/blog/2011/10/07/cannot-search-based-on-an-extracted-field.html

koshyk · ‎07-12-2019

Thanks again Gregg.

Just to flesh out the comment above: it is all about major and minor breakers. The list of breakers is available in segmenters.conf, and ^ is NOT part of it. This means any field extracted this way needs INDEXED_VALUE = false if it is to be specified in a search.
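
The effect of the breakers can be illustrated with a rough sketch in Python. This is not Splunk's actual segmenter. The two breaker sets below are an assumed, simplified subset of the defaults in segmenters.conf, and split_on and index_terms are hypothetical helper names. The point is only that, because ^ is not a breaker, the term that ends up in the index is "^1001" rather than "1001", which is why message_id=1001 finds nothing while message_id=*1001* matches.

```python
from fnmatch import fnmatchcase

# Assumption: a simplified subset of the default breakers, for illustration only.
MAJOR_BREAKERS = set(" \t\r\n[]<>(){}|!;,'\"*&?+")
MINOR_BREAKERS = set("/:=@.-$#%\\_")

SAMPLE = (
    "2019-07-11T12:26:40+00:00 ABC-94 someproduct: "
    "^1001/0^20190711122640087^MYDB^USER260^Sign-on^MYUSER1"
)


def split_on(text, breakers):
    """Split text on any character in breakers, dropping empty pieces."""
    parts, current = [], []
    for ch in text:
        if ch in breakers:
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def index_terms(raw):
    """Collect each major segment plus the minor segments inside it."""
    terms = set()
    for major in split_on(raw, MAJOR_BREAKERS):
        terms.add(major)
        terms.update(split_on(major, MINOR_BREAKERS))
    return terms


terms = index_terms(SAMPLE)
print("1001" in terms)                                # prints: False
print("^1001" in terms)                               # prints: True
print(any(fnmatchcase(t, "*1001*") for t in terms))   # prints: True
```

With INDEXED_VALUE = false, the search no longer expects the literal value to exist as an indexed term, so the filter is applied to the extracted field instead.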

tiagofbmm · ‎07-11-2019

I've been through that. Change your special characters to hex escapes, like \x5C for \\, in the props extraction, and it may solve your problem. Escaping works differently between rex in SPL and an inline extraction in props.conf.

https://www.utf8-chartable.de/unicode-utf8-table.pl?unicodeinhtml=hex

oscar84x · ‎07-11-2019

Is this only happening with the message_id field? Does the number of digits for this value vary between events or logs? If not, you could try to specify \d{4} or \d{1,6} and see if it makes a difference. Just an idea.

koshyk · ‎07-11-2019

This happens on all fields unfortunately 😞
