Strange problem, and I couldn't find the root cause. Has anyone come across something similar?
Sample data
2019-07-11T12:26:40+00:00 ABC-94 someproduct: ^1001/0^20190711122640087^MYDB^USER260^Sign-on^MYUSER1
2019-07-11T12:26:41+00:00 ABC-94 someproduct: ^1002/0^20190711122641087^MYDB^USER260^Sign-off^MYUSER2
I've configured props.conf as
EXTRACT-myDB_fields = \s+\^(?<message_id>\d+)\/(?<subtype>0)\^(?<datetime>[^\^]*)\^(?<userid>[^\^]*)\^(?<runid>[^\^]*)\^(?<description>[^\^]*)\^(?<user_id>.*)$
I've also tried it using props.conf and transforms.conf, but the issue remains:
#props.conf
REPORT-mydb_extract = mydb_extract_common, mydb_extract_specific
#transforms.conf
[mydb_extract_common]
REGEX=^(?<mydb_syslog_metadata>[^\^]+)\s+\^(?<mydb_specific_fields>.+)$
[mydb_extract_specific]
SOURCE_KEY=mydb_specific_fields
DELIMS = "^"
FIELDS = "message_type","datetime","userid","runid","description","user_id"
The extractions work perfectly in regex101 and in the Splunk GUI, and the fields are shown correctly.
BUT...
When I query in SPL:
index=* sourcetype=mycustom message_id=1001 => This fails. Fails on ALL fields not just message_id
index=* sourcetype=mycustom message_id=*1001* => This succeeds once the value is wrapped in wildcards.
When I do a
index=* sourcetype=mycustom | stats count by message_id => This works perfectly and yields 1001 and 1002 etc.
I'm not sure why. When the extraction is done in props.conf, does it insert some character between the fields? Is there something magical about ^ in field extraction?
I've also checked the length of the extracted value, and it matches the length of the expected string, so this is not related to hidden characters or spaces.
... | eval length_message_id=len(message_id)
If I reproduce it with makeresults, there is no issue. Is it something related to props/transforms?
You have to tell the Search Head that these fields are not indexed values (they do not fall between two major/minor breakers) by adding this to fields.conf:
[message_id]
INDEXED_VALUE = false
See details here:
https://www.splunk.com/blog/2011/10/07/cannot-search-based-on-an-extracted-field.html
Thanks again Gregg.
Just to add some flesh to the above comment: it is all about major and minor breakers. The list of breakers is available in segmenters.conf, and ^ is NOT part of it. This means any field extracted this way needs INDEXED_VALUE = false if it is to be specified in a search.
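Since the problem affects every field and not just message_id, the same setting would presumably be needed for each field the extraction defines. A minimal sketch, assuming the file lives on the search head in the same app as the props.conf extraction; the field names are taken from the EXTRACT regex in the question:

```
# fields.conf (assumed location: same app as the props.conf extraction, on the search head)
# One stanza per search-time field whose value is delimited by ^,
# which is not a major or minor breaker.

[message_id]
INDEXED_VALUE = false

[subtype]
INDEXED_VALUE = false

[datetime]
INDEXED_VALUE = false

[userid]
INDEXED_VALUE = false

[runid]
INDEXED_VALUE = false

[description]
INDEXED_VALUE = false

[user_id]
INDEXED_VALUE = false
```

With this in place, a search such as message_id=1001 should be matched against the extracted field rather than looked up as a literal term in the index. The likely trade-off is slower searches on these fields, since the index can no longer be used to narrow down events by that value.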
I've been through that. Change your special chars to hex escapes, such as \x5C for \\, in the props extract, and it may solve your problem there. Escaping works differently between rex in SPL and an inline extract in props.
https://www.utf8-chartable.de/unicode-utf8-table.pl?unicodeinhtml=hex
Is this only happening with the message_id field? Does the number of digits for this value vary between events or logs? If not, you could try to specify \d{4} or \d{1,6} and see if it makes a difference. Just an idea.
This happens on all fields unfortunately 😞