I'm in the process of setting up a new Splunk GovCloud instance, and I'm having no luck getting field extractions to work. We have an index that ingests JSON that includes a field in the following format, which is basically a comma separated set of values:
{
...
"customField":"s,TLS_CHACHA20_POLY1305_SHA256,0.e53c3217.1768417540.1550260,curl_D92CE15881831761FA790081ADA5975B,-,-,-,-,3%7e0480d07b4b8c1898",
...
}
The sourcetype for this data is cloned from the _json sourcetype, and it parses all the fields properly. I've created this regex that matches the above customField, and I've verified that it works against a bunch of test data via regex101.com:
(?<Network>[ps]),(?<tlsCipher>[A-Z][^,]+),(?<requestID>[0-9a-f\.]+),(?<BotID>[^,]+),.,(?<is_mobile>.),(?<is_tablet>.),(?<is_wireless>.),(?<tlsFingerprint>.+)
The first thing I tried was to add an entry for my sourcetype that looks like this:
EXTRACT-cf1 = "(?<AkamaiNetwork>[ps]),(?<tlsCipher>[A-Z][^,]+),(?<requestID>[0-9a-f\.]+),(?<BotID>[^,]+),.,(?<is_mobile>.),(?<is_tablet>.),(?<is_wireless>.),(?<tlsFingerprint>.+)" in customField
But I never see these fields show up in my search results. I've tried both with and without quotes around the regular expression. I know the regular expression works because if I test it with rex via | rex field=customField "<regex>", it returns the fields. So what am I missing? Is there any way of debugging/troubleshooting this sort of issue short of a whole lot of trial and error? Would it make more sense to create a custom app that contains the sourcetype definition and a transform in transforms.conf to handle this?
Wait. Why would you even try to use regex to extract fields from a JSON structure? That's what KV_MODE is for. Set it to json and you're good to go.
The JSON is defined by a third party and we have very little control over it. The third party does let us define the value for the field named "customField" and we pass a comma separated value into it. We want Splunk to parse that comma separated string into the individual fields.
OK. Makes sense (a bit). Unfortunately, regex-based extraction takes place before KV_MODE, so you can't extract your values from an already extracted field. That's unfortunate, since it leaves you at the mercy of the sending side: you have to account for possible changes in the event formatting that are still perfectly valid JSON (extra whitespace here and there), and you have to manually unescape the string. Ugh.
BTW, does your props.conf entry have this "in customField" part? It shouldn't. As I wrote before, KV_MODE extractions take place after your regex-based extractions, so customField isn't defined yet when you're trying to extract your fields. You have to search through the whole event (anchoring to the JSON field name).
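A rough sketch of what an extraction anchored to the raw event might look like. The stanza name my_cloned_json is a made-up placeholder for your sourcetype, and I haven't tested this on your data. It drops the "in customField" part, prefixes the regex with the literal JSON key (allowing optional whitespace around the colon), and swaps the trailing .+ for [^"]+ so the last group stops at the closing quote instead of running on through the rest of the event. That last change assumes the value never contains an escaped quote.

```ini
# props.conf (sketch; [my_cloned_json] is a hypothetical sourcetype name)
[my_cloned_json]
# No surrounding quotes and no "in customField": the regex runs against the
# whole raw event and anchors on the JSON key itself.
EXTRACT-cf1 = "customField"\s*:\s*"(?<AkamaiNetwork>[ps]),(?<tlsCipher>[A-Z][^,]+),(?<requestID>[0-9a-f\.]+),(?<BotID>[^,]+),.,(?<is_mobile>.),(?<is_tablet>.),(?<is_wireless>.),(?<tlsFingerprint>[^"]+)"
```

Against the sample value in the original post, this should give AkamaiNetwork=s, four single-character "-" matches after BotID (the first one uncaptured), and tlsFingerprint=3%7e0480d07b4b8c1898.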
Hi @bpenny
The issue here is that you have put " quotes around the regular expression in your EXTRACT statement; the quotes are not required.
I've tested ingesting a local JSON file with the EXTRACT setting without the quotes, and it extracts successfully. Could you please try removing the quotes?
Thanks for the suggestion, but as I mentioned in the original post I've already tried both with and without quotes around the regular expression. I've subsequently tried changing the regex to match against the raw JSON as a further test. regex101.com still shows the regex matching, but the fields are still not extracting for me.
Is there really no way short of trial and error to get this to work? No _internal logs to look at, or anything like that?