Why do I get "The extraction failed" error for regex field extraction?




When I try to extract any field from a JSON log, the following error is generated:

"The extraction failed. If you are extracting multiple fields, try removing one or more fields. Start with extractions that are embedded within longer text strings."


Aug 24 13:16:20 fenotify-333875.warning: { "alert": { "ack": "no", "action": "blocked", "alert-url": "***************", "appliance-id": "C4:00:AD:B6:C5:33", "attack-time": "2023-08-24T04:16:08Z", "dst": { "ip": "", "mac": "fc:34:97:03:fe:98", "port": "80" }, "explanation": { "analysis": "content", "cnc-services": { "cnc-service": { "address": "", "channel": "POST /album.php HTTP/1.1\r\nConnection: Keep-Alive\r\nAccept: text/html, application/xhtml+xml, */*\r\nAccept-Language: en-US\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\r\nContent-Length: 273\r\nHost:\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\nc=jO0wkeKc25qk/jg9NkqHPYA1XRkb0eqAPErxNwK5fmcXnTY0m3qFMPT2&kaiikog=M4orW66CyB5IjuC7TFuXRXOu&uce=er+Z6Z0jmOjNDtX5cONg+rVQo6oNPYQ0leujF838&oa=JIcmHxXfQvOJUgRbe7md5RBz9uZx&ssqwy14=+gDzCdPBlfSipfJIxWZ/O6jp&mmmkii=Ejrq8elTUzQqMMrfBs2oCZkoqZFLbvdKd5YyiQgp50Qsaw+JBOzLVsxbAfJCDaY=", "host": "", "port": "80", "protocol": "tcp", "sid": "86134347", "sname": "Trojan.Bedep", "type": "CncSigMatch", "url": "hxxp://" } }, "malware-detected": { "malware": { "name": "Trojan.Bedep", "sid": "86134347", "stype": "bot-command" } }, "protocol": "tcp" }, "id": "333875", "interface": { "interface": "pether3", "label": "A1", "mode": "tap" }, "name": "malware-callback", "occurred": "2023-08-24T04:16:08Z", "product": "Web MPS", "root-infection": "7717", "sc-version": "1397.140", "sensor": "Coupers-NX", "sensor-ip": "", "severity": "crit", "src": { "ip": "", "mac": "00:0c:29:07:f9:d1", "port": "58061", "vlan": "0" }, "uuid": "62206b77-a649-4dfe-aba9-67debda3e52f", "version": "" }, "appliance": "Coupers-CM.couperscm.com", "appliance-id": "3C:EC:EF:8E:64:9E", "msg": "normal", "product": "CMS", "version": "" }



Hi @hitong,

this seems to be JSON, so you don't need regexes to extract fields: you can use the "INDEXED_EXTRACTIONS = json" option in props.conf or the spath command (https://docs.splunk.com/Documentation/SCS/current/SearchReference/TextFunctions).

Anyway, if you want to use regexes, you should use two, like the following:

| rex "\"action\":\s*\"(?<action>[^\"]*).*\"dst\":\s*\{\s*\"ip\":\s*\"(?<dst_ip>[^\"]*)\",\s*\"mac\":\s*\"(?<dst_mac>[^\"]*)\",\s*\"port\":\s*\"(?<dst_port>[^\"]*)\"\s*\}.*\s*\"src\":\s*\{\s*\"ip\":\s*\"(?<src_ip>[^\"]*)\",\s*\"mac\":\s*\"(?<src_mac>[^\"]*)\",\s*\"port\":\s*\"(?<src_port>[^\"]*)\",\s*\"vlan\":\s*\"(?<src_vlan>[^\"]*)\""
| rex "\"name\":\s*\"(?<malware_name>[^\"]*)"

that you can test at https://regex101.com/r/ftMcvw/1 and https://regex101.com/r/ftMcvw/2






Actually, there seems to be a non-JSON part at the beginning of the message. You can capture the JSON part into its own field and then work with that, like this:

| rex "(?<json>\{.*\}$)"

You can then use spath on this json field to pick out the fields and values you want.
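
As a rough sketch of that rex + spath combination, the search below builds a test event with makeresults and pulls a few nested values out of the json field. The event is a shortened copy of the sample from the question, and the output field names (action, dst_mac, dst_port, severity) are only my choice for illustration, so adjust the paths to whatever you need:

```spl
| makeresults
| eval _raw="Aug 24 13:16:20 fenotify-333875.warning: { \"alert\": { \"action\": \"blocked\", \"dst\": { \"ip\": \"\", \"mac\": \"fc:34:97:03:fe:98\", \"port\": \"80\" }, \"name\": \"malware-callback\", \"severity\": \"crit\" }, \"product\": \"CMS\" }"
| rex "(?<json>\{.*\}$)"
| spath input=json path=alert.action output=action
| spath input=json path=alert.dst.mac output=dst_mac
| spath input=json path=alert.dst.port output=dst_port
| spath input=json path=alert.severity output=severity
| table action dst_mac dst_port severity
```

On real data you would replace the first two lines with your own base search and keep everything from the rex onwards.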

INDEXED_EXTRACTIONS = json is a good option for pure JSON events, BUT you need to remember that it generates indexed fields, and depending on the content and number of those fields this can be good or bad from a performance point of view. Another option is to use KV_MODE=json at search time, which doesn't blow up your tsidx files. There is a place for both of them (but never both at the same time, or you will get duplicate values), and there are other ways too.

r. Ismo



1. The wizard has its limitations. It might be relatively easy to use but it usually doesn't produce optimal results. It's generally better to write regex extractions by hand.

2. This seems to be a JSON structure. Splunk can handle JSON pretty well but... it can't do so if the event as a whole is not well-formed JSON. So you'd need to cut the header off and leave only the JSON part. Then it would be pretty easy to handle.

3. This is an event coming from a FireEye appliance (I'm not sure if it's directly from an NX box or from a CM). There is an add-on on Splunkbase but as far as I remember it's not very good. I'd consider reporting as CEF and parsing it in Splunk with an add-on supporting CEF extractions.
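
To illustrate point 2 at search time, here is a minimal sketch that cuts everything before the first "{" off _raw and then lets spath extract the whole structure automatically. The test event is a shortened copy of the sample from the question, and the fields in the table command are just examples. A permanent fix would more likely be done at index time (for example with a SEDCMD in props.conf), which I'm not showing here:

```spl
| makeresults
| eval _raw="Aug 24 13:16:20 fenotify-333875.warning: { \"alert\": { \"action\": \"blocked\", \"name\": \"malware-callback\", \"src\": { \"ip\": \"\", \"mac\": \"00:0c:29:07:f9:d1\", \"port\": \"58061\", \"vlan\": \"0\" } }, \"msg\": \"normal\" }"
| eval _raw=replace(_raw, "^[^{]+", "")
| spath
| table alert.action alert.name alert.src.mac alert.src.port msg
```

Once the header is gone the event is well-formed JSON, so there is no need to name each path by hand.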
