Not just Splunk. Python also treats "\xHH" inside double quotes as an escape sequence and, because JSON defines no such escape, rejects this data as invalid JSON. Like Splunk, it has no such problem with "\n" in the input.

"I've no idea where those control characters (\n, \x etc.) are coming from. They are not in the data that the mainframe send to Splunk."

Could you clarify the method you use to verify that \xHH sequences are not in the mainframe data? What do you use to inspect that data? Do you see newlines in the places where "\n" shows up in Splunk? As @ITWhisperer says, Splunk is not in the habit of inserting characters into ingested data.

Meanwhile, mainframes use an IBM-specific character set (EBCDIC) internally, so something has to perform a conversion when data is sent out. Most importantly, not seeing those characters when you view the data in a mainframe terminal is no proof that they are absent. Even an intermediary terminal emulator, such as one running on a Unix machine, can interpret translated control characters according to IBM's definitions. After all, control characters exist to control visual effects in terminals and are by definition invisible to terminal users on the native platform, and a terminal emulator is expected to interpret converted control characters according to their native functions.

My hypothesis is that those control characters are present in the data stream sent from the mainframe. The best solution is either to fix that on the mainframe or to insert a pre-processor that escapes or strips control characters. In the short term, instead of resorting to regex to pick apart a structured dataset, I recommend using regex only to escape those control characters, then letting Splunk's robust functions do their job:

| fields _raw
| rex mode=sed "s/\\\\x/\\\\\\x/g"
| spath

Using the sample data, my output is

ACTION            INFORMATIONAL
CONSOLE           INTERNAL
DATETIME          2024-04-24 13:34:47.92 +0100
JOBID             STC15694
JOBNAME           RDSONLVP
MFSOURCETYPE      SYSLOG
MSGNUM            IEC147I
MSGREQTYPE        (empty)
MSGTXT            IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207, RDS.VPLS.PDLY0001.PFDRL.U142530.E240220\x9C \x80\x80
SYSLOGSYSTEMNAME  A090
SYSPLEX           UKPPLX01
_raw              {"MFSOURCETYPE":"SYSLOG","DATETIME":"2024-04-24 13:34:47.92 +0100","SYSLOGSYSTEMNAME":"A090","JOBID":"STC15694","JOBNAME":"RDSONLVP","SYSPLEX":"UKPPLX01","CONSOLE":"INTERNAL","ACTION":"INFORMATIONAL","MSGNUM":"IEC147I","MSGTXT":"IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207,\nRDS.VPLS.PDLY0001.PFDRL.U142530.E240220\\x9C\n \\x80\\x80","MSGREQTYPE":""}

(Note that every "\xHH" sequence becomes "\\xHH" in _raw.)

This is an emulation you can play with and compare with real data:

| makeresults
| eval _raw =
"{\"MFSOURCETYPE\":\"SYSLOG\",\"DATETIME\":\"2024-04-24 13:34:47.92 +0100\",\"SYSLOGSYSTEMNAME\":\"A090\",\"JOBID\":\"STC15694\",\"JOBNAME\":\"RDSONLVP\",\"SYSPLEX\":\"UKPPLX01\",\"CONSOLE\":\"INTERNAL\",\"ACTION\":\"INFORMATIONAL\",\"MSGNUM\":\"IEC147I\",\"MSGTXT\":\"IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207,\\nRDS.VPLS.PDLY0001.PFDRL.U142530.E240220\\x9C\\n \\x80\\x80\",\"MSGREQTYPE\":\"\"} "
``` data emulation above ```
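
If you want to reproduce the Python behavior outside Splunk, here is a minimal sketch. It assumes a trimmed-down version of the sample event (only MSGNUM and a shortened MSGTXT), and escape_hex_sequences is a hypothetical helper name of my own, not anything from Splunk or the standard library. It mirrors the idea of the sed expression: double the backslash in front of every "x" so that the JSON parser sees a literal backslash instead of an unknown escape.

```python
import json
import re

# Trimmed-down sample event (an assumption for illustration, not the full record).
# The raw string keeps backslashes literal: \n is a valid JSON escape,
# while \x9C and \x80 are not valid JSON escapes.
raw = r'{"MSGNUM":"IEC147I","MSGTXT":"JE5207,\nRDS.VPLS.E240220\x9C\n \x80\x80"}'


def escape_hex_sequences(text):
    # Hypothetical helper: rewrite every backslash-x as backslash-backslash-x.
    # Like the sed expression, it assumes the input has no already-escaped "\\x".
    return re.sub(r'\\x', r'\\\\x', text)


# Step 1: the unmodified event is rejected by the JSON parser.
try:
    json.loads(raw)
    parsed_unmodified = True
except json.JSONDecodeError:
    parsed_unmodified = False

# Step 2: after escaping, the event parses; \n becomes a real newline and
# the \xHH sequences survive as literal text.
event = json.loads(escape_hex_sequences(raw))

print(parsed_unmodified)
print(repr(event["MSGTXT"]))
```

The same function could serve as the starting point for a pre-processor in front of Splunk, although stripping or translating the underlying control characters at the source would still be the cleaner fix.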