Hi
I need to extract a multivalue field from an event structured as XML.
<job>
<nameJob>Job1</nameJob>
<executionJob>
0 file(s) AND 0 folder(s)
==========================================
10 file(s) AND 0 folder(s)
</executionJob>
</job>
I cannot use xpath or KV_MODE=xml because some events contain special characters that prevent parsing.
I am trying to use a regular expression with the rex command instead, for example to extract the steps data.
The regular expression "<steps>(?<abc>((?!<\/steps>)[\s\S])*)<\/steps>" works well in a regular expression tester (PCRE), but when I try the same thing in Splunk with the following command:
"basic search | rex field=_raw "<steps>(?<abc>((?!</steps>)[\s\S])*)<\/steps>" max_match=999"
I get this error message:
"Streamed search execute failed because: Error in 'rex' command: Regex match error, please check log"
Do you have any idea what is going wrong?
Thanks.
J.
The regex string is missing an escape character. Try
\<steps>(?<abc>((?!<\/steps>)[\s\S])*)<\/steps>
That said, unless you need the entire XML indexed, you should consider using a scripted input to parse the XML and extract only the fields you need for indexing. A Python parser will be much easier to write, and reducing the XML verbosity will save license and storage costs.
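As a rough illustration of that idea, the sketch below pulls each <steps> block out of the raw event text with a lazy regex rather than an XML parser, on the assumption that the special characters mentioned in the question would also trip up a strict XML parser. The helper name extract_steps and the sample event are made up for illustration; this is plain Python, not a Splunk API, and a real scripted input would still need to read the events and print the extracted fields.

```python
import re

# Lazy match with DOTALL: "." also matches newlines, and ".*?" stops at the
# first closing tag, so each <steps> block is captured on its own.
STEPS_RE = re.compile(r"<steps>(.*?)</steps>", re.DOTALL)


def extract_steps(raw_event):
    """Return the inner text of every <steps>...</steps> block (hypothetical helper)."""
    return [block.strip() for block in STEPS_RE.findall(raw_event)]


# Made-up, shortened sample event used only to exercise the helper.
sample = """<job>
<nameJob>Job1</nameJob>
<executionJob>
<steps>
<nameStep>Step 1</nameStep>
<rcStep>rc=0</rcStep>
</steps>
<steps>
<nameStep>Step 2</nameStep>
<rcStep>rc=0</rcStep>
</steps>
</executionJob>
</job>"""

for block in extract_steps(sample):
    print(block)
    print("---")
```

The lazy ".*?" does the same job as the "((?!</steps>)[\s\S])*" construct in the question, but without a lookahead at every character.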
Hi
I am still getting the error from the rex command (it does not seem to like [\s\S]).
I will probably follow your recommendation and implement a Python parser to extract the information.
Thanks
Can you list the values you are trying to extract for 'steps' from the event you posted?
Hi Luke
The XML was not rendered correctly in my question. The XML structure of each event is:
<job>
<nameJob>Job1</nameJob>
<executionJob>
<started>
<dateS>2016-10-03</dateS>
<timeS>12:31:25</timeS>
<serverS>ServerX</serverS>
<userS>JobUser</userS>
</started>
<steps>
<nameStep>Step 1</nameStep>
<descrStep>Clean directories A </descrStep>
<beginTimeStep>2016/10/03 12:31:25</beginTimeStep>
<comStep>
<comment>Starting</comment>
</comStep>
<comStep>
<comment>Execution....</comment>
</comStep>
<comStep>
<comment>0 file(s) AND 0 folder(s)</comment>
</comStep>
<rcStep>rc=0</rcStep>
<endTimeStep>2016/10/03 12:31:26</endTimeStep>
</steps>
<steps>
<nameStep>Step 2</nameStep>
<descrStep>Clean Directories B</descrStep>
<beginTimeStep>2016/10/03 12:31:26</beginTimeStep>
<comStep>
<comment>Starting</comment>
</comStep>
<comStep>
<comment>Execution....</comment>
</comStep>
<comStep>
<comment>10 file(s) AND 0 folder(s)</comment>
</comStep>
<rcStep>rc=0</rcStep>
<endTimeStep>2016/10/03 12:31:27</endTimeStep>
</steps>
<ended>
<dateE>2016-10-03</dateE>
<timeE>12:31:27</timeE>
<globalRcE>grc=0</globalRcE>
</ended>
</executionJob>
</job>
So for one job event, I have several steps, each with a set of properties. Some of these properties are themselves multivalued, such as comment, which holds the output of each step.
My final objective is to get something like:
nameJob  nameStep  descrStep            beginTimeStep        comment
Job1     Step 1    Clean directories A  2016/10/03 12:31:25  Starting
                                                             Execution...
                                                             0 file(s) AND 0 folder(s)
Job1     Step 2    Clean Directories B  2016/10/03 12:31:26  Starting
                                                             Execution...
                                                             10 file(s) AND 0 folder(s)
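
For what it is worth, here is a minimal Python sketch of how one job event could be flattened into that shape: one row per step, with comment kept as a list of values. It uses regexes rather than an XML parser because of the special-character problem mentioned earlier. The helper names (tag_values, first, job_to_rows) are hypothetical, and the sample is an abbreviated copy of the event above.

```python
import re


def tag_values(text, tag):
    """Return the stripped text of every <tag>...</tag> occurrence in text."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    return [value.strip() for value in re.findall(pattern, text, re.DOTALL)]


def first(text, tag):
    """Return the first value of tag, or an empty string if it is absent."""
    values = tag_values(text, tag)
    return values[0] if values else ""


def job_to_rows(raw_event):
    """Build one dict per <steps> block; 'comment' stays a list (multivalue)."""
    name_job = first(raw_event, "nameJob")
    rows = []
    for step in tag_values(raw_event, "steps"):
        rows.append({
            "nameJob": name_job,
            "nameStep": first(step, "nameStep"),
            "descrStep": first(step, "descrStep"),
            "beginTimeStep": first(step, "beginTimeStep"),
            "comment": tag_values(step, "comment"),
        })
    return rows


# Abbreviated copy of the event posted above.
sample = """<job><nameJob>Job1</nameJob><executionJob>
<steps><nameStep>Step 1</nameStep><descrStep>Clean directories A </descrStep>
<beginTimeStep>2016/10/03 12:31:25</beginTimeStep>
<comStep><comment>Starting</comment></comStep>
<comStep><comment>Execution....</comment></comStep>
<comStep><comment>0 file(s) AND 0 folder(s)</comment></comStep>
<rcStep>rc=0</rcStep></steps>
<steps><nameStep>Step 2</nameStep><descrStep>Clean Directories B</descrStep>
<beginTimeStep>2016/10/03 12:31:26</beginTimeStep>
<comStep><comment>Starting</comment></comStep>
<comStep><comment>Execution....</comment></comStep>
<comStep><comment>10 file(s) AND 0 folder(s)</comment></comStep>
<rcStep>rc=0</rcStep></steps>
</executionJob></job>"""

rows = job_to_rows(sample)
for row in rows:
    print(row["nameJob"], row["nameStep"], row["descrStep"],
          row["beginTimeStep"], " | ".join(row["comment"]), sep="\t")
```

Keeping comment as a list leaves the choice open between emitting it as a multivalue field or as one output line per comment.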