Hello,
I want to extract only the required text from logs using rex.
For instance,
suppose the logs contain data wrapped in tags, e.g.
<ID> 100034566 </ID> <data> This consists of DB data </data> <date> the date is 04-03-2019 </date>..........etc
The regular expression which I am using is
index = * | rex field=Msg "<data>(?<error>.*)" | table error
The output which I am getting is
error
This consists of DB data </data> <date> the date is 04-03-2019 </date>..........etc
What I need is only the text inside the <data> tag, i.e.
REQUIRED OUTPUT
error
This consists of DB data
But the text that follows the closing </data> tag is also displayed, which I don't need.
Can anyone help me with this?
Hi,
Please try the regex below; it extracts the output into a new field called ext_data
<yourBaseSearch>
| rex field=_raw "\<data\>\s?(?<ext_data>[^\<]*)"
EDIT: Updated the regex because I found a space after <data>
This should probably work:
| rex field=Msg "\<data\>(?<error>[^<]+)"
https://regex101.com/r/tpYcTu/1
If your data does contain whitespace around the tags, you can strip it off by adding | eval error=trim(error)
after the rex command (this can also be done with a more complex regex).
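As a rough sketch of that "more complex regex" option: the search below builds one fake event with makeresults, using the sample line from the question as a stand-in for the real Msg field (that value is an assumption, not real data), and trims the whitespace inside the pattern itself. The lazy [^<]*? together with the \s* on both sides should leave error holding just "This consists of DB data".

```
| makeresults
| eval Msg="<ID> 100034566 </ID> <data> This consists of DB data </data> <date> the date is 04-03-2019 </date>"
| rex field=Msg "<data>\s*(?<error>[^<]*?)\s*</data>"
| table error
```

Replace the first two lines with your own base search once the extraction looks right.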
Hi @harsmarvania57
Can you elaborate and explain the rex which you wrote?
Yes, I'll try my best to explain the regex piece by piece:
\<data\> literally matches <data> in your raw data.
\s? matches zero or one whitespace character after <data>.
(?<ext_data>[^\<]*) matches every character up to (but not including) the next < and stores the extracted text in a new field called ext_data.
Please post your current regex also as code (like you did with the sample data). Otherwise some special characters disappear.
Thanks for editing your question. The reason you're getting everything after the data tag is that you use .*, which matches any character as many times as possible, so it runs to the end of the field. Have a look at the answers below for stricter regular expressions that stop at the < character.
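To see the difference side by side, here is a small sketch that runs three variants against the same made-up event (the Msg value is copied from the sample in the question, and the field names greedy, lazy and negated are only illustrative):

```
| makeresults
| eval Msg="<data> This consists of DB data </data> <date> the date is 04-03-2019 </date>"
| rex field=Msg "<data>(?<greedy>.*)"
| rex field=Msg "<data>(?<lazy>.*?)</data>"
| rex field=Msg "<data>(?<negated>[^<]*)"
| table greedy lazy negated
```

greedy should contain everything after <data>, including the later tags, while lazy and negated should both stop before </data>. Both of those still keep the surrounding spaces, so trim them if needed.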