This is related to http://answers.splunk.com/questions/2141/xml-log-source-type
How would I remove line breaks found inside elements <Sql_Text>...</Sql_Text>
. Many of the SQL statements inside <Sql_Text>
tags have hard line breaks often at very undesirable places.
For example:
<Sql_Text>UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_
ID = 64
</Sql_Text>
I need it to look like this to make searching possible:
<Sql_Text>
UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_ID = 64
</Sql_Text>
or even this
<Sql_Text>UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_ID = 64</Sql_Text>
Thanks.
I don't think this is possible to do within a single regular expression. So you'll have to either:
Here is a search-time solution that would get reformat your text the way you are looking for:
| rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post
So for your example above, if you want to search for APP_ID 64, then you would need a search like so:
sourcetype=oracle_audit_xml | rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post | search "APP_ID = 64"
I don't think this is possible to do within a single regular expression. So you'll have to either:
Here is a search-time solution that would get reformat your text the way you are looking for:
| rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post
So for your example above, if you want to search for APP_ID 64, then you would need a search like so:
sourcetype=oracle_audit_xml | rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post | search "APP_ID = 64"
(?s)
tells the regular expression engine that .
(dot) can match anything. (Without this the expression .*
will not match across the end of a line, which we need in this case since your event spans multiple lines.) Basically we are breaking each event into 3 pieces, modifying the middle (e.g. your "Sql_Text") and then joining the 3 pieces back into one big event. I thought I added a basic explanation, but would appearer that I forgot. 😉 Sorry about that. Best of luck! I recommend http://www.regular-expressions.info/ for better understanding regular expressions.
Thanks. The search-time solution appears to work.
[although I'm a little confused by the first rex expression 🙂 I think I follow most of it except the first group match (?s)]
Is there a constant-width wrapping going on here? The problem with completely stripping out end-of-line characters like this is that there could be times where there are legitimate line breaks in the actual SQL statement; so you could end up with values lines being mashed together that shouldn't be.