Getting Data In

Removing line breaks within XML log data

cparham
Explorer

This is related to http://answers.splunk.com/questions/2141/xml-log-source-type

How would I remove line breaks found inside elements <Sql_Text>...</Sql_Text>. Many of the SQL statements inside <Sql_Text> tags have hard line breaks often at very undesirable places.

For example:

<Sql_Text>UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_  
ID = 64  
</Sql_Text>

I need it to look like this to make searching possible:

<Sql_Text>  
UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_ID = 64  
</Sql_Text>  

or even this

<Sql_Text>UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_ID = 64</Sql_Text>  

Thanks.

Tags (1)
1 Solution

Lowell
Super Champion

I don't think this is possible to do within a single regular expression. So you'll have to either:

  1. Fix this externally using a scripted input. (This is the most complicated approach and the only one that will yield correct index-time results.)
  2. Fix your data a search time using a sequence of commands (which could be consolidated using a macro for simplicity.

Here is a search-time solution that would get reformat your text the way you are looking for:

| rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post

So for your example above, if you want to search for APP_ID 64, then you would need a search like so:

sourcetype=oracle_audit_xml | rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post | search "APP_ID = 64"

View solution in original post

Lowell
Super Champion

I don't think this is possible to do within a single regular expression. So you'll have to either:

  1. Fix this externally using a scripted input. (This is the most complicated approach and the only one that will yield correct index-time results.)
  2. Fix your data a search time using a sequence of commands (which could be consolidated using a macro for simplicity.

Here is a search-time solution that would get reformat your text the way you are looking for:

| rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post

So for your example above, if you want to search for APP_ID 64, then you would need a search like so:

sourcetype=oracle_audit_xml | rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post | search "APP_ID = 64"

Lowell
Super Champion

(?s) tells the regular expression engine that . (dot) can match anything. (Without this the expression .* will not match across the end of a line, which we need in this case since your event spans multiple lines.) Basically we are breaking each event into 3 pieces, modifying the middle (e.g. your "Sql_Text") and then joining the 3 pieces back into one big event. I thought I added a basic explanation, but would appearer that I forgot. 😉 Sorry about that. Best of luck! I recommend http://www.regular-expressions.info/ for better understanding regular expressions.

0 Karma

cparham
Explorer

Thanks. The search-time solution appears to work.
[although I'm a little confused by the first rex expression 🙂 I think I follow most of it except the first group match (?s)]

0 Karma

Lowell
Super Champion

Is there a constant-width wrapping going on here? The problem with completely stripping out end-of-line characters like this is that there could be times where there are legitimate line breaks in the actual SQL statement; so you could end up with values lines being mashed together that shouldn't be.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas     Cisco Live 2026 is almost here, and this ...

What Is the Name of the USB Key Inserted by Bob Smith? (BOTS Hint, Not the Answer)

Hello Splunkers,   So you searched, “what is the name of the usb key inserted by bob smith?”  Not gonna lie… ...

Automating Threat Operations and Threat Hunting with Recorded Future

    Automating Threat Operations and Threat Hunting with Recorded Future June 29, 2026 | Register   Is your ...