<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Escaping the double-quotes when ingesting data? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/512455#M169451</link>
    <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/69123"&gt;@santosh_sshanbh&lt;/a&gt;&amp;nbsp;yup. doing the same thing. DBConnect should address this issue. Free text fields should be able to get ingested and indexed "as is" without breaking the indexing.&lt;/P&gt;</description>
    <pubDate>Tue, 04 Aug 2020 21:40:19 GMT</pubDate>
    <dc:creator>nbarbato</dc:creator>
    <dc:date>2020-08-04T21:40:19Z</dc:date>
    <item>
      <title>Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397552#M169444</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I'm attempting to deal with data coming from a query run by the Splunk DB Connector. It pulls all the data in fine but one part of it's ingestion is really getting me: One of the fields is just a XML document which contains double quotes " marks. What Splunk seems to be doing is it only captures the data in that field up to the first double-quote and then decides it's the end of the sentence. I'd like to be able to get the whole XML as it is?&lt;/P&gt;

&lt;P&gt;For example:&lt;BR /&gt;
XML data:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;ShowPlanXML xmlns="http://schemas.example.com" Version="1.2" Build="11.0.6615.2"&amp;gt;&amp;lt;BatchSequence&amp;gt;&amp;lt;Batch&amp;gt;&amp;lt;Statements&amp;gt;Much much much more content here&amp;lt;/Statements&amp;gt;&amp;lt;/Batch&amp;gt;&amp;lt;/BatchSequence&amp;gt;&amp;lt;/ShowPlanXML&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;However when this makes it to Splunk it has a field name but it stops after the equals sympbol:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;query_plan = &amp;lt;ShowPlanXML xmlns=
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I know all the data is in Splunk as I can see it, it's just not being properly captures into the field.&lt;/P&gt;

&lt;P&gt;I assume the solution is to tell Splunk to escape the double-quotes as the data is coming in. But where/how do I do this?&lt;/P&gt;

&lt;P&gt;Thanks&lt;BR /&gt;
Eddie&lt;/P&gt;</description>
      <pubDate>Mon, 14 May 2018 05:05:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397552#M169444</guid>
      <dc:creator>marrette</dc:creator>
      <dc:date>2018-05-14T05:05:24Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397553#M169445</link>
      <description>&lt;P&gt;have you tried using xpath? &lt;A href="http://docs.splunk.com/Documentation/Splunk/7.1.0/SearchReference/Xpath"&gt;http://docs.splunk.com/Documentation/Splunk/7.1.0/SearchReference/Xpath&lt;/A&gt; &lt;BR /&gt;
It would be a search-time extraction, but i think it might do the trick.&lt;/P&gt;</description>
      <pubDate>Mon, 14 May 2018 12:15:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397553#M169445</guid>
      <dc:creator>cmerriman</dc:creator>
      <dc:date>2018-05-14T12:15:46Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397554#M169446</link>
      <description>&lt;P&gt;I don't think that's exactly what I want, or at least I can't get it to work. I say this because the data comes into Splunk as a series of field names and values enclosed in double-quotes and separated by a comma:&lt;BR /&gt;
fieldname="value", (fieldname &lt;EM&gt;equal&lt;/EM&gt; &lt;EM&gt;double-quote&lt;/EM&gt; value &lt;EM&gt;double-quote&lt;/EM&gt; &lt;EM&gt;comma&lt;/EM&gt;) &lt;/P&gt;

&lt;P&gt;This means the XML part is just a large value (enclosed by quotes) with a field name of query_plan however Splunk has ended that field early because it thinks it's got all when it see's the first double quote in the XML.&lt;/P&gt;</description>
      <pubDate>Mon, 14 May 2018 23:07:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397554#M169446</guid>
      <dc:creator>marrette</dc:creator>
      <dc:date>2018-05-14T23:07:51Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397555#M169447</link>
      <description>&lt;P&gt;Though further digging around I've discovered that running the SQL query directly from DB Connect's SQL Explorer it captures all the fields perfectly, including the query_plan field.&lt;/P&gt;

&lt;P&gt;So the incorrect matching of quotes must be happening after the data has passed though the DB Connect app and into Splunk proper. It appears I'm hitting a known thing (feature/bug) with DB Connect: &lt;A href="http://docs.splunk.com/Documentation/DBX/3.1.3/DeployDBX/Troubleshooting#Incomplete_field_values_are_extracted_when_the_value_contains_double_quotes" target="_blank"&gt;http://docs.splunk.com/Documentation/DBX/3.1.3/DeployDBX/Troubleshooting#Incomplete_field_values_are_extracted_when_the_value_contains_double_quotes&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;It seems to recommend using an extract to fix this - but I'm stuck as to where to even begin with this as reading the Extract documentation doesn't seem to give any way of matching this field as a whole either.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 19:32:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397555#M169447</guid>
      <dc:creator>marrette</dc:creator>
      <dc:date>2020-09-29T19:32:18Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397556#M169448</link>
      <description>&lt;P&gt;Well after a bit more research I came up with a fix for this. Possibly this is basic 101 Splunk as if I'd known what a "search time field extraction" was before hand I would have probably fixed it immediately. Anyway here is my solution and the process I used to get to it.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Step 1 - Work out how to match the data with a regex&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;As I knew the data was coming from the DB query but getting lost as it was being split into keys and values by Splunk due to quotes in the data. I took a look at the raw data to work out how to match it. To aid this write a Splunk search that brings up the data, and then pipe the _raw version of the data into a table to show exactly what the data looks like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=myindex sourcetype=SQLEXTRACT source=mysource | table _raw
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;From this I could determine what regex would match the full value. In the raw data the field and value looked like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;query_plan="&amp;lt;ShowPlanXML xmlns="http://schemas.example.com" Version="1.2" Build="11.0.6615.2"&amp;gt;&amp;lt;BatchSequence&amp;gt;&amp;lt;Batch&amp;gt;&amp;lt;Statements&amp;gt;Much much much more content here&amp;lt;/Statements&amp;gt;&amp;lt;/Batch&amp;gt;&amp;lt;/BatchSequence&amp;gt;&amp;lt;/ShowPlanXML&amp;gt;"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Using the Splunk search interface and the rex command I was able to develop a regex to match the XML data in the value above. Here is an example of my finished and working search with regex:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=myindex sourcetype=SQLEXTRACT source=mysource | rex field=_raw "query_plan=\"(?&amp;lt;query_plan_xml&amp;gt;.+&amp;lt;\/ShowPlanXML&amp;gt;)" | table query_plan_xml
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note the use of a pipe and the table command after the rex command to get Splunk to display the value it had just captured in the regex.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Step 2 - How to make this a Search Time Field Extraction&lt;/STRONG&gt;&lt;BR /&gt;
This second part took me ages to get working - it turned out the problem was this configuration needs to be made on the &lt;EM&gt;search head&lt;/EM&gt; - which in hindsight seems obvious because I want to do a &lt;EM&gt;search time&lt;/EM&gt; extraction.&lt;/P&gt;

&lt;P&gt;Simple extractions like this can be just added to the props.conf file on your search head (most probably in %splunk%/etc/apps/search/local). In the file called props.conf you add a new stanza (this is the part enclosed in square braces) named to match the sourcetype of your data - in this example 'mysource' and then you using the keyword 'EXTRACT-' to tell Splunk to use the regex to extract data. The syntax of this command is EXTRACT-&lt;EM&gt;classname&lt;/EM&gt; = &lt;EM&gt;regex expression to match&lt;/EM&gt;.&lt;/P&gt;

&lt;P&gt;So in the props.conf on my search head I added the following at the end of the file:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[SQLEXTRACT]
EXTRACT-queryplanXML_class = query_plan=\"(?&amp;lt;query_plan&amp;gt;.+&amp;lt;\/ShowPlanXML&amp;gt;)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This line simply tells Splunk that when it is searching data with a sourcetype of 'SQLEXTRACT' it should run the following lines on it. The extract command then match's the XML data and returns it with a key name of 'query_plan' which overwrites the existing incorrectly matched data. I'm not really sure what the class name part does or where else it might be used - I just picked something unlikely to be used anywhere else.&lt;/P&gt;

&lt;P&gt;Once you have updated the props.conf file on your search head make sure to get Splunk to re-read this file. You can do this by restarting Splunk but because this is a search time configuration you can get Splunk to re-read the props.conf file by running the following search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| extract reload=t
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This final command won't give you any results but that's okay because Splunk will reload the props.conf file, just re-run your original search and you should find the incorrectly matched search is now returning all the data as it should be!&lt;/P&gt;</description>
      <pubDate>Wed, 30 May 2018 05:28:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397556#M169448</guid>
      <dc:creator>marrette</dc:creator>
      <dc:date>2018-05-30T05:28:28Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397557#M169449</link>
      <description>&lt;P&gt;I've often had to escape the double-quotes inside a field by adding escape characters in SQL. Ugly, but seems to work well.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Sep 2018 14:01:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397557#M169449</guid>
      <dc:creator>grittonc</dc:creator>
      <dc:date>2018-09-19T14:01:57Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397558#M169450</link>
      <description>&lt;P&gt;The workaround I used for this problem is by replacing the double quotes by null string using REPLACE function of SQL. The changes I have done in dbinput configuration only.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Feb 2020 12:38:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/397558#M169450</guid>
      <dc:creator>santosh_sshanbh</dc:creator>
      <dc:date>2020-02-14T12:38:56Z</dc:date>
    </item>
    <item>
      <title>Re: Escaping the double-quotes when ingesting data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/512455#M169451</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/69123"&gt;@santosh_sshanbh&lt;/a&gt;&amp;nbsp;yup. doing the same thing. DBConnect should address this issue. Free text fields should be able to get ingested and indexed "as is" without breaking the indexing.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Aug 2020 21:40:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Escaping-the-double-quotes-when-ingesting-data/m-p/512455#M169451</guid>
      <dc:creator>nbarbato</dc:creator>
      <dc:date>2020-08-04T21:40:19Z</dc:date>
    </item>
  </channel>
</rss>

