<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: XML ParseError with jobs.export with python SDK in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/XML-ParseError-with-jobs-export-with-python-SDK/m-p/526079#M8727</link>
    <description>&lt;P&gt;Have you been able to root cause this issue? I have come across a similar one. When using the Python SDK with jobs.export and BufferedReader (reader = results.ResultsReader(io.BufferedReader(search_results))), on some occasions I get the following exception:&lt;/P&gt;&lt;P&gt;Traceback (most recent call last):&lt;BR /&gt;File ".../splunk_event_editor.py", line 747, in search_and_modify&lt;BR /&gt;self._get_field_types_from_splunk(search_query, sampling=sampling, no_change_stop=2000)&lt;BR /&gt;File ".../ams/splunk_event_editor.py", line 434, in _get_field_types_from_splunk&lt;BR /&gt;for item in reader:&lt;BR /&gt;File ".../python3.7/site-packages/splunklib/results.py", line 210, in next&lt;BR /&gt;return next(self._gen)&lt;BR /&gt;File ".../python3.7/site-packages/splunklib/results.py", line 219, in _parse_results&lt;BR /&gt;for event, elem in et.iterparse(stream, events=('start', 'end')):&lt;BR /&gt;File ".../python3.7/xml/etree/ElementTree.py", line 1222, in iterator&lt;BR /&gt;yield from pullparser.read_events()&lt;BR /&gt;File ".../python3.7/xml/etree/ElementTree.py", line 1297, in read_events&lt;BR /&gt;raise event&lt;BR /&gt;File ".../python3.7/xml/etree/ElementTree.py", line 1269, in feed&lt;BR /&gt;self._parser.feed(data)&lt;BR /&gt;xml.etree.ElementTree.ParseError: &lt;FONT color="#FF0000"&gt;not well-formed (invalid token):&lt;/FONT&gt; line 51128, column 3080&lt;/P&gt;&lt;P&gt;The same code and query usually work a moment later, so I suspect it is related to new events matching the search query arriving (via HTTP Event Collector) while the export is executing.&lt;/P&gt;</description>
    <pubDate>Thu, 22 Oct 2020 18:19:01 GMT</pubDate>
    <dc:creator>mleati</dc:creator>
    <dc:date>2020-10-22T18:19:01Z</dc:date>
    <item>
      <title>XML ParseError with jobs.export with python SDK</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/XML-ParseError-with-jobs-export-with-python-SDK/m-p/488178#M8726</link>
      <description>&lt;P&gt;In trying to speed up queries by using the buffered export API in the Python SDK, as discussed &lt;A href="https://answers.splunk.com/answers/239348/how-can-i-get-the-splunk-python-sdk-api-to-return-1.html" target="_blank"&gt;here&lt;/A&gt;, I am running into a problem where the &lt;CODE&gt;jobs.oneshot&lt;/CODE&gt; query works, while the &lt;CODE&gt;jobs.export&lt;/CODE&gt; version fails with &lt;CODE&gt;ParseError&lt;/CODE&gt;.&lt;/P&gt;
&lt;P&gt;The query parameters are otherwise the same.&lt;/P&gt;
&lt;P&gt;Moreover, the location of the parse error varies each time the query is run (the date/time parameters are fixed, so it is presumably getting the same data each time).&lt;/P&gt;
&lt;P&gt;By looping through the generator, i.e. calling &lt;CODE&gt;results.ResultsReader().next()&lt;/CODE&gt;, trapping each &lt;CODE&gt;Exception&lt;/CODE&gt;, and returning the Python &lt;CODE&gt;type()&lt;/CODE&gt; of each item, I see that the query works for an initial segment of the data (in other words, the returned &lt;CODE&gt;OrderedDict&lt;/CODE&gt; objects are OK), but that an XML parse error occurs at a different record each time I run the query (recall that the data extracted should be the same each time):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;0 &amp;lt;class 'splunklib.results.Message'&amp;gt;
1 &amp;lt;class 'splunklib.results.Message'&amp;gt;
2 &amp;lt;class 'splunklib.results.Message'&amp;gt;
3 &amp;lt;class 'collections.OrderedDict'&amp;gt;
4 &amp;lt;class 'collections.OrderedDict'&amp;gt;
5 &amp;lt;class 'collections.OrderedDict'&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;...&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;4044 &amp;lt;class 'xml.etree.ElementTree.ParseError'&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;(On rerunning this, the record number varies.)&lt;/P&gt;
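&lt;P&gt;The type-recording loop described above might look roughly like the following minimal sketch. It is not the original code: &lt;CODE&gt;record_types&lt;/CODE&gt; and &lt;CODE&gt;fake_reader&lt;/CODE&gt; are hypothetical names, and &lt;CODE&gt;fake_reader&lt;/CODE&gt; is only a stand-in for &lt;CODE&gt;results.ResultsReader&lt;/CODE&gt; so that the example runs without a Splunk server.&lt;/P&gt;

```python
import xml.etree.ElementTree as et
from collections import OrderedDict


def record_types(reader):
    """Step through an iterator, recording (index, type) for each item.

    If the iterator raises, the exception's type is recorded instead and
    iteration stops, because a generator cannot be resumed after it raises.
    """
    seen = []
    iterator = iter(reader)
    index = 0
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        except Exception as exc:
            seen.append((index, type(exc)))
            break
        seen.append((index, type(item)))
        index += 1
    return seen


def fake_reader():
    # Hypothetical stand-in for results.ResultsReader(...): two good records,
    # then a parse failure like the one shown in the traceback.
    yield OrderedDict(host="a")
    yield OrderedDict(host="b")
    raise et.ParseError("not well-formed (invalid token)")


for index, kind in record_types(fake_reader()):
    print(index, kind)
```

&lt;P&gt;Iteration stops at the first exception because a generator that has raised cannot be resumed, which is consistent with the output above ending at the &lt;CODE&gt;ParseError&lt;/CODE&gt; entry.&lt;/P&gt;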
&lt;P&gt;Here's a sample of the Traceback:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Traceback (most recent call last):

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "&amp;lt;ipython-input-153-517a13bba7fa&amp;gt;", line 1, in &amp;lt;module&amp;gt;
    tmp3 = [parsefunc(x) for x in tq3['results']]

  File "&amp;lt;ipython-input-153-517a13bba7fa&amp;gt;", line 1, in &amp;lt;listcomp&amp;gt;
    tmp3 = [parsefunc(x) for x in tq3['results']]

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/site-packages/splunklib/results.py", line 210, in next
    return next(self._gen)

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/site-packages/splunklib/results.py", line 219, in _parse_results
    for event, elem in et.iterparse(stream, events=('start', 'end')):

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/xml/etree/ElementTree.py", line 1222, in iterator
    yield from pullparser.read_events()

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/xml/etree/ElementTree.py", line 1297, in read_events
    raise event

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/xml/etree/ElementTree.py", line 1269, in feed
    self._parser.feed(data)

  File "&amp;lt;string&amp;gt;", line unknown
ParseError: not well-formed (invalid token): line 454841, column 54713
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Rerunning yields the same trace except for the location of the invalid token, e.g.:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;ParseError: not well-formed (invalid token): line 169194, column 13753

ParseError: not well-formed (invalid token): line 204476, column 30137
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Any ideas what could be causing this, and is there a workaround?&lt;/P&gt;
&lt;P&gt;By manually&lt;/P&gt;</description>
      <pubDate>Sun, 07 Jun 2020 18:34:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/XML-ParseError-with-jobs-export-with-python-SDK/m-p/488178#M8726</guid>
      <dc:creator>alancalvitti</dc:creator>
      <dc:date>2020-06-07T18:34:51Z</dc:date>
    </item>
    <item>
      <title>Re: XML ParseError with jobs.export with python SDK</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/XML-ParseError-with-jobs-export-with-python-SDK/m-p/526079#M8727</link>
      <description>&lt;P&gt;Have you been able to root cause this issue? I have come across a similar one. When using the Python SDK with jobs.export and BufferedReader (reader = results.ResultsReader(io.BufferedReader(search_results))), on some occasions I get the following exception:&lt;/P&gt;&lt;P&gt;Traceback (most recent call last):&lt;BR /&gt;File ".../splunk_event_editor.py", line 747, in search_and_modify&lt;BR /&gt;self._get_field_types_from_splunk(search_query, sampling=sampling, no_change_stop=2000)&lt;BR /&gt;File ".../ams/splunk_event_editor.py", line 434, in _get_field_types_from_splunk&lt;BR /&gt;for item in reader:&lt;BR /&gt;File ".../python3.7/site-packages/splunklib/results.py", line 210, in next&lt;BR /&gt;return next(self._gen)&lt;BR /&gt;File ".../python3.7/site-packages/splunklib/results.py", line 219, in _parse_results&lt;BR /&gt;for event, elem in et.iterparse(stream, events=('start', 'end')):&lt;BR /&gt;File ".../python3.7/xml/etree/ElementTree.py", line 1222, in iterator&lt;BR /&gt;yield from pullparser.read_events()&lt;BR /&gt;File ".../python3.7/xml/etree/ElementTree.py", line 1297, in read_events&lt;BR /&gt;raise event&lt;BR /&gt;File ".../python3.7/xml/etree/ElementTree.py", line 1269, in feed&lt;BR /&gt;self._parser.feed(data)&lt;BR /&gt;xml.etree.ElementTree.ParseError: &lt;FONT color="#FF0000"&gt;not well-formed (invalid token):&lt;/FONT&gt; line 51128, column 3080&lt;/P&gt;&lt;P&gt;The same code and query usually work a moment later, so I suspect it is related to new events matching the search query arriving (via HTTP Event Collector) while the export is executing.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 18:19:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/XML-ParseError-with-jobs-export-with-python-SDK/m-p/526079#M8727</guid>
      <dc:creator>mleati</dc:creator>
      <dc:date>2020-10-22T18:19:01Z</dc:date>
    </item>
  </channel>
</rss>

