Splunk Dev

XML ParseError with jobs.export with python SDK

alancalvitti
Path Finder

While trying to speed up queries by using the buffered export API in the Python SDK as discussed here, I'm running into a problem where the jobs.oneshot query works but the jobs.export version fails with a ParseError.

The query parameters are otherwise the same.
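For concreteness, here is a minimal sketch of the two call paths being compared. The connection, query string, and time bounds are placeholders, not my actual code:

```python
import io

def export_kwargs(earliest, latest):
    """Shared time-bounded parameters (values are placeholders)."""
    return {"earliest_time": earliest, "latest_time": latest}

def run_both(service, query="search index=main"):
    """Run the same query via jobs.oneshot and via buffered jobs.export.

    `service` is an authenticated splunklib.client.Service; the import is
    kept local so the sketch reads without the SDK installed.
    """
    from splunklib import results
    kwargs = export_kwargs("2019-01-01T00:00:00", "2019-01-02T00:00:00")
    oneshot_reader = results.ResultsReader(
        service.jobs.oneshot(query, **kwargs))
    export_reader = results.ResultsReader(
        io.BufferedReader(service.jobs.export(query, **kwargs)))
    return oneshot_reader, export_reader
```

The oneshot reader iterates cleanly; the export reader is the one that dies partway through.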

Moreover, the location of the parse error varies each time the query is run (the date/time parameters are fixed, so it should be retrieving the same data each time).

By looping through the generator (i.e. calling results.ResultsReader().next()), trapping exceptions, and printing the Python type() of each item, I see that the query works for an initial segment of data

  • in other words, the returned OrderedDict objects are ok -

but then an XML parse error occurs, at a different record each time I run the query (recall that the extracted data should be the same each time)
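A rough sketch of that trapping loop (the reader construction is a hypothetical stand-in; classify itself works on any iterator):

```python
def classify(reader, limit=None):
    """Walk an iterator (e.g. a splunklib ResultsReader), recording the
    type of each item and trapping the first exception raised mid-stream."""
    seen = []
    it = iter(reader)
    i = 0
    while limit is None or i < limit:
        try:
            item = next(it)
        except StopIteration:
            break
        except Exception as exc:   # e.g. xml.etree.ElementTree.ParseError
            seen.append((i, type(exc)))
            break
        seen.append((i, type(item)))
        i += 1
    return seen

# Against a live export stream (hypothetical setup):
# reader = results.ResultsReader(io.BufferedReader(service.jobs.export(query)))
# for i, t in classify(reader):
#     print(i, t)
```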

0 <class 'splunklib.results.Message'>
1 <class 'splunklib.results.Message'>
2 <class 'splunklib.results.Message'>
3 <class 'collections.OrderedDict'>
4 <class 'collections.OrderedDict'>
5 <class 'collections.OrderedDict'>

...

4044 <class 'xml.etree.ElementTree.ParseError'>

(rerunning this, the record # will vary)

Here's a sample of the Traceback:

Traceback (most recent call last):

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-153-517a13bba7fa>", line 1, in <module>
    tmp3 = [parsefunc(x) for x in tq3['results']]

  File "<ipython-input-153-517a13bba7fa>", line 1, in <listcomp>
    tmp3 = [parsefunc(x) for x in tq3['results']]

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/site-packages/splunklib/results.py", line 210, in next
    return next(self._gen)

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/site-packages/splunklib/results.py", line 219, in _parse_results
    for event, elem in et.iterparse(stream, events=('start', 'end')):

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/xml/etree/ElementTree.py", line 1222, in iterator
    yield from pullparser.read_events()

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/xml/etree/ElementTree.py", line 1297, in read_events
    raise event

  File "/Users/zk8n1ue/miniconda3/lib/python3.7/xml/etree/ElementTree.py", line 1269, in feed
    self._parser.feed(data)

  File "<string>", line unknown
ParseError: not well-formed (invalid token): line 454841, column 54713

Rerunning yields the same trace except for the location of the invalid token, e.g.

ParseError: not well-formed (invalid token): line 169194, column 13753

ParseError: not well-formed (invalid token): line 204476, column 30137

Any ideas what could be causing this and is there a workaround?



mleati
Explorer

Have you been able to root-cause this issue? I have come across a similar one. When using the Python SDK with jobs.export and a BufferedReader (reader = results.ResultsReader(io.BufferedReader(search_results))), on some occasions I get the following exception:

Traceback (most recent call last):
  File ".../splunk_event_editor.py", line 747, in search_and_modify
    self._get_field_types_from_splunk(search_query, sampling=sampling, no_change_stop=2000)
  File ".../ams/splunk_event_editor.py", line 434, in _get_field_types_from_splunk
    for item in reader:
  File ".../python3.7/site-packages/splunklib/results.py", line 210, in next
    return next(self._gen)
  File ".../python3.7/site-packages/splunklib/results.py", line 219, in _parse_results
    for event, elem in et.iterparse(stream, events=('start', 'end')):
  File ".../python3.7/xml/etree/ElementTree.py", line 1222, in iterator
    yield from pullparser.read_events()
  File ".../python3.7/xml/etree/ElementTree.py", line 1297, in read_events
    raise event
  File ".../python3.7/xml/etree/ElementTree.py", line 1269, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 51128, column 3080

The same code/query usually works a moment later, so I suspect it may have something to do with new events matching the search query arriving (via HTTP Event Collector) while the export is executing.
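Since the same query succeeds moments later, one blunt hedge is simply to retry the whole export on a ParseError. This is a sketch, not an SDK feature; the function and parameter names are made up:

```python
import time
import xml.etree.ElementTree as et

def export_with_retry(run_export, attempts=3, delay=5.0):
    """Call `run_export` (a zero-argument callable that performs the whole
    export and returns the fully consumed results as a list); retry on XML
    ParseError.  The stream is re-read from scratch on each retry, so
    partial results from a failed attempt are discarded."""
    last_error = None
    for _ in range(attempts):
        try:
            return run_export()
        except et.ParseError as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error
```

Alternatively, if racing newly arriving events is indeed the cause, pinning latest_time to a timestamp safely in the past might avoid the problem altogether.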
