<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Python SDK - results.ResultsReader extremely slow in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169447#M2246</link>
    <description>&lt;P&gt;See &lt;A href="https://github.com/splunk/splunk-sdk-python/pull/77"&gt;https://github.com/splunk/splunk-sdk-python/pull/77&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 13 Mar 2014 13:25:20 GMT</pubDate>
    <dc:creator>richardhull_bjs</dc:creator>
    <dc:date>2014-03-13T13:25:20Z</dc:date>
    <item>
      <title>Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169440#M2239</link>
      <description>&lt;P&gt;I'm writing a search using the example from the SDK below. My search matches around 220,000 results and the search finishes in about 15 seconds, but it takes almost 5 minutes to loop over the results (tried with count as 10, 100, 1000 - doesn't seem to make a difference).&lt;/P&gt;
&lt;P&gt;Could there be anything causing the result processing to take so much time? I presume once the search finishes the server has all the data, so all it is doing is streaming the results - which I can't imagine is taxing for it?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import splunklib.results as results

# A blocking search
job = jobs.create("search my_search_string_here", **{"exec_mode": "blocking"})
print "...done!\n"

# Page through results by looping through sets of 10 at a time
print "Search results:\n"
resultCount = job["resultCount"]  # Number of results this job returned
offset = 0                        # Start at result 0
count = 10                        # Get sets of 10 results at a time

while offset &amp;lt; int(resultCount):
    kwargs_paginate = {"count": count,
                       "offset": offset}

    # Get the search results and display them
    blocksearch_results = job.results(**kwargs_paginate)

    for result in results.ResultsReader(blocksearch_results):
        print result

    # Increase the offset to get the next set of results
    offset += count
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 07 Jun 2020 18:34:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169440#M2239</guid>
      <dc:creator>Kindred</dc:creator>
      <dc:date>2020-06-07T18:34:24Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169441#M2240</link>
      <description>&lt;P&gt;I am experiencing the same thing. I ran my app with the &lt;EM&gt;-m cProfile&lt;/EM&gt; flag, and after some munging in Excel:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;ncalls  tottime percall cumtime percall filename:lineno(function)
-----------------------------------------------------------------
410     0.01    0       94.422  0.23    results.py:204(next)
410     0.757   0.002   94.412  0.23    results.py:207(_parse_results)
29481   0.185   0       93.039  0.003   &amp;lt;string&amp;gt;:80(next)
33      0.001   0       92.819  2.813   results.py:93(read)
32      9.158   0.286   92.818  2.901   results.py:124(read)
518047  13.097  0       83.542  0       binding.py:1142(read)
518053  11.294  0       68.321  0       httplib.py:532(read)
518199  24.065  0       54.89   0       socket.py:336(read)
518764  9.899   0       19.695  0       ssl.py:235(recv)
518764  5.646   0       9.796   0       ssl.py:154(read)
518764  4.15    0       4.15    0       {built-in method read}
518520  2.431   0       2.431   0       {max}
518846  2.415   0       2.415   0       {method 'seek' of 'cStringIO.StringO' objects}
518466  2.356   0       2.356   0       {cStringIO.StringIO}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'm reading this as &lt;STRONG&gt;results.py&lt;/STRONG&gt; making roughly half a million calls to &lt;STRONG&gt;binding.py&lt;/STRONG&gt;'s read method, ONE character at a time. I'm guessing that it is not using any form of buffered I/O?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;def read(self, n=None):
    """Read at most *n* characters from this stream.

    If *n* is ``None``, return all available characters.
    """
    response = ""
    while n is None or n &amp;gt; 0:
        c = self.stream.read(1)
        if c == "":
            break
        elif c == "&amp;lt;":
            c += self.stream.read(1)
            if c == "&amp;lt;?":
                while True:
                    q = self.stream.read(1)
                    if q == "&amp;gt;":
                        break
            else:
                response += c
                if n is not None:
                    n -= len(c)
        else:
            response += c
            if n is not None:
                n -= 1
    return response
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 10 Mar 2014 12:16:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169441#M2240</guid>
      <dc:creator>richardhull_bjs</dc:creator>
      <dc:date>2014-03-10T12:16:47Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169442#M2241</link>
      <description>&lt;P&gt;I should say, we're using version 1.1.0 of the Python library here.&lt;/P&gt;</description>
      <pubDate>Mon, 10 Mar 2014 12:17:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169442#M2241</guid>
      <dc:creator>richardhull_bjs</dc:creator>
      <dc:date>2014-03-10T12:17:49Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169443#M2242</link>
      <description>&lt;P&gt;I believe the issue is "caused" by the REST API rather than the SDK. The specific &lt;BR /&gt;
reason is roughly this: when search results are stored on disk as .csv.gz files&lt;BR /&gt;
(essentially, compressed CSVs), they are not seekable. &lt;/P&gt;

&lt;P&gt;So when you ask for offset 100K, for example, we will unpack the file until we find &lt;BR /&gt;
that offset, and then return 10/100/1000 results (however many you specified in count). &lt;BR /&gt;
When you then try to get offset 100010, we will unpack it again, seek to that&lt;BR /&gt;
offset, and so forth. So as you get into larger offsets, each request takes longer and&lt;BR /&gt;
longer.&lt;/P&gt;
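
&lt;P&gt;To get a feel for how this adds up, here is a rough back-of-the-envelope sketch. It assumes (as a simplification of the description above, not a measured fact) that every request unpacks the file from the start up to offset + count. The function name &lt;CODE&gt;rows_decompressed&lt;/CODE&gt; is made up for illustration, and 220,000 is the result count from the original question.&lt;/P&gt;

```python
def rows_decompressed(total_results, count):
    """Total rows unpacked on the server when paging with offset/count.

    Assumption: each request unpacks the file from row 0 up to
    offset + count (capped at the number of results).
    """
    return sum(min(offset + count, total_results)
               for offset in range(0, total_results, count))


total = 220000  # result count mentioned in the original question
for count in (10, 100, 1000):
    # Print the page size and the total rows unpacked across all requests
    print(count, rows_decompressed(total, count))

# A single streaming pass would touch each row once
print("single pass:", total)
```

&lt;P&gt;Under that assumption the work grows roughly with the square of the result count divided by the page size, which is why larger pages help and a single streaming pass helps most.&lt;/P&gt;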

&lt;P&gt;To put it concisely: this specific API is not a good fit for exporting the entire&lt;BR /&gt;
result set. To do that, the best way is to use the /export API endpoint, for which&lt;BR /&gt;
there is an equivalent &lt;CODE&gt;export&lt;/CODE&gt; function in the Python SDK. This will stream the&lt;BR /&gt;
results to you as they become available, rather than you having to page through&lt;BR /&gt;
them from disk.&lt;/P&gt;
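
&lt;P&gt;As a minimal sketch of what that could look like (not an official example): the helper below makes one &lt;CODE&gt;export&lt;/CODE&gt; request and wraps the stream in a single &lt;CODE&gt;ResultsReader&lt;/CODE&gt;. The names &lt;CODE&gt;export_rows&lt;/CODE&gt; and &lt;CODE&gt;reader_factory&lt;/CODE&gt; are hypothetical, and &lt;CODE&gt;service&lt;/CODE&gt; is assumed to be an already connected SDK service object.&lt;/P&gt;

```python
def export_rows(service, query, reader_factory=None, **params):
    """Yield result rows from a single streaming export request.

    `service` is assumed to be a connected splunklib service object.
    `reader_factory` is a hypothetical hook so the parser can be swapped;
    by default the SDK's ResultsReader is used.
    """
    if reader_factory is None:
        # Imported lazily so the sketch can be read without the SDK installed
        import splunklib.results as results
        reader_factory = results.ResultsReader

    # One request for the whole result set; no offset/count paging
    stream = service.jobs.export(query, **params)

    for item in reader_factory(stream):
        # The reader yields dict-like rows plus diagnostic messages;
        # keep only the rows
        if isinstance(item, dict):
            yield item
```

&lt;P&gt;Usage would then be a plain loop such as &lt;CODE&gt;for row in export_rows(service, "search my_search_string_here"): print(row)&lt;/CODE&gt;, with no offset bookkeeping on the client side.&lt;/P&gt;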

&lt;P&gt;We're working on an example for dev.splunk.com to show how to use &lt;CODE&gt;export&lt;/CODE&gt;, though it should be pretty similar to what you have above, just with a single &lt;CODE&gt;ResultsReader&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Mar 2014 00:01:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169443#M2242</guid>
      <dc:creator>ineeman</dc:creator>
      <dc:date>2014-03-11T00:01:20Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169444#M2243</link>
      <description>&lt;P&gt;My previous comment here wasn't quite accurate, so I removed it (since I can't edit it).&lt;/P&gt;</description>
      <pubDate>Tue, 11 Mar 2014 01:21:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169444#M2243</guid>
      <dc:creator>ineeman</dc:creator>
      <dc:date>2014-03-11T01:21:38Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169445#M2244</link>
      <description>&lt;P&gt;Not buffering is definitely the problem here.&lt;/P&gt;

&lt;P&gt;I created the following class:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import io


class ResponseReaderWrapper(io.RawIOBase):

    def __init__(self, responseReader):
        self.responseReader = responseReader

    def readable(self):
        return True

    def close(self):
        self.responseReader.close()

    def read(self, n):
        return self.responseReader.read(n)

    def readinto(self, b):
        sz = len(b)
        data = self.responseReader.read(sz)
        for idx, ch in enumerate(data):
            b[idx] = ch

        return len(data)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And then this allows me to utilize the &lt;EM&gt;io.BufferedReader&lt;/EM&gt; as follows:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;rs = job.results(count=maxRecords, offset=self._offset)
results.ResultsReader(io.BufferedReader(ResponseReaderWrapper(rs)))
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With this change, running my query and pulling the results takes ~3 seconds rather than the 90+ seconds it took before.&lt;/P&gt;
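
&lt;P&gt;For anyone who wants to see the effect in isolation, here is a small self-contained sketch that needs no Splunk connection. &lt;CODE&gt;CountingRaw&lt;/CODE&gt; and &lt;CODE&gt;read_one_at_a_time&lt;/CODE&gt; are made-up names: the first stands in for an unbuffered network stream and counts how often it is asked for data, the second mimics a parser that calls &lt;CODE&gt;read(1)&lt;/CODE&gt; in a loop.&lt;/P&gt;

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for an unbuffered stream; counts calls that reach it."""

    def __init__(self, payload):
        self._src = io.BytesIO(payload)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here represents one trip to the underlying stream
        self.calls += 1
        data = self._src.read(len(b))
        b[:len(data)] = data
        return len(data)


def read_one_at_a_time(stream):
    """Consume a stream one byte per read() call and return the byte count."""
    total = 0
    while stream.read(1):
        total += 1
    return total


payload = b"x" * 100000

# Without buffering, every read(1) goes straight to the raw stream
unbuffered = CountingRaw(payload)
read_one_at_a_time(unbuffered)

# With io.BufferedReader in between, read(1) is served from a memory buffer
buffered_source = CountingRaw(payload)
read_one_at_a_time(io.BufferedReader(buffered_source))

print("raw calls without buffering:", unbuffered.calls)
print("raw calls with buffering:", buffered_source.calls)
```

&lt;P&gt;The consumer code is identical in both cases; only the number of calls that reach the underlying stream changes, which is the same effect the wrapper above has on the SDK's response stream.&lt;/P&gt;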

&lt;P&gt;It would be nice if &lt;EM&gt;ResponseReader&lt;/EM&gt; implemented the &lt;CODE&gt;readable&lt;/CODE&gt; and &lt;CODE&gt;readinto&lt;/CODE&gt; methods so that it behaved more like a stream; then this &lt;EM&gt;ResponseReaderWrapper&lt;/EM&gt; class wouldn't be necessary. I'm happy to provide a pull request for this if you agree.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Mar 2014 12:23:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169445#M2244</guid>
      <dc:creator>richardhull_bjs</dc:creator>
      <dc:date>2014-03-11T12:23:45Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169446#M2245</link>
      <description>&lt;P&gt;Richard,&lt;/P&gt;

&lt;P&gt;Thanks for investigating this issue and offering to provide a pull request. If you submit it, I will review the change.&lt;/P&gt;

&lt;P&gt;Best,
David Noble&lt;/P&gt;</description>
      <pubDate>Tue, 11 Mar 2014 17:31:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169446#M2245</guid>
      <dc:creator>David_Noble_at_</dc:creator>
      <dc:date>2014-03-11T17:31:51Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169447#M2246</link>
      <description>&lt;P&gt;See &lt;A href="https://github.com/splunk/splunk-sdk-python/pull/77"&gt;https://github.com/splunk/splunk-sdk-python/pull/77&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2014 13:25:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169447#M2246</guid>
      <dc:creator>richardhull_bjs</dc:creator>
      <dc:date>2014-03-13T13:25:20Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169448#M2247</link>
      <description>&lt;P&gt;Your solution sounds good. &lt;BR /&gt;
Do you have the example on dev.splunk.com already?&lt;/P&gt;</description>
      <pubDate>Tue, 25 Nov 2014 12:37:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169448#M2247</guid>
      <dc:creator>mathu</dc:creator>
      <dc:date>2014-11-25T12:37:31Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169449#M2248</link>
      <description>&lt;P&gt;Phenomenal. This was a great help! This improved export time 5x for me. Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Feb 2015 02:04:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169449#M2248</guid>
      <dc:creator>David</dc:creator>
      <dc:date>2015-02-27T02:04:39Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169450#M2249</link>
      <description>&lt;P&gt;I second @mathu's request. Are there any examples of using &lt;CODE&gt;export&lt;/CODE&gt;? The "buffered" solution in the accepted answer above still gives me extremely slow read speeds (the read rate in rows/sec gets slower the longer the query is, as expected based on the above explanation).&lt;/P&gt;</description>
      <pubDate>Tue, 26 May 2015 23:21:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169450#M2249</guid>
      <dc:creator>nikos_d</dc:creator>
      <dc:date>2015-05-26T23:21:21Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169451#M2250</link>
      <description>&lt;P&gt;@ineeman, this was 6 years ago. Is there an updated link for the export API?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 15:29:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169451#M2250</guid>
      <dc:creator>alancalvitti</dc:creator>
      <dc:date>2020-01-16T15:29:56Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK - results.ResultsReader extremely slow</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169452#M2251</link>
      <description>&lt;P&gt;@ineeman, we're getting XML parse errors from jobs.export, whereas jobs.oneshot completes the same query. Is it possible to export the csv.gz, JSON, or Python OrderedDict representation instead?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 15:12:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Python-SDK-results-ResultsReader-extremely-slow/m-p/169452#M2251</guid>
      <dc:creator>alancalvitti</dc:creator>
      <dc:date>2020-01-17T15:12:48Z</dc:date>
    </item>
  </channel>
</rss>

