I'm using Python in Jupyter notebooks to run searches through the Splunk API. The queries run fine both from Python and from Splunk itself. However, when I run them from Python, my status messages show completeness over 100%, sometimes as high as 49434736695157.4%, which shouldn't be possible. Again, the final stats end up being correct.

I didn't originally write this code, but I've been tasked with finding the flaw. Below is the non-query code, which is where I believe the problem lies. I'm brand new to Splunk, and although I'm reasonably comfortable with Python, the issue is tricky for me to spot. I'd appreciate any help on this. Thanks!
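One way to narrow this down is to look at the raw `doneProgress` string Splunk returns before the polling loop multiplies it by 100. The sketch below is a hypothetical helper (`progress_percent` is not part of splunklib). It assumes `doneProgress` is meant to be a fraction between 0 and 1, which is how the existing code treats it. It clamps the value for display and reports whether the raw value was in range, so suspicious values can be logged instead of printed as a percentage.

```python
import math


def progress_percent(raw_done_progress):
    """Convert a raw doneProgress value (assumed range 0.0-1.0) to a
    display percentage clamped to [0, 100].

    Returns (percent, in_range); in_range is False when the raw value was
    missing, non-numeric, NaN, or outside 0.0-1.0.
    """
    try:
        fraction = float(raw_done_progress)
    except (TypeError, ValueError):
        # Missing or non-numeric value: show 0% and flag it
        return 0.0, False
    if math.isnan(fraction):
        return 0.0, False
    in_range = 0.0 <= fraction <= 1.0
    # Clamp so the status line can never show more than 100%
    clamped = min(max(fraction, 0.0), 1.0)
    return clamped * 100, in_range
```

In the polling loop this would replace `float(job["doneProgress"])*100`, for example `stats["doneProgress"], ok = progress_percent(job["doneProgress"])`, printing `job["doneProgress"]` whenever `ok` is False. This only makes the display sane and surfaces the raw value; it does not explain why Splunk reports it.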
```python
import sys
from time import sleep

import pandas as pd
from splunklib import results

# Execute search in export mode and return results as dataframe
def export_to_dataframe(_service, _query):
    # note: jobs.create() starts a normal search job; export mode would use _service.jobs.export()
    job = _service.jobs.create(_query)
    while True:
        while not job.is_ready():  # is_ready() re-fetches the job's state from the server
            sleep(0.5)  # avoid polling the server in a tight busy-wait loop
        stats = {
            "isDone": job["isDone"],
            "doneProgress": float(job["doneProgress"]) * 100,
            "scanCount": int(job["scanCount"]),
            "eventCount": int(job["eventCount"]),
            "resultCount": int(job["resultCount"]),
        }
        status = ("\r%(doneProgress)03.1f%% %(scanCount)d scanned "
                  "%(eventCount)d matched %(resultCount)d results") % stats
        sys.stdout.write(status)
        sys.stdout.flush()
        if stats["isDone"] == "1":
            sys.stdout.write("\n\nDone!\n\n")
            break
        sleep(2)
    # job is done; read its results (count=0 asks for all rows, the default is 100)
    job_results = results.ResultsReader(job.results(count=0))
    # keep result rows (dicts) and skip diagnostic Message objects;
    # DataFrame.append was removed in pandas 2.0, so build the frame in one call
    rows = [result for result in job_results if isinstance(result, dict)]
    return pd.DataFrame(rows)
```
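The polling in `export_to_dataframe` can also be written as a loop that sleeps between checks and gives up after an optional timeout. This is a minimal sketch: `wait_for_job` and its `on_poll` callback are hypothetical names, and the only assumption about the job object is that it has an `is_done()` method, as splunklib's `Job` does (it is used the same way in `normal_to_dataframe`).

```python
import time


def wait_for_job(job, poll_interval=2.0, timeout=None, on_poll=None):
    """Poll `job` until job.is_done() returns True, sleeping between polls.

    on_poll, if given, is called with the job after each check (for example
    to print a status line). Raises TimeoutError if `timeout` seconds elapse
    before the job finishes.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        done = job.is_done()
        if on_poll is not None:
            on_poll(job)  # runs after the check, so it sees the latest state
        if done:
            return job
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("search job did not finish before the timeout")
        # Sleep instead of busy-waiting so the server isn't polled in a tight loop
        time.sleep(poll_interval)
```

For example, `wait_for_job(job, poll_interval=2, timeout=600, on_poll=print_status)`, where `print_status` is a hypothetical function holding the existing `stats`/`sys.stdout.write` code. Separating waiting from printing makes it easier to log the raw job fields on each poll while debugging.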
```python
# Execute search in normal mode (non-blocking) and return results as dataframe
def normal_to_dataframe(_service, _query):
    from time import sleep
    job = _service.jobs.create(_query)  # execute search query in normal mode
    while not job.is_done():  # poll for Splunk search job completion
        sleep(1)
    job_results = results.ResultsReader(job.results(count=0))  # read all job results
    # keep result rows (dicts) and skip diagnostic Message objects;
    # DataFrame.append was removed in pandas 2.0, so build the frame in one call
    rows = [result for result in job_results if isinstance(result, dict)]
    return pd.DataFrame(rows)
```
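Both functions build the dataframe by calling `df.append` once per row, which is slow and no longer works on pandas 2.0 or later. A minimal sketch of the alternative is to collect the rows first and build the frame once. `collect_result_rows` is a hypothetical helper; it relies only on the behavior the original code already assumes, namely that the results reader yields dicts for result rows and non-dict objects for messages.

```python
def collect_result_rows(reader):
    """Return the result rows from a results reader as plain dicts.

    Items that are not dicts (e.g. splunklib Message objects carrying
    diagnostic text) are skipped, matching the original isinstance check.
    """
    rows = []
    for item in reader:
        if isinstance(item, dict):
            rows.append(dict(item))  # copy into a plain dict
    return rows
```

The dataframe is then built in a single call, for example `df = pd.DataFrame(collect_result_rows(results.ResultsReader(job.results(count=0))))`. Columns missing from some rows are filled with NaN, as they would have been with repeated appends.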