Search returns only 50000 events in Python script?

bdhin
New Member

Hi,

We are using the Python script below to fetch results from Splunk. Through the UI we get more than 6 lakh (600,000) records, but through the API we get only 50000 records.

Please help: what do I need to add to the script below to get all the records?

import urllib
import httplib2
import time
import re
from time import localtime,strftime
from xml.dom import minidom
import json
baseurl = 'https://localhost:8089'
username = ''
password = ''
myhttp = httplib2.Http()

#Step 1: Get a session key
servercontent = myhttp.request(baseurl + '/services/auth/login', 'POST',
                            headers={}, body=urllib.urlencode({'username':username, 'password':password}))[1]
sessionkey = minidom.parseString(servercontent).getElementsByTagName('sessionKey')[0].childNodes[0].nodeValue
print "====>sessionkey:  %s  <====" % sessionkey 

#Step 2: Create a search job    
searchquery = 'index="_internal" | head 10'
if not searchquery.startswith('search'):
    searchquery = 'search ' + searchquery

searchjob = myhttp.request(baseurl + '/services/search/jobs', 'POST',
                           headers={'Authorization': 'Splunk %s' % sessionkey},
                           body=urllib.urlencode({'search': searchquery}))[1]
sid = minidom.parseString(searchjob).getElementsByTagName('sid')[0].childNodes[0].nodeValue
print "====>sid:  %s  <====" % sid

#Step 3: Get the search status    
myhttp.add_credentials(username, password)
servicessearchstatusstr = '/services/search/jobs/%s/' % sid
isdonepattern = re.compile('isDone">(0|1)')
isnotdone = True
while isnotdone:
    searchstatus = myhttp.request(baseurl + servicessearchstatusstr, 'GET')[1]
    isdonestatus = isdonepattern.search(searchstatus).groups()[0]
    if isdonestatus == '1':
        isnotdone = False
    else:
        # Pause between polls instead of requesting the status in a tight loop
        time.sleep(1)
print "====>search status:  %s  <====" % isdonestatus

#Step 4: Get the search results
services_search_results_str = '/services/search/jobs/%s/results?output_mode=json&count=0' % sid
searchresults = myhttp.request(baseurl + services_search_results_str, 'GET')[1]
print "====>search result:  [%s]  <====" % searchresults

 


codebuilder
SplunkTrust

You're hitting a default search limit. You can increase this value in limits.conf. The defaults are:

[searchresults]
maxresultrows = 50000

And/or:

[restapi]
maxresultrows = 50000

You'll need to restart Splunk after making the config change.

Generally speaking, when you see nice round numbers like 50000, you're encountering a limit or parameter set in limits.conf.

----
An upvote would be appreciated and Accept Solution if it helps!

bdhin
New Member

Hi @codebuilder

Thank you for sharing the answer. I was thinking of adding a loop to my code that steps through the count and offset values and fetches the output page by page. I am not sure how to implement that in my code. Can you please help me with that?


codebuilder
SplunkTrust

I think it would be easier and more reliable to narrow your search instead, either by excluding data or by narrowing the date range. Each search will run much faster, so you can iterate over calls to that search to retrieve all the results you are seeking. The code will be simpler to read and maintain, and will perform much better.
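
One way to read "narrow the date range and iterate" is to split the overall period into fixed-size windows and run the search once per window. Below is a minimal Python 3 sketch of just the window arithmetic. The helper name `time_windows`, the example dates, and the one-hour step are illustrative assumptions, not anything Splunk provides, and each window would still have to return fewer rows than the configured limit.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Yield (earliest, latest) pairs that cover [start, end) without overlap."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past the requested end
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


if __name__ == "__main__":
    start = datetime(2019, 5, 1, 0, 0, 0)      # hypothetical range start
    end = datetime(2019, 5, 1, 3, 30, 0)       # hypothetical range end
    for earliest, latest in time_windows(start, end, timedelta(hours=1)):
        # Each pair would be passed as earliest_time / latest_time when the
        # search job is created; the string format your instance expects
        # should be checked rather than assumed from this printout.
        print(earliest.isoformat(), latest.isoformat())
```

Because each window ends exactly where the next one begins, no event is fetched twice as long as the search treats the earliest bound as inclusive and the latest bound as exclusive.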


codebuilder
SplunkTrust

(forgot to mention)
Also consider using an accelerated data model; your scenario sounds like a perfect candidate.


harsmarvania57
SplunkTrust

Hi,

For large dataset exports, please use the jobs/export endpoint: https://docs.splunk.com/Documentation/Splunk/7.2.6/RESTREF/RESTsearch#search.2Fjobs.2Fexport
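
As a rough sketch of how the export endpoint could replace steps 2 to 4 of the script in the question: a single POST to `/services/search/jobs/export` streams results back while the search runs, so there is no job to poll. The version below is Python 3 and uses only the standard library. It assumes that `output_mode=json` returns one JSON object per line, with the row under a `result` key and a `preview` flag; please confirm that against your Splunk version. `build_export_request` and `iter_export_rows` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_export_request(baseurl, sessionkey, searchquery):
    """Build the POST request for the export endpoint (nothing is sent here)."""
    if not searchquery.startswith(("search ", "|")):
        searchquery = "search " + searchquery
    body = urllib.parse.urlencode({
        "search": searchquery,
        "output_mode": "json",
    }).encode("utf-8")
    return urllib.request.Request(
        baseurl + "/services/search/jobs/export",
        data=body,
        headers={"Authorization": "Splunk %s" % sessionkey},
        method="POST",
    )


def iter_export_rows(lines):
    """Yield result rows from an iterable of response lines (bytes or str)."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: preview rows are flagged, and final rows carry "result"
        if record.get("preview") or "result" not in record:
            continue
        yield record["result"]


# Usage (not executed here; needs a reachable splunkd and a valid session key):
# request = build_export_request("https://localhost:8089", sessionkey,
#                                'index="_internal"')
# with urllib.request.urlopen(request) as response:
#     for row in iter_export_rows(response):
#         print(row)
```

Rows are handled one at a time as they arrive, so the full result set never has to fit in memory. Certificate handling for a self-signed splunkd certificate is deliberately left out of this sketch.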


bdhin
New Member

Hi @harsmarvania57

Can you please show me how to implement it in the code above?

I am new to this, so any help would be much appreciated.


harsmarvania57
SplunkTrust

bdhin
New Member

@harsmarvania57 Thanks for sharing the link. I was thinking of adding a loop to the code above that requests count=50000 with offset=0, then count=50000 with offset=50000, and so on. I am not sure how to add this loop to my code. Can you please help me with that?


harsmarvania57
SplunkTrust

You won't be able to achieve this with a loop, because the results endpoint returns only 50000 events. If you want to do this using the export endpoint with the Splunk Python SDK, let me know and I'll provide a script.


bdhin
New Member

Hi @harsmarvania57, sure, please share the script. That would be great.


harsmarvania57
SplunkTrust

Try the script below (change the query, time range, and IP to suit your requirements). You need to download the Splunk Python SDK to run it.

import sys
import getpass
import json
sys.path.append('splunk-sdk-python-1.6.4')
import splunklib.client as client
import splunklib.results as results

splunkUser = raw_input("Enter Splunk Username: ")
splunkPassword = getpass.getpass("Enter Splunk Password: ")

splunkService = client.connect(host='<IP>', port=8089, username=splunkUser, password=splunkPassword, verify=0)
kwargs_export = {"earliest_time": "-15m", "latest_time": "now", "search_mode": "normal"}
job = splunkService.jobs.export("search index=_internal | stats count by host,sourcetype", **kwargs_export)

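rr = results.ResultsReader(job)
with open('results.txt', 'w') as f:
    for result in rr:
        # ResultsReader also yields diagnostic messages; keep only result rows
        if isinstance(result, dict):
            # Write one JSON object per line so the file can be parsed back
            f.write(json.dumps(dict(result)) + '\n')
assert not rr.is_preview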

kmorris_splunk
Splunk Employee

bdhin
New Member

@kmorris_splunk Yes, I tried, but it's not working.


martynoconnor
Communicator

Since applying that change, have you restarted the Splunk instance?
