I ran into the following three issues while collecting Splunk data with the Python splunk-sdk package.
The 1st issue: during peak hours (10 AM to 4 PM) I sometimes hit the error below. How can I increase concurrency_limit to avoid this error? Is concurrency_limit something that has to be changed on the Splunk server?
splunklib.binding.HTTPError: HTTP 503 Service Unavailable -- Search not executed: The maximum number of concurrent historical searches on this instance has been reached., concurrency_category="historical", concurrency_context="instance-wide", current_concurrency=52, concurrency_limit=52
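As a stopgap I wrap job creation in a retry loop and back off when the 503 comes back. This is a minimal sketch, not my full program; the service object and search query come from elsewhere:

import time
from splunklib.binding import HTTPError

def create_job_with_retry(service, query, max_retries=5, **kwargs):
    # Retry when the instance-wide historical-search quota returns HTTP 503.
    delay = 30
    for attempt in range(max_retries):
        try:
            return service.jobs.create(query, **kwargs)
        except HTTPError as e:
            if e.status != 503 or attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay = min(delay * 2, 300)  # exponential backoff, capped at 5 minutes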
The 2nd issue is a connection reset error when I try to collect a whole day of data over one Splunk connection. My work-around is to collect one hour of data per connection. It would be nice to resolve the connection reset so that I can collect the whole day in one session. Is this something to change on the Splunk server or inside the Python splunk-sdk package?
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
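This is roughly the work-around, one fresh connection per one-hour window. The host, credentials, and search string are placeholders, collect_hour is just an illustrative name, and JSONResultsReader assumes a recent SDK version:

import splunklib.client as client
import splunklib.results as results

def collect_hour(host, port, username, password, earliest, latest):
    # Fresh connection per one-hour window: my work-around for WinError 10054.
    service = client.connect(host=host, port=port,
                             username=username, password=password)
    job = service.jobs.create(
        "search index=main | sort _time",   # placeholder query
        earliest_time=earliest,             # e.g. "2022-11-25T23:00:00"
        latest_time=latest,                 # e.g. "2022-11-26T00:00:00"
        exec_mode="blocking",               # wait until the search finishes
    )
    reader = results.JSONResultsReader(job.results(output_mode="json", count=0))
    # The reader yields dicts (events) and Message objects; keep only the events.
    return [r for r in reader if isinstance(r, dict)]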
The 3rd issue is duplicate data. My Python program collects one hour of data, 11 PM to midnight on 11/25, and loads it into a pandas dataframe sorted by _time, with the index being the input order. As the attached screenshot of the CSV file shows, there are 184 lines in total, but 88 of them are duplicates. I can use df.drop_duplicates() to drop them, but that does not seem like the most efficient approach. Does splunk-sdk have an option to prevent such duplicates in the first place?
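For now I de-duplicate after loading, along these lines (a simplified sketch reusing the hypothetical collect_hour helper above; connection details are placeholders):

import pandas as pd

rows = collect_hour("splunk.example.com", 8089, "me", "***",
                    "2022-11-25T23:00:00", "2022-11-26T00:00:00")
df = pd.DataFrame(rows).sort_values("_time").reset_index(drop=True)
print(len(df))              # 184 in my run, 88 of them duplicates
df = df.drop_duplicates()   # stopgap until I find an SDK-side option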