TL;DR - Python script (ldapsearch) stops after 15 minutes. This didn't happen after upgrading the add-on; it started after upgrading Splunk.
So, a few days before I upgraded Splunk from 6.3.1 to 6.5.1, I upgraded Splunk Support for Active Directory (SA-ldapsearch) add-on to 2.1.4. We have nightly jobs that search our directories for metadata and output to CSV. Historically, some of these searches run well over an hour. Upgrading to 2.1.4 didn't break these longer searches, but upgrading Splunk to 6.5.1 did. Here are some interesting points I have noticed:
If I search _audit for an affected search_id, there is an event at 15 minutes into the search, but it's not from the user who owns the search; it's from splunk-system-user. Before upgrading to 6.5.1, this event did not occur.
Audit:[timestamp=01-06-2017 22:15:08.803, id=16356, user=splunk-system-user, action=search, info=granted REST: /search/jobs/scheduler_amg5NjMyOS1kcw_U0EtbGRhcHNlYXJjaA__RMD5382a042170949a29_at_1483758000_696][ctVE9bBqZ5wadW/RBmx70tVR3GbFX+my52Itx5qin3z9Lg0Kwn3fgFJoJBXGwiE3lKSDJyHa8VuFalijSW2MqDRCoNJOyA+gm1orBvAwKhUaLGS/s0eoQfPOwLThOMUJwmYyNQndkIE9l5M1rZPmjkxGtJLKW71Zdyb7FUGGU8Y=]
It's almost as if there is a new limit which was introduced by the upgrade, but I'm having trouble tracking down what limit this might be related to. As expected, there aren't any local or default limits.conf in the add-on. If I add filters to make the ldapsearch run faster (less than 15 minutes), the search works exactly like I would expect it to.
Hi, I've found the solution to my problem, and I hope it will also solve yours. The solution is to set the batch_wait_after_end parameter in the [search] stanza of limits.conf (in etc/system or etc/apps/) to a value higher than the longest duration of your ldapsearch queries. I had a look at all the limit values, and it was the only one equal to 900 (in seconds, i.e. 15 minutes).
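For anyone who wants to repeat the "which limit equals 900?" check, here is a minimal Python sketch. It assumes you have the text of a limits.conf available; the helper name settings_with_value and the SAMPLE excerpt are made up for illustration and are not a copy of the shipped default file.

```python
def settings_with_value(conf_text, wanted):
    """Return (stanza, key) pairs whose value equals `wanted` in .conf-style text."""
    matches = []
    stanza = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # A line such as [search] starts a new stanza.
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if value == wanted:
                matches.append((stanza, key))
    return matches


# Hypothetical excerpt for illustration only.
SAMPLE = """
[search]
ttl = 600
batch_wait_after_end = 900
"""

print(settings_with_value(SAMPLE, "900"))
```

On a real instance, the merged settings can be dumped with `splunk btool limits list --debug` and the text passed through the same helper. The override itself would then be a [search] stanza in a local limits.conf with batch_wait_after_end set to something longer than your slowest job, for example 7200 for searches that run up to two hours (that number is only an illustration).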
I have the same problem with 6.5.1, with a search head and an indexer. I have noticed that if we don't get any hits within 15 minutes, the search just stops and search.log reports no data found; however, if any data is retrieved within 15 minutes, the search can continue for hours with no problem.
I have not been able to narrow it down, as some searches can continue without hits. Anyway, increasing batch_wait_after_end on the search head has solved it.
Hi, I have the same behaviour: the search is stopped after 900 seconds (with 6.5.1 on Windows 2008 R2 and SA-ldapsearch v2.1.0). I opened case #402460 on 05/10/2016, which I agreed to close on 29/11 because Splunk support did not find the root cause, even though the problem is systematic in our standalone Splunk environment.
Let's hope Splunk will take a deeper look now 😉
Hi @brettwilliams - Were you able to test out ecathalo's solution below? If yes and it worked, please don't forget to resolve this post by clicking on "Accept". If you still need more help, please provide a comment with some feedback. Thanks!
Forgot to mention something relevant from search.log, which happens at the same time as item 2. It looks like this:
01-06-2017 22:00:05.381 INFO script - Writing search results info to /opt/splunk/var/run/splunk/dispatch...
01-06-2017 22:15:05.400 WARN DispatchThread - _query_finished or lower level infrastructure is notifying that query is done but the et is not yet set to Zero.
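As a quick sanity check on the 15-minute theory, the gap between the two search.log timestamps above can be computed directly. This is just a sketch; the format string is my assumption about how those timestamps are laid out (month-day-year, then time with milliseconds).

```python
from datetime import datetime

# Assumed layout of the search.log timestamps quoted above.
LOG_TIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f"

started = datetime.strptime("01-06-2017 22:00:05.381", LOG_TIME_FORMAT)
stopped = datetime.strptime("01-06-2017 22:15:05.400", LOG_TIME_FORMAT)

# Elapsed time between the INFO line and the WARN line, in seconds.
elapsed_seconds = (stopped - started).total_seconds()
print(elapsed_seconds)
```

The two lines are almost exactly 900 seconds apart, which matches a 900-second limit rather than anything specific to the LDAP query.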