OK, I asked this question a week ago, and part of the issue was that the Xvfb RPM was not installed on one of the search heads.
I am now running into a different issue: scheduled PDF searches run, but they only work intermittently. No configuration changes are made between a search working one day and erroring out the next. The error is always: "An error occurred while generating a PDF of this report: Didn't receive PDF from Report Server."
My question is: does anyone with search head pooling enabled have the PDF Server app running successfully? We are running 4.3.4 and enabled search head pooling a week ago. Everything else is working fine. I have a case open with Splunk Support, and they are researching the issue as well.
It may also help if I explain our architecture. We are running two identical RHEL 5.9 x64 VM search heads sitting behind an L7 load balancer. The Splunk Docs article on search head pooling, http://docs.splunk.com/Documentation/Splunk/4.3.4/Deploy/Configuresearchheadpooling (which I checked and is identical to the most recent article for 5.0.2), mentions changing alert_actions.conf to point to the VIP for emails. This also affects the PDF server, since it points to the VIP as well.
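For context, this is the kind of change the documentation describes, in the [email] stanza of alert_actions.conf. The hostname below is a placeholder, not our actual VIP:

```ini
# alert_actions.conf -- email links (and, it turns out, PDF server calls)
# resolve this hostname; per the docs it should be the pool's VIP
[email]
hostname = splunk-vip.example.com
```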
Back to my question: does anyone with a similar setup have the PDF server working consistently? I am beginning to think the problem might be the persistence setup on the L7 load balancer. If search head A calls the load balancer when the Python script for the PDF server starts, and the load balancer routes the call back to search head A, I suspect that produces a successful PDF email. However, if search head A calls the load balancer and it routes the call to search head B, I suspect that results in a failed PDF email. I am assuming the load balancer treats the search heads just as it would any clients and routes them based on current load. I am going to ask the load balancer admin if he can make the search heads always route back to themselves, to see if that resolves the issue.
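To make the suspected failure mode concrete, here is a small sketch (hostnames are illustrative, and the routing policy is my assumption, not confirmed by the load balancer admin): a PDF call goes to the VIP, and it only succeeds if the VIP happens to route it back to the originating search head.

```python
import random

POOL = ["search-head-a", "search-head-b"]

def vip_route():
    """The L7 load balancer picks a pool member, e.g. by current load.
    Modeled here as a random choice between the two search heads."""
    return random.choice(POOL)

def generate_pdf(originating_head):
    """The PDF server call goes to the VIP; it only works if the VIP
    routes it back to the search head that made the call."""
    routed_to = vip_route()
    return routed_to == originating_head  # mismatch -> "Didn't receive PDF"

# With two pool members and load-based routing, roughly half the
# scheduled PDF emails from a given search head would be expected to
# fail, intermittently and with no configuration change -- which matches
# the symptoms.
failures = sum(not generate_pdf("search-head-a") for _ in range(1000))
```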
Any information would be appreciated.
Possible light at the end of the tunnel: I added a hosts entry on both search heads mapping the hostname of the VIP to themselves, respectively. This looks promising, but only time will tell if it resolves the issue.
Adding server-specific hosts entries on the search head pool members seems to have corrected this issue. Per the documentation for configuring search head pooling, we changed our alert_actions.conf to point to the hostname of the VIP. This also caused the PDF Server app to make a TCP call to the VIP, only to be routed back to a search head. That was inefficient and sporadic: if the VIP did not route the call back to the originating search head, PDF generation failed.
Adding server-specific hosts entries lets us keep the changes made to alert_actions.conf. Initially I tried using the loopback IP, but since Splunk is not listening on that interface, I changed it to the server's IP on eth0, which was successful. An example is as follows:
Search head pool member A: hosts entry mapping the VIP hostname to search head A's own eth0 IP.
Search head pool member B: hosts entry mapping the VIP hostname to search head B's own eth0 IP.
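Written out with placeholder values (these are not our actual hostnames or IPs), the entries look like this, one per pool member, so each search head resolves the VIP's hostname to its own eth0 address:

```
# /etc/hosts on search head pool member A (eth0 IP 10.0.0.11)
10.0.0.11   splunk-vip.example.com

# /etc/hosts on search head pool member B (eth0 IP 10.0.0.12)
10.0.0.12   splunk-vip.example.com
```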
Hope that helps anyone else with a similar environment. I don't know whether Splunk 5.x would resolve this issue another way, since PDF generation is built in there.