Hi,
In one of our indexer clusters which we query from a search head cluster, only one of the indexers is giving this error while running a search. The error I'm getting is:
<indexer_hostname>Search process did not exit cleanly, exit_code=255, description="exited with code 255". Please look in search.log for this peer in the Job Inspector for more info.
When going through search.log for that particular indexer, all I can find is:
INFO DistributedSearchResultCollectionManager - Connecting to peer=<indexer> connectAll 0 connectToSpecificPeer 1
INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=<indexer> in 0.002 seconds
And there aren't any ERROR entries in the search.log.
However, I did find some errors in splunkd.log for the same indexer, which are as below:
ERROR DistBundleRestHandler - Problem untarring file: /opt/splunk/var/run/searchpeers/xxx.bundle
WARN DistBundleRestHandler - There was a problem renaming: /opt/splunk/var/run/searchpeers/xxx.tmp -> /opt/splunk/var/run/searchpeers/xxxx: Directory not empty
I have seen some previous answers stating that there might not be enough free space on that particular indexer, but when I checked, there is still 40% free space available.
I couldn't figure out what the problem was, as there were no ERROR entries in the search.log. I'm on Splunk 7.1.3.
Thanks in advance.
The error was solved when I increased the ulimit for open files on the indexers to the recommended 64000; initially it was 4096.
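In case it helps someone else, this is roughly how we checked and raised it (just a sketch; the exact file depends on whether Splunk is started via init or systemd, and it assumes splunkd runs as a user named "splunk"):
# Check the current open-file limit for the user that runs splunkd
su - splunk -c 'ulimit -n'
# For init-style startup, raise it in /etc/security/limits.conf:
#   splunk  soft  nofile  64000
#   splunk  hard  nofile  64000
# For a systemd-managed service, set LimitNOFILE=64000 in the unit (or an override) instead.
# Restart Splunk on the indexer so the new limit takes effect
/opt/splunk/bin/splunk restart
After the restart, the ulimit values splunkd logs at startup in splunkd.log can be a quick way to confirm the new limit was actually picked up.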
We had this error before and it turned out to be IO bound. The search peer's IO throughput was very low, so it was unable to handle the request properly.
Then how did you solve it?
Increase available IO on the host. That may be non-trivial to do unless it's virtual, but that's what we did. Either increase the speed of the disks, add more disks, or decrease other IO load.
We did this after trying several other exit_code=255 fixes (there seem to be many ways to get this error) and finding out that the IOPS on that particular box did not meet the minimum spec.
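If you want to sanity-check the disk yourself before changing hardware, something like this gives a rough picture under real search load (a sketch; iostat comes from the sysstat package, and the device name is just an example):
# Extended per-device stats every 5 seconds; watch r/s + w/s (IOPS) and %util
iostat -x 5
# Narrow it to the device backing your hot/warm volume, e.g. sda (example name)
iostat -x 5 sda
If %util sits near 100% while the combined r/s + w/s stays well below what the storage is rated for, the peer is almost certainly IO bound.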
The IOPS are above the recommended specs, but our ulimit for open files was below the recommendation, and even after increasing it the error is still present. For now I have quarantined the problematic indexers, but I'm looking for a solution so that I can actually resolve the issue.
Do you know any other solutions that might resolve this?
Thanks
Have you confirmed that the permissions are correct on your Search Peer? [edit, originally my response said SH]
It's easily done: accidentally running ./splunk stop && ./splunk start as root. Unfortunately, the next time you restart the service as splunk, the permissions are messed up.
Check that the contents of /opt/splunk/var/run/searchpeers/ are all owned by "splunk" and not "root".
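Something like this is a quick way to spot and fix it (a sketch, assuming the Splunk user and group are both called "splunk" and Splunk lives in /opt/splunk):
# List anything under searchpeers that is not owned by the splunk user
find /opt/splunk/var/run/searchpeers -not -user splunk -ls
# If root-owned files turn up, stop Splunk, fix the ownership, and start it again
/opt/splunk/bin/splunk stop
chown -R splunk:splunk /opt/splunk/var/run/searchpeers
/opt/splunk/bin/splunk start
If the instance was ever started as root, other directories under /opt/splunk may be affected too, in which case a chown -R over the whole install (while Splunk is stopped) is the usual cleanup.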
We haven't restarted any of the search heads recently. The ownership of /opt/splunk/var/run/searchpeers/ is splunk on the search heads and root on the indexers, which I think is how it should be.
Sorry - I meant the Search Peer (not Head) - will amend
Who does Splunk run as on the peers? (best practice suggests it should not be root - in which case, that could very well be your issue)
There are some reasons you might run as root, but that is a separate conversation 🙂
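A quick way to confirm on each peer (a sketch):
# Show which OS user owns the running splunkd processes
ps -eo user,pid,args | grep '[s]plunkd'
# If boot-start was enabled with a specific user, it is recorded as SPLUNK_OS_USER here
grep SPLUNK_OS_USER /opt/splunk/etc/splunk-launch.conf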
I'm not sure what it is best to run as, but all the other indexers, as well as this one, have been running as root from the beginning, which was a long time ago. As of now I'm receiving this error only on one of the indexers in the cluster.
Were you looking at the search log on the offending indexer? The one in $SPLUNK_HOME/var/run/splunk/dispatch//search.log?
No, I was looking on the search head.
I did look for the search.log on the problem indexer; its dispatch folder contains multiple remote-search directories from the various search heads. In those search.log files, this is the common ERROR entry I found:
ERROR dispatchRunner - RunDispatch::runDispatchThread threw error: Application does not exist: <app_name_which_exists>
But the app names in those entries were all different, and all of those apps were created long ago; none of the apps were created recently.
Those logs will get purged after the search has expired. I'd suggest rerunning the search that is causing the problem, getting the sid from the Job Inspector, and then going to find that specific search.log on the indexer. And when you do find it, you may want to make a copy elsewhere, since it will roll eventually.
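Something along these lines once you have the sid (a sketch; <sid> and <searchhead> are placeholders, and on an indexer the remote artifact directory is usually prefixed with remote_):
# On the indexer: find the dispatch directory for that specific search
ls -d /opt/splunk/var/run/splunk/dispatch/*<sid>*
# Copy its search.log somewhere safe before the artifact expires and gets reaped
cp /opt/splunk/var/run/splunk/dispatch/remote_<searchhead>_<sid>/search.log /tmp/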
Currently I've stopped Splunk on that indexer, as it gives the error for every search, not just one particular search. Even if I search an index that is not on that indexer, the search head gives me that error.
Hmm... OK. So you have errors about untarring the search bundle and errors about apps not existing. I wonder if you can look at the search bundle on that indexer and see if the app is in there? $SPLUNK_HOME/var/run/searchpeers, I believe, is the location.
In there are the search bundles from the search heads, which contain all of the config the indexers need to run the searches. If they're not making it there, then maybe the indexer is throwing an error because it doesn't have the context to run the search?
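A sketch of how to check (the bundle name below is a placeholder; a .bundle is just a tar archive, which is why the splunkd.log error above is about untarring it):
# On the indexer: list the replicated bundles and any leftover .tmp directories
ls -l /opt/splunk/var/run/searchpeers/
# Check whether the app actually made it into a given bundle
tar -tf /opt/splunk/var/run/searchpeers/<searchhead>-<epoch>.bundle | grep -i <app_name>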
Yes, all the apps are present in the $SPLUNK_HOME/var/run/searchpeers location. I see the bundle from the deployer, and all the apps present on the search heads are in there too.
Well, I now see that none of the search heads' bundles are on the indexer, and in the deployer's bundle the app for which the error is reported does not exist. So, as you said, maybe the indexer doesn't have the required bundle. What might be the issue, and is there any remediation for this?