Splunk ITSI

Why am I getting errors after upgrading Splunk IT Service Intelligence on my search head cluster?

mataharry
Communicator

I run into problems with ITSI each time I upgrade it or install a new deployment:
- upgrading ITSI from version 2.6 to 3.1 on a search head cluster
- installing a new 3.0.0 or 3.1.2 on a search head cluster

Each time, I push the ITSI bits from the deployer and wait for the search head rolling restart.
When a problem occurs, the symptoms are usually: ITSI panels not loading, permission issues, and nothing under Configure > Services or Teams, even for my admin user.
Looking in the logs, I see in

 index=_internal source=*itsi_migration.log* 

that one of the search head cluster members tried to start the install/migration but failed because the permissions for "teams" were missing.
I checked: there are no teams in my ITSI (neither in the manager nor in the KV store collection).
I also see errors on some members about the SH captain not being ready.

Example (lines appear newest first, as returned by the search):

2018-06-04 15:29:22,979 INFO [itsi.migration] [itsi_migration] [run_migration] [23748] Enable UI
Exception: Failed to import Team settings. ITSI will not work properly until the Team settings are imported. See [http://docs.splunk.com/Documentation/ITSI/3.0.1/Configure/Installationandconfigurationconsiderationsandissues#Run_script_to_set_the_default_team_to_Global this documentation page] for instructions on how to resolve this issue.
raise Exception(error_msg)
File "S:\splunk\etc\apps\SA-ITOA\lib\itsi\upgrade\itsi_migration.py", line 3269, in run_migration
Traceback (most recent call last):
2018-06-04 15:29:22,976 ERROR [itsi.migration] [itsi_migration] [run_migration] [23748] Migration failed from version:None, to version:3.1.2
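
For anyone who wants to check the same thing outside the search bar, here is a minimal Python sketch that scans migration log lines for the "Migration failed" entry shown above. The regular expression is only derived from the single excerpt in this post, and the function name `find_failed_migrations` is made up for illustration, so treat it as a starting point rather than a description of the ITSI log format.

```python
import re

# Pattern derived from the one ERROR line quoted in this thread (an assumption,
# not a documented log format):
# 2018-06-04 15:29:22,976 ERROR [itsi.migration] ... Migration failed from version:None, to version:3.1.2
FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR .*"
    r"Migration failed from version:(?P<src>\S+), to version:(?P<dst>\S+)"
)


def find_failed_migrations(lines):
    """Return (timestamp, from_version, to_version) for each failed-migration line."""
    hits = []
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("src"), match.group("dst")))
    return hits


if __name__ == "__main__":
    sample = [
        "2018-06-04 15:29:22,979 INFO [itsi.migration] [itsi_migration] "
        "[run_migration] [23748] Enable UI",
        "2018-06-04 15:29:22,976 ERROR [itsi.migration] [itsi_migration] "
        "[run_migration] [23748] Migration failed from version:None, to version:3.1.2",
    ]
    print(find_failed_migrations(sample))
    # prints [('2018-06-04 15:29:22,976', 'None', '3.1.2')]
```

To run it against a real file, pass the open file object instead of `sample`. The location of itsi_migration.log on disk is not given in this thread; it is probably under the Splunk log directory, but that is an assumption to verify on your own search heads.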
1 Solution

yannK
Splunk Employee

The answer is in the question: you hit a problem with the search head cluster restart order, which prevented the ITSI migration script from running on the SH captain first, as expected.

The expected sequence is:
- trigger a rolling restart after pushing the apps from the deployer
- the original SH captain is the last member to restart
- once it restarts, it triggers the ITSI scripts that create the "global team" object in the KV store
- immediately afterwards, it triggers the first-install/migration script that sets up the rest of the ITSI collections in the KV store

In your case, some search heads may have timed out during the rolling restart, or the captain took too long to shut down and restart.
As a consequence, another member took over as captain, but because it had already restarted, it did not run the global team creation script.
With no team available, the rest of the migration fails for permission reasons, and the UI only partially works.

The fix is always the same:
- use the manual script to create the missing global team.
http://docs.splunk.com/Documentation/ITSI/latest/Configure/Installationandconfigurationconsideration...

Run the following command on any search head in your ITSI deployment:
cd $SPLUNK_HOME/etc/apps/SA-ITOA/bin
$SPLUNK_HOME/bin/splunk cmd python itsi_reset_default_team.py
Provide the splunkd port number and Splunk user name and password when prompted.
After the script has successfully finished, the Global team is created in the KV store.
Restart the Splunk platform.

In the case of a search head cluster, this means a rolling restart of the search heads.

As a side remark, another reason the team creation may fail (not only on a search head cluster) is that the admin role does not inherit from the itoa_admin role.
See this guide to fix it: http://docs.splunk.com/Documentation/ITSI/latest/Configure/UpgradeSplunkITServiceIntelligence#Check_...
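
As a quick way to check the role inheritance point, here is a minimal Python sketch that looks at authorize.conf text and reports whether the `[role_admin]` stanza lists itoa_admin in its `importRoles` setting. The function name `admin_imports_itoa_admin` is hypothetical, and the sketch makes two simplifying assumptions: it only sees the text you give it (Splunk merges authorize.conf from several locations), and it only detects a direct import, not inheritance through an intermediate role.

```python
import configparser


def admin_imports_itoa_admin(authorize_conf_text):
    """Return True if [role_admin] directly lists itoa_admin in importRoles."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the original case of keys such as importRoles
    parser.read_string(authorize_conf_text)
    if not parser.has_section("role_admin"):
        return False
    raw = parser.get("role_admin", "importRoles", fallback="")
    # importRoles is a semicolon-separated list of role names
    roles = [role.strip() for role in raw.split(";") if role.strip()]
    return "itoa_admin" in roles


if __name__ == "__main__":
    example = "[role_admin]\nimportRoles = power;user;itoa_admin\n"
    print(admin_imports_itoa_admin(example))  # prints True
```

If this returns False for your merged configuration, follow the guide linked above rather than relying on this sketch for the fix.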


esnyder_splunk
Splunk Employee

FYI, this script content has moved here: https://docs.splunk.com/Documentation/ITSI/latest/Install/Upgrade#Why_is_the_global_team_gone_after_.... The URL changed when it was moved to the new Install/Upgrade guide.

