We have a problem when users try to query a specific index.
Users can query every other index granted by this role, but not this new one. The role appears to be configured correctly (it was set up by a Professional Services resource) and we verified authorize.conf, yet only users with the "admin" or "users" role can query this index.
- etc/system/local/authorize.conf contains the following:
[role_problem]
cumulativeRTSrchJobsQuota = 0
cumulativeSrchJobsQuota = 3
srchIndexesAllowed = index_good1;index_good2;index_problem
We also tried creating a new role with access to "all non-internal indexes" (like the admin and users roles), but it does not work either.
There is no problem on another platform running version 9.0.4; however, this one, running version 8.1.3 with the same configs, fails to search the index.
We reviewed the Job Inspector as well but found no problem. There are no permission issues logged in splunkd.log or search.log; simply no data comes back from the indexers.
The investigation steps to get to the bottom of the issue are below:
1. Configuration - authentication.conf and authorize.conf. The same configs work on another platform, so this is very unlikely to be a config issue. Make sure the role configuration matches what is defined on the authentication server.
The description in the question already indicates the configuration is correct in authentication.conf and authorize.conf: "There is no problem on another platform running version 9.0.4; however, this one, running version 8.1.3 with the same configs, fails to search the index."
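A quick sanity check is to dump the effective role settings with btool on both platforms and compare them (the role and index names below are the ones from this example):
$ $SPLUNK_HOME/bin/splunk btool authorize list role_problem --debug
# expect to see srchIndexesAllowed = index_good1;index_good2;index_problem attributed to etc/system/local/authorize.conf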
2. Mapping of roles: check args.txt in the search artifacts under $SPLUNK_HOME/var/run/splunk/dispatch/<SID>
- The mapping looks correct, as expected: "roleA and roleProblem". A quick way to pull it out is shown after the listing below.
--id=1689370508.2666921_AAAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAA
--maxbuckets=300
--ttl=300
--maxout=500000
--maxtime=8640000
--lookups=1
--reduce_freq=10
--rf=*
--user=badAccount
--pro
--roles=roleA:roleProblem
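To pull just the user and role mapping out of the dispatch artifact (a sketch; replace <SID> with the search ID above, and $SPLUNK_HOME is assumed to be set in the shell):
$ grep -E '^--(user|roles)=' $SPLUNK_HOME/var/run/splunk/dispatch/<SID>/args.txt
--user=badAccount
--roles=roleA:roleProblem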
3. Search with DEBUG: to investigate further, run the searches with DEBUG logging - one as admin and another as the affected user.
When running the searches, use the smallest time range that returns just a handful of events, and pin the search to a single indexer with 'splunk_server' so a diag can be collected from that indexer as well.
<The_search_being_used> splunk_server=IndexerA | noop log_appender="searchprocessAppender;maxFileSize=100000000;maxBackupIndex=10" log_debug=* set_ttl=1h
i) index=goodIndex OR index=badIndex splunk_server=IndexerA
ii) index=goodIndex splunk_server=IndexerA
iii) index=badIndex splunk_server=IndexerA
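To compare what each run actually requested, the "required indexes" line can be grepped out of each search's search.log (a sketch; replace <SID> with the SIDs shown below):
$ grep "Search requires the following indexes" $SPLUNK_HOME/var/run/splunk/dispatch/<SID>/search.log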
* This could be a corner case that does not happen very often. Here is what we found in the indexer diag and in the SID's search.log:
i) The search.log files show a meaningful difference in the "required indexes" lines, as below:
i - (index=goodIndex OR index=badIndex) splunk_server=indexerA | noop log_debug=* set_ttl=1h
Admin - SID: 1689370032.2666760_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA
07-14-2023 18:27:15.700 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex,badIndex]"
User "badAccount" - SID: 1689370104.2666778_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA
07-14-2023 18:28:26.631 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex]" <<<<--- badIndex missing for badAccount
ii - index=goodIndex splunk_server=indexerA | noop log_debug=* set_ttl=1h
Admin - SID: 1689370231.2666860_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA
07-14-2023 18:30:35.769 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex]"
User "badAccount" - SID: 1689370287.2666877_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA
07-14-2023 18:31:29.105 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex]"
iii - index=badIndex splunk_server=indexerA | noop log_debug=* set_ttl=1h
(Good Search) Admin - SID: 1689370349.2666882_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA
07-14-2023 18:32:31.004 INFO DispatchCommandProcessor - Search requires the following indexes="[badIndex]"
(Bad Search) User "badAccount" - SID: 1689370411.2666889_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA
07-14-2023 18:33:32.492 INFO DispatchCommandProcessor - Search requires the following indexes="[]" <<<---- badIndex missing
============
ii) Search bundle information in search.log:
Looking further into the logs, the bundle used by the searches is a very old one, from May 12, 2023 3:36:39 PM GMT-07:00 (PDT) - already two months old, given that today is July 14.
User "badAccount" has the log message below; the '1683930999' at the end of the bundle path is the epoch timestamp of the search bundle.
07-14-2023 18:33:32.247 INFO dispatchRunner - search context: user="badAccount", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930999"
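For reference, the epoch timestamp can be converted on the command line (GNU date shown; output assumes a US Pacific timezone):
$ date -d @1683930999        # on macOS/BSD: date -r 1683930999
Fri May 12 15:36:39 PDT 2023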
iii) Search bundle information in the IndexerA diag:
The same stale bundle shows up in the bundle listing of the IndexerA diag.
-----< Excerpts from indexer diag >----------
********** search peer bundles dir listing **********
ls -alR "/opt/splunk/var/run/searchpeers"
/opt/splunk/var/run/searchpeers:
total 5927728
drwx------ 96 splunk splunk 16384 Jul 14 18:45 .
drwx--x--- 5 splunk splunk 52 Dec 23 2020 ..
drwx------ 7 splunk splunk 163 May 12 19:35 AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930902
-rw------- 1 splunk splunk 10240 May 12 19:37 AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930999.delta
drwx------ 7 splunk splunk 163 May 12 19:37 AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930999
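A quick way to compare bundle freshness between the search head and a peer (a sketch; paths assume $SPLUNK_HOME=/opt/splunk as in this diag):
# On the search head: the newest knowledge bundles it has produced
$ ls -lt /opt/splunk/var/run/*.bundle | head -5
# On the indexer: the newest bundles it has actually received
$ ls -lt /opt/splunk/var/run/searchpeers/ | head -5
If the newest bundle on the indexer is much older than on the search head, bundle replication is failing.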
iv) Looking into the bundles on the SHC captain:
There were a couple of lookup CSV files over 1.7 GB, and removing them fixed the issue: the oversized bundle was failing to replicate, so the peers kept working from the stale May 12 bundle, which predates the new index authorization. Here is what can be done to address bundle size issues in general.
i) To reduce the size of the bundles:
- Use replicationBlacklist in distsearch.conf (a sketch follows this list):
https://docs.splunk.com/Documentation/Splunk/8.1.3/Admin/Distsearchconf
- Find large files, such as lookup files, inside a bundle:
$ tar -tvf <bundle_file> | grep "lookup" | sort -nk3
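A minimal sketch of a replicationBlacklist entry on the search head(s); the entry name and lookup path below are hypothetical examples, and the value follows Splunk-style pattern matching against paths relative to $SPLUNK_HOME/etc (see the distsearch.conf spec linked above):
# distsearch.conf
[replicationBlacklist]
# hypothetical entry: exclude a specific oversized lookup from the knowledge bundle
excludeHugeLookup = apps/search/lookups/huge_lookup.csv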
ii) Or increase maxBundleSize if there are no more files to exclude.
On all search heads:
[replicationSettings] in distsearch.conf
maxBundleSize = 2048 (value is in MB) >> change
[httpServer] in server.conf
max_content_length = 2147483648 (2 GB, in bytes) >> change
On all indexers in the cluster:
[httpServer] in server.conf
max_content_length = 2147483648 (2 GB, in bytes) >> change
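Once the bundle has been slimmed down (or the limits raised), a quick check from the search head that bundle replication to the peers is healthy; this is a sketch, and the replicationStatus field name should be verified against your version's REST reference:
| rest /services/search/distributed/peers splunk_server=local
| table title status replicationStatus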