About sylim_splunk

sylim_splunk

It appears hitting a known issue for some recent versions below. 9.4.1 9.4.0 9.3.3 9.2.5 9.1.8 You may want to check this article. https://github.com/splunk/docker-splunk/issues/698

sylim_splunk · ‎06-25-2024

This is more of annoying log message issue. The log messages are intended to be suppressed and can be ignored unless it affects any Splunk performances in indexing or searching. Fix versions, 9.1.3+, 9.2.0+

sylim_splunk · ‎04-29-2024

* Known issue: This has been reported and worked on by our Dev team. So far it is found to be a display issue that doesn't affect the functionality of the filter - We also acknowledge that it makes DS/DC harder to use for Splunk admins. * Workaround: There's no workaround for the bug - this issue only affects filtering on machine types, so if it’s possible to arrange some filters which won’t require machine types (like names, addresses, etc), these filters work. Obviously after fix they can return to more convenient machine types filters. * Fixed version: The fix has been scheduled and will be available for the maintenance version 9.2.2. The GA is currently for mid June, which is subject to change according to the build schedule/load. If you still find it not work with 9.2.2 please check it with Splunk Support.

sylim_splunk · ‎08-01-2023

SplunkWeb users may experience different behaviors for the UI preferences that used to persist and show latest preferences by updating ui-prefs.conf on the fly. Now after upgrade to 9.0.5+ or 9.1.0+ its behavior changed and no longer uses ui-prefs.conf to remember the user's UI level preferences, but instead, uses the url in the request or localStorage/Web Storage. However the ui-prefs.conf is still used to set the initial values up - simply became read-only mode. Due to this change you will find the user level changes on the search page may not persist. Here are some examples you can experience by this changes: - The search mode, fast/smart/verbose mode you set just now is discarded. - The chart type, line, bar or pie, you chose just now is not persisting and changed to one type whenever logging in back. - Some features, such as the time ranges you used to use in time range picker is reverted back after the browser cache clean-up. - Or you have to set your own search page preferences to each browser or different PCs you use. Reason for the change: This change was decided to improve the slow UI loading time issue for a large deployment with thousands of users. Instead of using central location, ui-prefs.conf on the Search Head the load has been spread to the user level. Affected versions : 9.0.5+, 9.1.0+ What's the expected with the new behavior: With this change each parameter in the ui-prefs.conf has been implemented basically to remove POST method, either added into URL in GET request or stored in the localStorage of the user browser. If it’s stored in the URL as a parameter it lasts the length of the current workflow. ie. . Once you load the page again without the parameter in the URL, it is discarded. If it's in localStorage it will last until it gets deleted from the Web Storage. You can find more detailed conditions for different browsers here; How the data in localStorage get deleted. How can I go back to the previous behavior although it's not recommended: If the customer would like to re-enable the old workflow temporarily (this will result in poorer performance for web page loads - the reason why this change was made), optimize_ui_prefs_performance in web-features.conf can be set to false as a temporary workaround. In web-features.conf: [feature:ui_prefs_optimizations] optimize_ui_prefs_performance = false (true by default)

sylim_splunk · ‎08-01-2023

In GUI > Search app > search page I used to change the search mode to verbose which I knew it persists after the sessions. Such as, when I change the mode to "Fast" and log out. Once I log in back it shows "Fast". But not any more after we upgraded to 9.0.5 from 9.0.4. Why? [ui-prefs.conf, url or localStorage]

sylim_splunk · ‎07-20-2023

The browser is supposed to check in and send keep alive/poll GETs to splunk server every 1 second while a search is running. However it's stopped at some point and comes back later, then the splunk server sends back the error 404 as it's already timed out. Here's the logs: 1. auto-cancel logged in to search_messages.log ; 07-20-2023 13:37:45.264 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874558.349256_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled 07-20-2023 13:46:45.343 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689874865.349389_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled 07-20-2023 14:00:45.323 -0400 ERROR SearchMessages - orig_component="" app="search" sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" message_key="" message=Search auto-canceled 2. splunkd_ui_access log for sid="1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" shows that the keep alive stopped and resumed after 1min in the graph. There's a 1min gap between the last success of 200 and the error 404 in the graph below. The browser took longer than 30 seconds (job_default_auto_cancel=30 for 8.2.9) at times to send keep alive pool to splunk server. index=_internal source=*splunkd_ui_access.log "1689875862.349793_AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEE" |timechart span=1s count by status Below has more details of the access log. Every second the browser sent KeepAlives then got http 200. But the last one received http 404 due to the timeout. 3. Recommendation: i) In web.conf for the versions prior to 9.0 only. [settings] job_default_auto_cancel = 62 - 30 in ver 8.2.9 and increased to 62 in ver 9.0+. ii) In limits.conf [search] min_settings_period = 60

sylim_splunk · ‎07-20-2023

We have a requirement to pull security logs for past specific the time ranges - i.e from December 2022 - Apr 2023, Splunk cannot complete a search without expiring for even a 1 hour window in December. This fails our published 12 month retention period for these logs. Please provide options for how to Identify, correct, or Improve this Search challenge. The search job 'SID' was canceled remotely or expired. Sometimes the GUI shows "Unknown SID". The version currently used is 8.2.9.

sylim_splunk · ‎07-19-2023

The investigation steps to get to the bottom of the issue below; 1. Configuration - authentication.conf and authorize.conf. The same configs worked in another platform it surely is not a config issue. Make sure that the role config is set as that is in the authentication server. As per the description in the question proves the configuration is correct in authentication.conf and authorize.conf - "There is no problem on another platform with the version 9.0.4, however, this one of the version 8.1.3 with the same configs fails to search the index." 2. Mapping of roles : In the search artifacts, args.txt in $SPLUNK_HOME/var/run/splunk/dispatch/<SID> - Mapping looks correct as expected. "roleA and roleProblem". --id=1689370508.2666921_AAAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAA --maxbuckets=300 --ttl=300 --maxout=500000 --maxtime=8640000 --lookups=1 --reduce_freq=10 --rf=* --user=badAccount --pro --roles=roleA:roleProblem 3. Search with DEBUG : To investigate the issue further, run the searches with DEBUG. One with admin and another with a bad user. When running searches make sure to use the smallest time ranges that return just a handful events and use a 'splunk_server' to collect a diag from an indexer too. <The_search_being_used> splunk_server=IndexerA | noop log_appender=“searchprocessAppender;maxFileSize=100000000;maxBackupIndex=10" log_debug=* set_ttl=1h i) index=goodIndex OR index=badIndex splunk_server=IndexerA ii) index=goodIndex splunk_server=IndexerA iii) index=badIndex splunk_server=IndexerA * This could be a corner case that may not happen that often. Here's what happened from indexer diag and SID/search.log : i) The search logs have meaningful differences in terms of the "required indexes" as below; i - (index=goodIndex OR index=badIndex) splunk_server=indexerA | noop log_debug=* set_ttl=1h Admin - SID: 1689370032.2666760_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA 07-14-2023 18:27:15.700 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex,badIndex]" User "badAccount" - SID: 1689370104.2666778_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA 07-14-2023 18:28:26.631 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex]" <<<<--- badIndex missing for badAccount ii - index=goodIndex splunk_server=indexerA | noop log_debug=* set_ttl=1h Admin - SID: 1689370231.2666860_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA 07-14-2023 18:30:35.769 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex]" User "badAccount" - SID: 1689370287.2666877_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA 07-14-2023 18:31:29.105 INFO DispatchCommandProcessor - Search requires the following indexes="[goodIndex]" iii - index=badIndex splunk_server=indexerA | noop log_debug=* set_ttl=1h (Good Search) Admin - SID: 1689370349.2666882_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA 07-14-2023 18:32:31.004 INFO DispatchCommandProcessor - Search requires the following indexes="[badIndex]" (Bad Search) User "badAccount" - SID: 1689370411.2666889_AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA 07-14-2023 18:33:32.492 INFO DispatchCommandProcessor - Search requires the following indexes="[]" <<<---- badIndex missing ============ ii) Search bundle information in search.log: Further looking into the logs, the bundle for the searches appears to be a very old one, from May 12, 2023 3:36:39 PM GMT-07:00 PST ( 2 months old already Today is July 14) . User "badAccount" has a log message as below; '1683930999' at the end of the log is the epoch time of the search bundle. 07-14-2023 18:33:32.247 INFO dispatchRunner - search context: user="badAccount", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930999" iii) Search bundle information in indexerA diag: This turned out to be the same in the bundle information of the IndexerA diag. -----< Excerpts from indexer diag >---------- ********** search peer bundles dir listing ********** ls -alR "/opt/splunk/var/run/searchpeers" /opt/splunk/var/run/searchpeers: total 5927728 drwx------ 96 splunk splunk 16384 Jul 14 18:45 . drwx--x--- 5 splunk splunk 52 Dec 23 2020 .. drwx------ 7 splunk splunk 163 May 12 19:35 AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930902 -rw------- 1 splunk splunk 10240 May 12 19:37 AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930999.delta drwx------ 7 splunk splunk 163 May 12 19:37 AAAAAAA-75CC-4A96-8BB5-AAAAAAAAAAAAA-1683930999 iv) Looking into the bundles in the SHC Captain : There are a couple of lookup csv files over 1.7GB and removing them fixed the issue. Here's what we can do to address the bundle size issue in general. i) To reduce the size of bundles : - use replicationBlacklist https://docs.splunk.com/Documentation/Splunk/8.1.3/Admin/Distsearchconf - Find large files, such as lookup files : $ tar -tvf Bune_Name_file | grep "lookup" | sort -nk3 ii)Or to increase maxBundleSize if there's no more file to exclude @all SH, [replicationSettings] in distsearch.conf maxBundleSize=2048 >> change [httpServer] in server.conf max_content_length = 2147483648 (2gb in bytes) >> Change @All indexers for the cluster [httpServer] in server.conf max_content_length = 2147483648 (2gb in bytes) >> Change

sylim_splunk · ‎07-19-2023

We have a problem when users try to query a specific index. They can query all of other indexes from this role, but not just this new one. The role is correctly configured as it's done by Professional Service resource, we verified authorize.conf, but only users with the roles "admin" or "users" can query this index. - etc/system/local/authorize.conf has below; [role_problem] cumulativeRTSrchJobsQuota = 0 cumulativeSrchJobsQuota = 3 srchIndexesAllowed = index_good1;index_good2;index_problem We try to create a new role, including access to "all non-internal indexes" (like admin and users roles) but it doesn't work either. There is no problem on another platform with the version 9.0.4, however, this one of the version 8.1.3 with the same configs fails to search the index. We analyzed the 'inspect job' as well but we didn't find any problem. There is no permission issues logged in splunkd.log or search.log, simply no data returns from the indexers.

sylim_splunk · ‎07-06-2023

i) What's the WARN message? This is a new one added to support containerized host to the versions above 9.0.3. If it's confirmed to be hosted by a container it will use different library calls to detect system configurations/capacities.. Basically we query /sys/fs/cgroup/memory/ and /sys/fs/cgroup/cpu,cpuacct and we read /proc/1/cgroup to see if we are running inside of a container. Firstly, validates that the splunkd is running inside the container and once it validates then its reads the cpu/ram from the cgroup rather than reading from the traditional syscalls. - ii) Why it is happening after upgrade: - With the version 8.2.4 it's not there yet. It just starts to happen after the 9.0.4.1 upgrade as it's just added to the versions above 9.0.3. iii) What the log message tells us: - The user, splunk is not allowed to access "/proc/1/cgroup", which has even set to proper permissions #ls -ld /proc/1 dr-xr-xr-x 9 root root 0 Apr 12 00:45 /proc/1 # ls -l /proc/1/cgroup -r--r--r-- 1 root root 0 Jun 15 17:37 /proc/1/cgroup - Other group users should be able to access and execute commands in "/proc/1" with "r-w" but the splunk account fails to access it and puts the WARN message: iv) The reason for the access failure: The proc file system was created with gid and hidepid=2 as below - only accessible by the account in the group 14001, that splunk account doesn't belong to; ----- mount options for proc -- $ findmnt /proc proc on /proc type proc (rw,relatime,gid=14001,hidepid=2) ------ - hidepid=0 :By default, the hidepid option has the value zero (0). This means that every user can see all data. - hidepid=1: When setting it to 1, the directories entries in /proc will remain visible, but not accessible. - hidepid=2: With value 2 they are hidden altogether. Reference: https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/ v) Resolutions: i) add splunk uid to the gid 14001, $usermod -aG 14001 splunk Or ii) remove the gid and hidepid. $ mount -o remount,gid=0,hidepid=0 /proc FYI, the redhat doc recommends hidepid should not be used with systemd in rhel 7+ due to some reason mentioned in the link, https://access.redhat.com/solutions/6704531 .

sylim_splunk · ‎07-06-2023

After upgrade from 8.2.4 to 9.0.4.1 forwarders connect to indexers then after the Indexer cluster gets stabilized. All looked good - new data are delivered and indexed and searching works fine. However, we start seeing the log messages below, WARN level messages, being populated into splunkd.log.: 05-31-2023 06:47:21.407 -0500 WARN SystemInfo [15415 TcpChannelThread] - Invalid file path /proc/1/cgroup while checking container status During the upgrade, no new apps were added and no container is used for splunk. This kind of messages are found 3-4 times/min by different components and also in pretty much all splunk entities, including SH, deployer, LM, indexers and CM. We would like an analysis for that one.

sylim_splunk · ‎05-26-2023

When you send data over HEC, especially for international character data, make sure to specify the charset encoding for the data set. That way the receiving end knows its charset encoding and how to decrypt. Oftentimes the encryption method is different than what you expect. Here's a good example for your reference: https://medium.com/@rysartem/sending-data-to-splunk-hec-in-a-right-way-4a84af3c44e2

sylim_splunk · ‎05-26-2023

I have some data that is being collected from an AWS lambda and delivered to Splunk via HEC with the listeners on the indexers. This data contains Japanese characters but is not displaying properly in SplunkWeb. I have applied a host level stanza on both the search head and indexers to CHARSET = SHIFT-JIS, however, the data is still displayed as question marks in SplunkWeb. I have tried AUTO, UTF-8 and SHIFT-JIS without success.

sylim_splunk · ‎05-26-2023

Here are my findings. i) This has been there for a while even before the 8.2.10 upgrade - I checked some logs from old diags, it goes as far back with 8.1.x. ii) In SHC environment, we recommend you to use source IP stickiness for any Load Balancers so that you log in to SH1 and launch a search on SH1, then gets updates through SH1. However, it appears you requested search preview updates from another Search Head, for example, SH2 which would trigger and request SH1 for your search updates but, as it was then still running, SH1 would have put errors as below; ----- From SH1, IP Addr: 10.9.160.139 , error for the proxy request for a search that is still running -- 05-24-2023 13:33:49.894 -0700 ERROR SHCSlaveArtifactHandler [222579 TcpChannelThread] - Failed to trigger replication (artifact='1684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B') (err='event=SHPSlave::replicateArtifactTo invalid status=alive to be a source for replication') ----- iii) Logs proves the scenario : 1. The search is an adhoc with SID, "1684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B" by "admin" and found in the search artifacts/dispatch dir; $ ls -l $SPLUNK_HOME/var/run/splunk/dispatch/1684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B/ -rwxr-xr-x 1 support support 242 May 24 20:33 args.txt ... <SNIP> -rwxr-xr-x 1 support support 828809 May 24 20:34 search.log 2. From splunkd_ui_access.log : User, "admin" started the search with SID: 1684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B, -- splunkd_ui_access.log -- 10.6.248.0 - admin [24/May/2023:13:33:05.139 -0700] "GET /en-US/splunkd/__raw/servicesNS/admin/search/search/jobs/1684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B?output_mode=json&_=1684959918754 HTTP/1.1" 200 1147 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36" - 0dd2b80d72da3ed97e640b9101f0a698 5ms 3. Search string from SID:684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B -- search.log -- 05-24-2023 13:33:05.302 INFO SearchParser [223948 RunDispatch] - PARSING: search index=apps sourcetype=api <SNIP>...ion as ApplicationName | search name!=default pname!=zz-default | stats count by AppName pname | table AppName pname 4. 2s after the search started, a replicate request came in from 10.9.129.34 which is SH2 but immediately returned an error, 500 as it's not finished. This proxy request for the search results keeps coming in every second but always gets the error, 500 as it was not completed. -- splunkd_access.log -- 10.9.129.34 - splunk-system-user [24/May/2023:13:33:07.016 -0700] "POST /services/shcluster/member/artifacts/1684960384.101_13E3A0F-27AE-49C5-9FB2-23862EDB224B/replicate?output_mode=json HTTP/1.1" 500 231 "-" "Splunk/8.2.10 (Linux 3.10.0-1160.49.1.el7.x86_64; arch=x86_64)" - 2ms Search results preview request should hit the same SH as the search was launched but it appears to routed to different Search head. Recommendation: need to check with your LB or DNS team, such as whether LB has source IP Stickiness set or if you use DNS roundrobin between the multiple LBs in front of SHC, that uses same FQDN or URL as this will lead the users to different LBs or Search heads..

sylim_splunk · ‎05-26-2023

In SHC with the version 8.2.10, from time to time we found this type of ERROR messages from SHCRepJob as below; - splunkd.log from a SHC member 05-24-2023 17:39:31.941 +0000 ERROR SHCRepJob [54418 SHPPushExecutorWorker-0] - failed job=SHPRepJob peer="<PEER1 FQDN>", guid="PEER1C47-1E44-48A0-A0F2-35DE6E449C65" aid=1684949135.77748_B2392C47-1E44-48A0-A0F2-35DE6E449C65, tgtPeer="<PEER2 FQDN>", tgtGuid="PEER2D44-E56B-4ABA-822A-4C40ACF1E484", tgtRP=<ReplicationPort>, useSSL=false tgt_hp=10.9.129.18:8089 tgt_guid=PEER2D44-E56B-4ABA-822A-4C40ACF1E484 err=uri=https://PEER1:8089/services/shcluster/member/artifacts/1684949135.77748_PEER1C47-1E44-48A0-A0F2-35DE6E449C65/replicate?output_mode=json, error=500 - Failed to trigger replication (artifact='1684949135.77748_PEER1C47-1E44-48A0-A0F2-35DE6E449C65') (err='event=SHPSlave::replicateArtifactTo invalid status=alive to be a source for replication') We used to have bundle replication issues but searches appear to be running and completing as expected. Is this something to worry or why does this happen?

sylim_splunk · ‎12-09-2022

'splunk-MonitorNoHandle.exe' is designed to hold data when it's not able to send to UF, use unlimited memory and this symptom can happen when there are huge amount of data to forward to the indexers while the forwarding speed by UF is limited. i) Check the queue status of parsing queue and tcpoutput queue to find which one is getting blocked first. Parsing queue blocked firstly which means it receives over the capacity - This can happen when maxKBps is throttled to the default, 256 ,change this to 0, unlimited or something your environment allows. - side effect it can bombard indexers if it's sending unlimited, huge amount of data. ii) In splunkd log, find any messages showing difficulties in sending data to the next receiving ends: Even if the parsing pipeline can send more data by increasing maxKBps, if tcpoutput gets blocked then you will see the same issues again. Below are the example logs that UF has problem in connecting to the indexers ; 11-12-2021 11:00:11.365 -0500 WARN TcpOutputProc - Cooked connection to ip=172.22.1.218:9997 timed out 11-12-2021 11:01:48.391 -0500 WARN TcpOutputProc - Cooked connection to ip=172.22.1.218:9997 timed out 11-12-2021 11:12:28.757 -0500 WARN TcpOutputProc - The TCP output processor has paused the data flow. Forwarding to output group ABC_indexers has been blocked for 500 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data. iii) Recommendations : iii-1) Parsing Queue being always full more often than the tcpoutput queue does - meaning MonitorNoHandle is sending data over the capacity that Parsing process can handle. This can happen when maxKBps is throttled to the default, 256 , then consider to increase the value according to your traffic size. http://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf - side effect it can bombard indexers if it's sending unlimited, huge amount of data. iii-2) How to set the memory limit used by the modInput, MonitorNoHandle; [inputproc] in limits.conf monitornohandle_max_heap_mb=5000 monitornohandle_max_driver_mem_mb=5000 ( 5000/5gb can be changed according to your environment) iii-3) If you find intermittent blockages on tcpoutput queue This can also contribute to the MonitorNoHandle's memory growth as the Parsing Pipeline can not send as much data as it receives - then MonitorNoHandle.exe has to hold the backlog data within own heap memory that can grow unexpectedly. Consider to implement asynchronous forwarding so that it can spread the data without pausing the data flow and that way MonitorNoHandle.exe may have less chances to hit the heap limit. You can consult with our PS resources for implementation too.

sylim_splunk · ‎12-09-2022

We have been experiencing unusually high memory usage on some of our domain controllers. The culprit here is Splunk process splunk-MonitorNoHandle.exe. Here is the report of the memory usage of the domain controllers: DC1 splunk-MonitorNoHandle.exe 17724 Services 0 14,993,012 K DC2 splunk-MonitorNoHandle.exe 53268 Services 0 38,927,688 K DC3 splunk-MonitorNoHandle.exe 16164 Services 0 43,997,828 K

sylim_splunk · ‎11-29-2022

SPL-208206

sylim_splunk · ‎11-01-2022

This has just been fixed and available for download, in 8.1.7+, 8.2.3+, 8.3.x and afterwards. It is found in the release notes in Fixed Issues section, for example version 8.2.3 - https://docs.splunk.com/Documentation/Splunk/8.2.3/ReleaseNotes/Fixedissues Please check the release notes for the versions you want to upgrade to. ( SPL-210455)

sylim_splunk · ‎10-26-2022

@_pravin the command is only effective on the SH you run. And the index buckets in the indexers are not affected by the commend.

sylim_splunk · ‎10-25-2022

You can use "splunk clean eventdata -index " command like below; $ splunk stop $ splunk clean eventdata -index "_internal"

sylim_splunk · ‎09-21-2022

# How to get cookies for simulation or accessing UI port. # cval=`curl -c - -k http://splunk:8000 -L -o a 2>/dev/null|grep cval|tr -s " " " " | cut -d $'\t' -f 7` ab=`curl -c - -k http://splunk:8000/en-US/account/login -H "Cookie: cval=$cval" -d "username=MYUSER&password=MYPASSWORD&cval=$cval" -o a 2>/dev/null |egrep "csrf|splunkd_8000" |perl -pe 's/\n/ /g' | perl -pe 's/\t/ /g'` csrf_token=$(echo $ab |cut -d " " -f 7) splunkd_8000=$(echo $ab | cut -d " " -f 14) echo "splunkweb_csrf_token=$csrf_token" echo "splunkd_8000=$splunkd_8000" # Once cookies ready, then fill headers for command #headers = { #Cookie: splunkd_8000=<splunkd_cookie>;splunkweb_csrf_token_8000=<csrf_token>, #Content-type: application/json, #X-Requested-With: XMLHttpRequest, #X-Splunk-Form-Key: <csrf_token> <<< csrf this appears for POST only.. #} # Example: curl -c - -k http://splunk:8000/en-US/splunkd/__raw/servicesNS/-/-/saved/searches/ -H "Cookie: cval=372560337;splunkweb_csrf_token_8000=1324774297983139238;splunkd_8000=xuqLdlcjgtNm77umvfv6WZvJnX^WbTGvi2f2XbBMhoHe3nsshq_rGa6_Rknw06XThwCvML2VLuyQhTuhJJsFyx8TRAHi7RC17Up56IkluUmQVCLj9R4uZl9OyNP9Z7qBhIr" -X GET -H "X-Splunk-Form-Key: 1324774297983139238" -H "X-Requested-With: XMLHttpRequest" -H "Content-type: application/json"

sylim_splunk · ‎09-08-2022

The issues with HF and DC appear to be different ones. - The HF connectivity, it's connected but UF is not able to send data any more as the 10.236.65.143 is not accepting connection. The HF must be full in data processing queues all the way to the tcpoutput to another indexer(s). How to work around/make it better - if you have many HFs then try to configure asynchronous forwarding. Here's the info for your reference: https://www.linkedin.com/pulse/splunk-asynchronous-forwarding-lightning-fast-data-ingestor-rawat/?trk=public_profile_article_view - The DC connection issue, firstly check the configuration or add https:// to targetUri if it doesn't have it yet. Try to capture tcpdump and see why it fails - if it's by client side or by the DS. Or DNS failure..

sylim_splunk · ‎09-08-2022

Here's the steps to find the culprits. There are 17 crash logs found in the diag file. The crashing point from crash log files are all the same as below, ---- excerpts --- Libc abort message: splunkd: /opt/splunk/src/pipeline/indexer/JournalSlice.cpp:1780: bool ReadableJournalSliceDirectory::findEventTimeRange(st_time_t*, st_time_t*, bool): Assertion `tell() == pos' failed. ------------------ Which suggests that there must be corrupt bucket(s) or truncated but the crash log files don't have details of which bucket(s) being corrupt at all. To find the relevant logs in splunkd.log by matching the time, 14:45:24 and thread number such as, IndexerTPoolWorker-4 from the crash log file , I found the logs as below; - For crash-2022-09-07-14:45:24.log: Crashing thread: IndexerTPoolWorker-4 Check the splunkd.log to find the logs created at 14:45:24 by IndexerTPoolWorker-4 ; splunkd.log: 09-07-2022 14:45:24.025 -0700 INFO HotBucketRoller [322700 IndexerTPoolWorker-4] - found hot bucket='/storage/tier1/20000_idx2/db/hot_v1_432' - For crash-2022-09-07-14:45:32.log: Crashing thread: IndexerTPoolWorker-7 Check splunkd.log to find the logs created at 14:45:32 by IndexerTPoolWorker-7 ; splunkd.log: 09-07-2022 14:45:32.238 -0700 INFO HotBucketRoller [322983 IndexerTPoolWorker-7] - found hot bucket='/storage/tier1/20000_idx2/db/hot_v1_432' - crash-2022-09-07-14:45:40.log: Crashing thread: IndexerTPoolWorker-6 splunkd.log: 09-07-2022 14:45:40.566 -0700 INFO HotBucketRoller [323220 IndexerTPoolWorker-6] - found hot bucket='/storage/tier1/20000_idx2/db/hot_v1_432' The crash log time and the thread number, IndexerTPoolWorker-# matches and the log messages pointing to the same bucket, '/storage/tier1/20000_idx2/db/hot_v1_432'. As it's suspected to have gone beyond the fsck repair we moved it out of the $SPLUNK_DB and then the indexer came up and running as the other peers. This covers the specific situation of a corrupt bucket causing the indexer to crash and how to find the corrupt buckets.

sylim_splunk · ‎09-08-2022

Indexer rebooted no-gracefully. After reboot Splunk starts generating crash files shortly after restart. Spent the last two days running fsck repair on all buckets. doesn't seem to have helped. No relevant errors in the splunkd.log. Crash log files: crash-2022-09-07-14:45:07.log crash-2022-09-07-14:45:15.log crash-2022-09-07-14:45:24.log crash-2022-09-07-14:45:32.log crash-2022-09-07-14:45:40.log Crash log: every crash log has the same patterns as below by changing Crashing threads, such as IndexerTPoolWorker-2, IndexerTPoolWorker-4, IndexerTPoolWorker-7 and the like.. [build 87344edfcdb4] 2022-09-07 13:31:01 Received fatal signal 6 (Aborted) on PID 193171. Cause: Signal sent by PID 193171 running under UID 53292. Crashing thread: IndexerTPoolWorker-2 Backtrace (PIC build): [0x00007EFDA0A27387] gsignal + 55 (libc.so.6 + 0x36387) [0x00007EFDA0A28A78] abort + 328 (libc.so.6 + 0x37A78) [0x00007EFDA0A201A6] ? (libc.so.6 + 0x2F1A6) [0x00007EFDA0A20252] ? (libc.so.6 + 0x2F252) [0x000056097778BA2C] ReadableJournalSliceDirectory::findEventTimeRange(int*, int*, bool) ... Libc abort message: splunkd: /opt/splunk/src/pipeline/indexer/JournalSlice.cpp:1780: bool ReadableJournalSliceDirectory::findEventTimeRange(st_time_t*, st_time_t*, bool): Assertion `tell() == pos' failed.

Posts	287
Solutions	111
Karma Given	150
Karma Received	316
Member Since	‎09-29-2013

Online Status	Offline
Date Last Visited	3 weeks ago

[9.0.5] ui-prefs.conf, Why my default search mode ...

How to fix this error: The search job "SID" was ca...

We set up a new role, however users with the role ...

[9.0.3+] Invalid file path /proc/1/cgroup while ch...

[HEC] SHIFT-JIS Character recognition

Replication errors : having artifact replication i...

High memory usage by splunk-MonitorNoHandle.exe

How to get mandatory cookies to access Web UI?

Why does my indexer keep on crashing- IndexerTPool...

Unable to delete search results

Re: Splunkforwarder Startup Error in Docker contai...

Re: After upgrade to 9.1.2 all users try to execut...

Re: Deployment Server Upgrade 9.2.1 and Server Cla...

Re: [9.0.5] My default search mode in search page ...

[9.0.5] ui-prefs.conf, Why my default search mode ...

Re: The search "SID" was canceled remotely or expi...

How to fix this error: The search job "SID" was ca...

Re: We set up a new role, however users with the r...

We set up a new role, however users with the role ...

Re: [9.0.3+] Invalid file path /proc/1/cgroup whil...

[9.0.3+] Invalid file path /proc/1/cgroup while ch...

Re: [HEC] SHIFT-JIS Character recognition

[HEC] SHIFT-JIS Character recognition

Re: Replication errors : having artifact replicati...

Replication errors : having artifact replication i...

Re: High memory usage by splunk-MonitorNoHandle.ex...

High memory usage by splunk-MonitorNoHandle.exe

Re: UI fails to find splunk Knowledge Objects, suc...

Re: Process %_Processor_Time greater than 100%

Re: Can I delete the DB data files in the search h...

Re: Can I delete the DB data files in the search h...

How to get mandatory cookies to access Web UI?

Re: Why is my UF not connecting to HF and DS?

Re: Indexer keeps on crashing thread IndexerTPoolW...

Why does my indexer keep on crashing- IndexerTPool...

Are you a member of the Splunk Community?