Good day,
I just installed Splunk Enterprise 8.1.0.1 over the top of 8.0.0 on a new all-in-one Windows Server 2019 Datacenter Core box (the drive was moved from the old system to the new one). Initially I got some splunkd crashes related to incompatible Python 2 code in apps, which prevented splunkd from starting. I disabled the offending apps from the command line and... all good. The new system has 56 GB of memory and 16 vCPUs. I have two indexes with their colddb paths set to F: and G:, while the hot/warm buckets still live on the E: drive.
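For anyone following along, disabling the apps from the command line was roughly this (the install path, credentials, and app name below are placeholders, not the real ones):
rem run from the Splunk bin directory (default install path shown)
cd "C:\Program Files\Splunk\bin"
rem disable the offending app - the name here is just an example
splunk disable app some_python2_only_app -auth admin:placeholder
rem restart so splunkd comes up without the app loaded
splunk restart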
Today I noticed that splunkd is still crashing, but on a different thread than the one that halted it before.
Access violation, cannot read at address [0x000002EC99CBE6F8] Exception address: [0x00007FF65CF8EAB6] Crashing thread: BucketSummaryActorThread
The only 'direct' reference I found to this was about disabling Transparent Huge Pages (THP) on a Linux system, with nothing relevant to a Windows Server 2019 Core system. I'm sure there are some parallels between memory management on Linux and on Server Core, but I don't see them.
Any assistance is appreciated.
Best regards,
Greg
Hi Greg
Sorry about the issues.
I have verified that "Lock Pages in Memory" is not enabled in the Local Security Policy (secpol) on this system.
I may have identified the source of the crash: a SplunkAppForWebAnalytics acceleration job that fires every ten minutes and crashes the BucketSummaryActorThread. Now I need to figure out how to address it.
ERROR SavedSplunker - savedsearch_id="nobody;SplunkAppForWebAnalytics;_ACCELERATE_DM_SplunkAppForWebAnalytics_Web_ACCELERATE_", message="The search job
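To tie the crashes to this job I have been checking the scheduler log for that saved search with something along these lines (field list trimmed to what I care about):
index=_internal sourcetype=scheduler app="SplunkAppForWebAnalytics" savedsearch_name="_ACCELERATE_DM_SplunkAppForWebAnalytics_Web_ACCELERATE_"
| table _time, savedsearch_name, status, run_time, result_count
The status and run_time fields make it easy to line the scheduled runs up against the crash timestamps.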
I found where a similar issue was fixed in 8.0.2. Could this be related in any way to those previous bug fixes?
"https://docs.splunk.com/Documentation/Splunk/8.0.2/ReleaseNotes/Fixedissues","Search issues","2020-01-06","SPL-180268, SPL-177675","Crash in BucketSummaryActorThread for a specific summary directory, persists after removing"
OK... I have finally isolated the individual job that is crashing the thread. I would like to disable a scheduled job under Web Analytics called "Generate Goal summary - Scheduled", as it is the culprit; I ran it manually and it crashed. I checked under Goals and do not see any configured, and the WA_goals.csv lookup is empty. Under "Goals" > "Setup" I do not see any goals configured for any of our sites, so I don't think it is used.
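If it is safe to turn off, my plan would be to override it in a local savedsearches.conf inside the app rather than touch the app defaults; a sketch, assuming the stanza name matches the search name shown in the UI and the app directory matches the app ID from the error above:
# %SPLUNK_HOME%\etc\apps\SplunkAppForWebAnalytics\local\savedsearches.conf
[Generate Goal summary - Scheduled]
# take it off the scheduler (alternatively, disabled = 1 disables the search entirely)
enableSched = 0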
Here is the code for that job, in case any of you can figure out whether it is a bug between this app and 8.1.0.1 or just bad code!
| inputlookup WA_goals
| map maxsearches=10000 search="search tag=web eventtype=pageview site=\"$site$\"
| eval goal_id=\"$goal_id$\"
| eval goal_start=\"$start$\"
| eval goal_end=\"$end$\"
| rex field=goal_start mode=sed \"s/\*/%/g\"
| rex field=goal_end mode=sed \"s/\*/%/g\"
| lookup WA_sessions user AS user OUTPUT http_session,http_session_start,http_session_end,http_session_pageviews,http_session_duration,http_referer,http_referer_domain AS http_session_referrer_domain,http_referer_hostname AS http_session_referrer_hostname,http_session_channel
| where isnotnull(http_session)
| lookup user_agents http_user_agent
| eval ua_mobile=if(eventtype==\"ua-mobile\",'ua_device', \"\")
| bucket _time span=10m
| stats dc(http_session) AS Sessions,dc(user) AS Users,count(eval(like(uri,goal_start))) AS Entries,count(eval(like(uri,goal_end))) AS Completed by _time, site, goal_id, http_session_channel, http_session_referrer_domain,ua_family,ua_mobile,ua_os_family,goal_id
| eval goal_start=\"$start$\"
| eval goal_end=\"$end$\"
| collect index=goal_summary
"
That job was just a coincidence in scheduling; the crash is still occurring. I see that it matches the Web data model acceleration cron schedule. I decided to try a rebuild, but the crash still persists. Does anyone have any other suggestions, short of not using the Web Analytics app? That is a no-go, as the customer uses it very extensively.
Configuration Settings
These settings can be changed by going to Actions and selecting Edit Acceleration.
allow_old_summaries = true
allow_skew = 0
backfill_time = -
cron_schedule = 2,12,22,32,42,52 * * * *
earliest_time = -3mon
hunk.compression_codec = -
hunk.dfs_block_size = 0
hunk.file_format = -
manual_rebuilds = true
max_concurrent = 6
max_time = 3600
poll_buckets_until_maxtime = false
schedule_priority = highest
workload_pool = -
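For completeness, the same settings can be expressed in a local datamodels.conf override; a sketch, assuming the data model itself is named "Web" as the _ACCELERATE_DM_SplunkAppForWebAnalytics_Web_ACCELERATE_ search ID suggests (flipping acceleration to 0 here would also be a way to pause the acceleration temporarily while testing):
# %SPLUNK_HOME%\etc\apps\SplunkAppForWebAnalytics\local\datamodels.conf
[Web]
# set to 0 to pause acceleration while troubleshooting
acceleration = 1
acceleration.earliest_time = -3mon
acceleration.cron_schedule = 2,12,22,32,42,52 * * * *
acceleration.allow_old_summaries = true
acceleration.allow_skew = 0
acceleration.manual_rebuilds = true
acceleration.max_concurrent = 6
acceleration.max_time = 3600
acceleration.schedule_priority = highest
acceleration.poll_buckets_until_maxtime = false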
Hi Greg
Sorry about the issues.
Thank you so much. Upgrading the Splunk App for Web Analytics to 2.2.5 resolved the issue.
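For anyone hitting this later, you can confirm which version of the app is actually installed with a quick REST search, something like:
| rest /services/apps/local
| search title="SplunkAppForWebAnalytics"
| table label, version, disabled
The title value is the app's directory name, so adjust it if your install uses a different app ID.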