Splunkd Crash Every Ten Minutes

gearmstrong
Path Finder

Good day,

I just installed Splunk Enterprise 8.1.0.1 over the top of 8.0.0 on a new all-in-one Windows Server 2019 Datacenter Core system (the drive was moved from the old system to the new one). Initially I got some splunkd crashes related to incompatible Python 2 code in apps, which prevented splunkd from starting. I disabled the offending apps from the command line and all was good. The new system has 56 GB of memory and 16 vCPUs. Two indexes have their colddb paths set to F: and G:, while the hot/warm buckets still live on E:.

Today I noticed that splunkd is still crashing but on a different thread from the one that halted splunkd.  

     Access violation, cannot read at address [0x000002EC99CBE6F8]
     Exception address: [0x00007FF65CF8EAB6]
     Crashing thread: BucketSummaryActorThread

The only 'direct' reference I found to this crash concerned transparent huge pages (THP) being disabled on a Linux system, which is not relevant to a Windows Server 2019 Core system. There may be some parallels between memory management on Linux and on Server Core, but I don't see them.

Any assistance is appreciated.

Best regards,

Greg



gearmstrong
Path Finder

I have verified that the local security policy (secpol) does not have "Lock Pages in Memory" enabled on this system.


gearmstrong
Path Finder

I may have identified the source of the crash: the SplunkAppForWebAnalytics acceleration job that fires every ten minutes appears to be crashing the "BucketSummaryActorThread". Now I need to figure out how to address it.

ERROR SavedSplunker - savedsearch_id="nobody;SplunkAppForWebAnalytics;_ACCELERATE_DM_SplunkAppForWebAnalytics_Web_ACCELERATE_", message="The search job
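
For anyone sorting through similar log lines, here is a minimal Python sketch that splits a `savedsearch_id` value like the one above into its parts. The `owner;app;name` layout and the `_ACCELERATE_DM_..._ACCELERATE_` naming pattern are assumptions read off this single log line, and the helper names are hypothetical.

```python
from typing import NamedTuple


class SavedSearchId(NamedTuple):
    owner: str
    app: str
    name: str


# Assumed naming pattern, inferred only from the log line in this thread.
ACCEL_PREFIX = "_ACCELERATE_DM_"
ACCEL_SUFFIX = "_ACCELERATE_"


def parse_savedsearch_id(raw: str) -> SavedSearchId:
    """Split 'owner;app;name' into its three parts."""
    owner, app, name = raw.split(";", 2)
    return SavedSearchId(owner, app, name)


def looks_like_datamodel_acceleration(ssid: SavedSearchId) -> bool:
    """True if the search name follows the assumed acceleration pattern."""
    return ssid.name.startswith(ACCEL_PREFIX) and ssid.name.endswith(ACCEL_SUFFIX)


ssid = parse_savedsearch_id(
    "nobody;SplunkAppForWebAnalytics;"
    "_ACCELERATE_DM_SplunkAppForWebAnalytics_Web_ACCELERATE_"
)
print(ssid.owner, ssid.app, looks_like_datamodel_acceleration(ssid))
```

Grouping errors by `app` this way makes it easier to see whether one app's acceleration job accounts for all of the failures.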


gearmstrong
Path Finder

I found that a similar issue was fixed in 8.0.2. Could this crash be related to that earlier bug fix?

"https://docs.splunk.com/Documentation/Splunk/8.0.2/ReleaseNotes/Fixedissues","Search issues","2020-01-06","SPL-180268, SPL-177675","Crash in BucketSummaryActorThread for a specific summary directory, persists after removing"


gearmstrong
Path Finder

OK, I have finally isolated the individual job that is crashing the thread. I would like to disable a scheduled job under Web Analytics called "Generate Goal summary - Scheduled", as it is the culprit. I ran it manually and it crashed. The WA_goals.csv lookup is empty, and under "Goals" > "Setup" I do not see any goals configured for any of our sites, so I don't think it is used.

Here is the code for that job, in case any of you can figure out whether this is a bug with this app on 8.1.0.1 or just bad code!

| inputlookup WA_goals
| map maxsearches=10000 search="search tag=web eventtype=pageview site=\"$site$\"
| eval goal_id=\"$goal_id$\"
| eval goal_start=\"$start$\"
| eval goal_end=\"$end$\"
| rex field=goal_start mode=sed \"s/\*/%/g\"
| rex field=goal_end mode=sed \"s/\*/%/g\"
| lookup WA_sessions user AS user OUTPUT http_session,http_session_start,http_session_end,http_session_pageviews,http_session_duration,http_referer,http_referer_domain AS http_session_referrer_domain,http_referer_hostname AS http_session_referrer_hostname,http_session_channel
| where isnotnull(http_session)
| lookup user_agents http_user_agent
| eval ua_mobile=if(eventtype==\"ua-mobile\",'ua_device', \"\")
| bucket _time span=10m
| stats dc(http_session) AS Sessions,dc(user) AS Users,count(eval(like(uri,goal_start))) AS Entries,count(eval(like(uri,goal_end))) AS Completed by _time, site, goal_id, http_session_channel, http_session_referrer_domain,ua_family,ua_mobile,ua_os_family,goal_id
| eval goal_start=\"$start$\"
| eval goal_end=\"$end$\"
| collect index=goal_summary
"
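
As far as I can tell, the two `rex mode=sed` lines turn `*` wildcards in the goal patterns into `%` so that `like(uri, goal_start)` can match them. A rough Python sketch of that idea follows. The `like` helper only approximates SQL-style LIKE matching (`%` for any run of characters, `_` for one character), and the example pattern and URIs are made up for illustration.

```python
import re


def wildcard_to_like(pattern: str) -> str:
    """Replace every * with %, mirroring the sed expression in the search."""
    return pattern.replace("*", "%")


def like(value: str, pattern: str) -> bool:
    """Approximate LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# Hypothetical goal pattern; the real values would come from the WA_goals lookup.
goal_start = wildcard_to_like("/checkout/*")
print(goal_start)                          # /checkout/%
print(like("/checkout/cart", goal_start))  # True
print(like("/products/1", goal_start))     # False
```

With an empty WA_goals lookup, `map` has no rows to iterate over, so none of this matching should run at all.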


gearmstrong
Path Finder

That job turned out to be a coincidence in scheduling; the crash is still occurring. The crash timing matches the cron schedule of the Web data model acceleration. I tried a rebuild, but the crash persists. Does anyone have any other suggestions, short of not using the Web Analytics app? That is a no-go, as the customer uses it very extensively.

 

Configuration Settings

These settings can be changed by going to actions and selecting edit acceleration.

allow_old_summaries = true
allow_skew = 0
backfill_time = -
cron_schedule = 2,12,22,32,42,52 * * * *
earliest_time = -3mon
hunk.compression_codec = -
hunk.dfs_block_size = 0
hunk.file_format = -
manual_rebuilds = true
max_concurrent = 6
max_time = 3600
poll_buckets_until_maxtime = false
schedule_priority = highest
workload_pool = -
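
To show why this schedule lines up with a crash every ten minutes, here is a small Python sketch that expands the `cron_schedule` value into concrete fire times. It only handles a comma-separated minute list with `*` in the other four fields, which is all this schedule needs, and the start time is an arbitrary example.

```python
from datetime import datetime, timedelta
from typing import List, Set


def parse_minute_field(field: str) -> Set[int]:
    """Expand a cron minute field made of comma-separated integers only."""
    return {int(part) for part in field.split(",")}


def fire_times(cron: str, start: datetime, count: int) -> List[datetime]:
    """Return the next `count` fire times at or after `start`."""
    minute_field, hour, dom, month, dow = cron.split()
    if (hour, dom, month, dow) != ("*", "*", "*", "*"):
        raise ValueError("this sketch only supports '*' in the non-minute fields")
    minutes = parse_minute_field(minute_field)

    # Round up to the next whole minute if start has seconds or microseconds.
    t = start.replace(second=0, microsecond=0)
    if t < start:
        t += timedelta(minutes=1)

    result: List[datetime] = []
    while len(result) < count:
        if t.minute in minutes:
            result.append(t)
        t += timedelta(minutes=1)
    return result


# Arbitrary example start time, not taken from the crash logs.
times = fire_times("2,12,22,32,42,52 * * * *", datetime(2020, 12, 1, 9, 0), 12)
gaps = {later - earlier for earlier, later in zip(times, times[1:])}
print(times[0].strftime("%H:%M"), gaps == {timedelta(minutes=10)})
```

Every gap comes out as ten minutes, including across the hour boundary (minute 52 to minute 02), which matches the crash interval.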


jbjerke_splunk
Splunk Employee

Hi Greg

 

Sorry about the issues.

I managed to reproduce the error and have updated the codebase. I've just published a new release (2.2.5) to Splunkbase. It has not been vetted for Splunk Cloud yet; that might take a few days.

There were some Python 3 compatibility issues that were minor enough to slip through the basic checks. Unfortunately, they meant the data model would not build when the acceleration job ran every 10 minutes, and that job crashed.
 
Let me know if this works for you.
 
 
Kind regards
 
Johan

gearmstrong
Path Finder

Thank you so much.  Upgrading to 2.2.5 resolved the issue.
