Splunkd Crash Every Ten Minutes

gearmstrong
Path Finder

Good day,

I just installed Splunk Enterprise 8.1.0.1 over the top of 8.0.0 on a new all-in-one Windows Server 2019 Datacenter Core system (the drive was moved from the old system to the new one). Initially I got some splunkd crashes related to incompatible Python 2 code in apps, which prevented splunkd from starting. I disabled the offending apps from the command line and all was good. The new system has 56 GB of memory and 16 vCPUs. I have two indexes with colddb paths set to F: and G:, while the warm/hot buckets still live on the E: drive.

Today I noticed that splunkd is still crashing, but on a different thread from the one that originally kept splunkd from starting.

     Access violation, cannot read at address [0x000002EC99CBE6F8]
     Exception address: [0x00007FF65CF8EAB6]
     Crashing thread: BucketSummaryActorThread

The only direct reference I found to this involved transparent huge pages (THP) being disabled on a Linux system, which is not relevant to a Windows Server 2019 Core system. I'm sure there are some parallels between memory management on Linux and on Core, but I don't see them.
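
For anyone triaging several crash logs at once, a minimal Python sketch like the one below could pull the crashing thread name out of a header like the one quoted above, so crashes can be grouped by thread. It assumes the header wording matches the excerpt in this post; the function name `parse_crash_header` is made up for illustration and is not part of any Splunk tooling.

```python
import re

# Crash header text as quoted in this post
crash_header = (
    "Access violation, cannot read at address [0x000002EC99CBE6F8] "
    "Exception address: [0x00007FF65CF8EAB6] "
    "Crashing thread: BucketSummaryActorThread"
)

def parse_crash_header(text):
    """Extract the faulting address, exception address, and thread name, if present."""
    patterns = {
        "fault_address": r"cannot read at address \[(0x[0-9A-Fa-f]+)\]",
        "exception_address": r"Exception address:\s*\[(0x[0-9A-Fa-f]+)\]",
        "crashing_thread": r"Crashing thread:\s*(\S+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        # Missing fields are reported as None rather than raising
        result[key] = match.group(1) if match else None
    return result

info = parse_crash_header(crash_header)
print(info["crashing_thread"])
```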

Any assistance is appreciated.

Best regards,

Greg



1 Solution

jbjerke_splunk
Splunk Employee

Hi Greg

 

Sorry about the issues.

I managed to reproduce the error and have updated the codebase. I've just published a new release (2.2.5) to Splunkbase. It is not yet vetted for Splunk Cloud; that might take a few days.
 
There were some Python 3 compatibility issues that were minor enough to slip through the basic checks. Unfortunately, they meant that the data model acceleration job, which runs every 10 minutes, could not build and crashed.
 
Let me know if this works for you.
 
 
Kind regards
 
Johan


0 Karma

gearmstrong
Path Finder

I have verified that the local security policy (secpol) on this system does not have "Lock pages in memory" enabled.

0 Karma

gearmstrong
Path Finder

I may have identified the source of the crash: the SplunkAppForWebAnalytics acceleration job that fires every ten minutes appears to be crashing the BucketSummaryActorThread. Now I need to figure out how to address it.

ERROR SavedSplunker - savedsearch_id="nobody;SplunkAppForWebAnalytics;_ACCELERATE_DM_SplunkAppForWebAnalytics_Web_ACCELERATE_", message="The search job

0 Karma

gearmstrong
Path Finder

I found that a similar issue was fixed in 8.0.2. Could this crash be related to those earlier bug fixes?

"https://docs.splunk.com/Documentation/Splunk/8.0.2/ReleaseNotes/Fixedissues","Search issues","2020-01-06","SPL-180268, SPL-177675","Crash in BucketSummaryActorThread for a specific summary directory, persists after removing"

0 Karma

gearmstrong
Path Finder

OK, I have finally isolated the individual job that is crashing the thread. I would like to disable a scheduled job under Web Analytics called "Generate Goal summary - Scheduled", as it is the culprit. I ran it manually and it crashed. The WA_Goals.csv lookup is empty, and under "Goals" > "Setup" I do not see any goals configured for any of our sites, so I don't think it is used.

Here is the search for that job, in case any of you can tell whether this is a bug between the app and 8.1.0.1 or just bad code:

| inputlookup WA_goals
| map maxsearches=10000 search="search tag=web eventtype=pageview site=\"$site$\"
| eval goal_id=\"$goal_id$\"
| eval goal_start=\"$start$\"
| eval goal_end=\"$end$\"
| rex field=goal_start mode=sed \"s/\*/%/g\"
| rex field=goal_end mode=sed \"s/\*/%/g\"
| lookup WA_sessions user AS user OUTPUT http_session,http_session_start,http_session_end,http_session_pageviews,http_session_duration,http_referer,http_referer_domain AS http_session_referrer_domain,http_referer_hostname AS http_session_referrer_hostname,http_session_channel
| where isnotnull(http_session)
| lookup user_agents http_user_agent
| eval ua_mobile=if(eventtype==\"ua-mobile\",'ua_device', \"\")
| bucket _time span=10m
| stats dc(http_session) AS Sessions,dc(user) AS Users,count(eval(like(uri,goal_start))) AS Entries,count(eval(like(uri,goal_end))) AS Completed by _time, site, goal_id, http_session_channel, http_session_referrer_domain,ua_family,ua_mobile,ua_os_family,goal_id
| eval goal_start=\"$start$\"
| eval goal_end=\"$end$\"
| collect index=goal_summary
"
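
For readers unfamiliar with the two `rex ... mode=sed` lines in the search above: they turn `*` wildcards in the goal start/end patterns into `%`, which is the multi-character wildcard that `like()` understands. The rough Python sketch below mimics that behaviour. The helper names and the example URIs are invented for illustration and do not come from the app.

```python
import re

def glob_to_like(pattern):
    """Mimic the sed step s/\\*/%/g: replace every '*' with '%'."""
    return pattern.replace("*", "%")

def like(text, like_pattern):
    """Approximate LIKE matching: '%' matches any run of characters, '_' matches one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in like_pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None

# Hypothetical goal pattern and URIs, for illustration only
goal_start = glob_to_like("/checkout/*")
print(goal_start)
print(like("/checkout/step1", goal_start))
print(like("/cart", goal_start))
```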

0 Karma

gearmstrong
Path Finder

That job turned out to be just a scheduling coincidence; the crash is still occurring. I see that the crash timing matches the cron schedule of the Web data model acceleration. I tried a rebuild, but the crash persists. Does anyone have any other suggestions, short of not using the Web Analytics app? That is a no-go, as the customer uses it very extensively.

 

Configuration Settings

These settings can be changed by going to Actions and selecting Edit Acceleration.

allow_old_summaries = true
allow_skew = 0
backfill_time = -
cron_schedule = 2,12,22,32,42,52 * * * *
earliest_time = -3mon
hunk.compression_codec = -
hunk.dfs_block_size = 0
hunk.file_format = -
manual_rebuilds = true
max_concurrent = 6
max_time = 3600
poll_buckets_until_maxtime = false
schedule_priority = highest
workload_pool = -
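
One way to confirm that crash times line up with the acceleration schedule is to compare each crash timestamp against the minute field of the `cron_schedule` above. The Python sketch below does that under simple assumptions: it only handles a minute field written as a plain comma-separated list (as in this configuration), the one-minute tolerance is arbitrary, and the crash timestamps are made up for illustration.

```python
from datetime import datetime

# cron_schedule from the acceleration settings above
CRON_SCHEDULE = "2,12,22,32,42,52 * * * *"

def cron_minutes(schedule):
    """Return the scheduled minutes from a cron expression whose
    minute field is a plain comma-separated list."""
    minute_field = schedule.split()[0]
    return {int(m) for m in minute_field.split(",")}

def matches_schedule(ts, schedule, tolerance_min=1):
    """True if ts falls on a scheduled minute or up to tolerance_min minutes after it."""
    return any(
        0 <= (ts.minute - m) % 60 <= tolerance_min
        for m in cron_minutes(schedule)
    )

# Hypothetical crash timestamps, for illustration only
crash_times = [
    datetime(2020, 12, 1, 10, 12, 30),
    datetime(2020, 12, 1, 10, 22, 41),
    datetime(2020, 12, 1, 10, 37, 5),
]

for ts in crash_times:
    print(ts.isoformat(), matches_schedule(ts, CRON_SCHEDULE))
```

If nearly every crash matches, the acceleration job is a strong suspect; scattered mismatches would point elsewhere.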

0 Karma


gearmstrong
Path Finder

Thank you so much.  Upgrading to 2.2.5 resolved the issue.

0 Karma