Hi @Uday,

There are several approaches to creating a server status dashboard in Splunk when you don't have explicit "server up/down" logs. Here are the most effective methods:

## Method 1: Check for Recent Log Activity

This is the simplest approach: if a server is sending logs, it's probably up.

```
| metadata type=hosts index=*
| search host=*
| eval lastTime=strftime(recentTime,"%Y-%m-%d %H:%M:%S")
| eval status=if(now()-recentTime < 600, "UP", "DOWN")
| table host lastTime status
| sort host
```

Adjust the time threshold (600 seconds = 10 minutes) to match your expected log frequency.

## Method 2: Using Rangemap for Visualization

Use rangemap to assign colors to status values:

```
| metadata type=hosts index=*
| search host=*
| eval lastTime=strftime(recentTime,"%Y-%m-%d %H:%M:%S")
| eval seconds_since_last_log=now()-recentTime
| eval status=if(seconds_since_last_log < 600, "UP", "DOWN")
| rangemap field=status up="0-0" down="1-1"
| table host lastTime status range
| sort host
```

For the dashboard visualization, you'll need to add:

1. A CSS file (table_decorations.css) with this content:

```css
.severe {
    background-color: #dc4e41 !important;
    color: white !important;
}
.low {
    background-color: #65a637 !important;
    color: white !important;
}
```

2. A JavaScript file (table_icons_rangemap.js) with this content:

```javascript
require([
    'underscore',
    'jquery',
    'splunkjs/mvc',
    'splunkjs/mvc/tableview',
    'splunkjs/mvc/simplexml/ready!'
], function(_, $, mvc, TableView) {
    var CustomRangeRenderer = TableView.BaseCellRenderer.extend({
        canRender: function(cell) {
            return cell.field === 'range';
        },
        render: function($td, cell) {
            var value = cell.value;
            if (value === "severe") {
                $td.addClass('severe');
                $td.html('Down');
            } else if (value === "low") {
                $td.addClass('low');
                $td.html('Up');
            }
            return $td;
        }
    });
    mvc.Components.get('table1').getVisualization(function(tableView) {
        tableView.addCellRenderer(new CustomRangeRenderer());
        tableView.render();
    });
});
```

3. Dashboard XML that includes these files:

```xml
<form script="table_icons_rangemap.js" stylesheet="table_decorations.css">
  <label>Server Status Dashboard</label>
  <fieldset submitButton="false">
    <input type="time" token="field1">
      <label></label>
      <default>
        <earliest>-60m@m</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table id="table1">
        <search>
          <query>| metadata type=hosts index=* | search host=* | eval lastTime=strftime(recentTime,"%Y-%m-%d %H:%M:%S") | eval seconds_since_last_log=now()-recentTime | eval status=if(seconds_since_last_log &lt; 600, "UP", "DOWN") | rangemap field=status up="0-0" down="1-1" | table host lastTime status range | sort host</query>
          <earliest>$field1.earliest$</earliest>
          <latest>$field1.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
  </row>
</form>
```

## Method 3: Include All Expected Servers

To also show servers that aren't sending logs at all, use a lookup containing all expected servers:

```
| inputlookup your_servers.csv
| append [| metadata type=hosts index=*]
| stats max(recentTime) as recentTime by host
| eval lastTime=if(isnotnull(recentTime),strftime(recentTime,"%Y-%m-%d %H:%M:%S"),"Never")
| eval seconds_since_last_log=if(isnotnull(recentTime),now()-recentTime,999999)
| eval status=if(seconds_since_last_log < 600, "UP", "DOWN")
| rangemap field=status up="0-0" down="1-1"
| table host lastTime status range
| sort host
```

## Method 4: Advanced Server Status Check (Recommended for Critical Systems)

If exact server status is critical, have each server send a heartbeat every few minutes and alert when heartbeats go missing:

1. Create a small script on each server that sends a heartbeat event every few minutes, so that events like this land in Splunk:

```
index=server_status sourcetype=heartbeat host=$HOSTNAME$ status=ALIVE
```

2. Then use this search for your dashboard:

```
| inputlookup your_servers.csv
| map search="search earliest=-10m latest=now index=server_status sourcetype=heartbeat host=$host$ | head 1 | fields host"
| fillnull value="DOWN" status
| eval status=if(host=="NULL","DOWN","UP")
| rangemap field=status up="0-0" down="1-1"
| table host status range
```

This is more accurate than simply checking for any logs, because it specifically monitors for heartbeat messages (see also the map-free alternative sketched below).

Remember to place the CSS and JS files in the /appserver/static/ directory of your app, and restart Splunk after adding them.

Please give karma for support. Happy Splunking!
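A quick note on the Method 4 dashboard search: `map` launches one subsearch per row of the lookup, which gets expensive with many servers, and rows whose subsearch returns nothing simply drop out, so DOWN hosts may never appear. A lookup-plus-stats pattern (in the same spirit as Method 3) is usually cheaper and keeps every expected host in the output. This is only a sketch and reuses the hypothetical names from the post (your_servers.csv, index=server_status, sourcetype=heartbeat):

```
| inputlookup your_servers.csv
| fields host
| append
    [ search index=server_status sourcetype=heartbeat earliest=-10m latest=now
      | stats latest(_time) as last_beat by host ]
| stats max(last_beat) as last_beat by host
| eval status=if(isnull(last_beat), "DOWN", "UP")
| eval last_beat=if(isnull(last_beat), "never", strftime(last_beat, "%Y-%m-%d %H:%M:%S"))
| table host last_beat status
| sort host
```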
Hi @genesiusj,

Based on your description, you're dealing with a time series forecasting problem where you want to predict future user access patterns on Sundays. For this type of scenario in MLTK, I would recommend the following algorithms:

## Recommended Algorithms

1. Prophet
   a. Excellent for time series data with strong seasonal patterns (like your Sunday-only data)
   b. Handles missing values well, which is useful since many users may have zero counts on certain days
   c. Can capture multiple seasonal patterns (weekly, monthly, yearly)
   d. Works well when you have 6 months of historical data
2. ARIMA (AutoRegressive Integrated Moving Average)
   a. Good for detecting patterns and generating forecasts based on historical values
   b. Works well for data that shows trends over time
   c. Can handle seasonal patterns with the seasonal variant (SARIMA)
   d. Requires stationary data (you might need to difference your time series)

## Implementation Approach

For your specific use case with 1000 users, I would recommend a separate model for each user who has sufficient historical data. Here's how you could implement this with Prophet:

```
| inputlookup your_lookup.csv
| where DATE >= "2020-01-05" AND DATE <= "2020-06-28"
| rename DATE as ds, COUNT as y
| fit Prophet future_timespan=26 from ds y by USER
| where isnull(y)
| eval date_str=strftime(ds, "%Y-%m-%d")
| rename ds as DATE
| fields DATE USER yhat yhat_lower yhat_upper
| eval predicted_count = round(yhat)
| fields DATE USER predicted_count
```

For comparison with actual values:

```
| inputlookup your_lookup.csv
| where DATE >= "2020-07-05" AND DATE <= "2020-12-27"
| join type=left USER DATE
    [| inputlookup your_lookup.csv
     | where DATE >= "2020-01-05" AND DATE <= "2020-06-28"
     | rename DATE as ds, COUNT as y
     | fit Prophet future_timespan=26 from ds y by USER
     | where isnull(y)
     | eval DATE=strftime(ds, "%Y-%m-%d")
     | fields DATE USER yhat
     | eval predicted_count = round(yhat)]
| rename COUNT as actual_count
| eval error = abs(actual_count - predicted_count)
| eval error_percentage = if(actual_count=0, if(predicted_count=0, 0, 100), round((error/actual_count)*100, 2))
```

## Handling Your Data Structure

Since you have 1000 users and 52 Sundays, I have a few recommendations for improving your forecasting:

1. Focus first on users with non-zero access patterns
   a. Many users might have sparse or no access attempts, which can result in poor models
   b. Consider filtering to users who accessed the system at least N times during the training period
2. Consider feature engineering
   a. Add month and quarter features to help the model capture broader seasonal patterns
   b. Include special event indicators if certain Sundays might have unusual patterns (holidays, etc.)
   c. You might also want a lag feature (the access count from the previous Sunday)
3. Model evaluation
   a. Compare MAPE (Mean Absolute Percentage Error) across different users and algorithms
   b. For users with sparse access patterns, consider MAE (Mean Absolute Error) instead
   c. Establish a baseline model (such as the average access count per Sunday) to compare against
4. Alternative approach for sparse data (see the sketch below)
   a. For users with very sparse access patterns, consider binary classification
   b. Predict whether a user will attempt access (yes/no) rather than the count
   c. Use algorithms like Logistic Regression or Random Forest for this approach

Hope this helps point you in the right direction! With 6 months of training data focused on weekly patterns, Prophet is likely your best starting point.
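To make recommendation 4 concrete, here is a minimal, hedged sketch of the binary-classification alternative using stock MLTK LogisticRegression. It reuses the lookup and field names from this thread (your_lookup.csv, DATE, USER, COUNT); the derived fields (accessed, month, prev_count) and the model name sunday_access_clf are illustrative only:

```
| inputlookup your_lookup.csv
| where DATE >= "2020-01-05" AND DATE <= "2020-06-28"
| eval accessed=if(COUNT > 0, 1, 0)
| eval month=strftime(strptime(DATE, "%Y-%m-%d"), "%m")
| sort 0 USER DATE
| streamstats current=f window=1 last(COUNT) as prev_count by USER
| fillnull value=0 prev_count
| fit LogisticRegression accessed from USER month prev_count into sunday_access_clf
```

You could then score the second half of the year with `| apply sunday_access_clf` and compare the predicted class against whether each user actually attempted access. Note that with 1000 users the categorical USER field expands into many dummy variables, so you may prefer to filter to one user (or a small group) per model instead.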
Please give karma for support. Happy Splunking!
How big and complex is your dataset, and how much does its content change? And how long a time span does it cover?
Hi @token2,

To answer your questions about the VMware add-ons:

## Do You Need Both Add-ons?

No, you don't necessarily need both add-ons in an environment with vCenter, but using both provides more complete visibility. Here's why:

1. Splunk Add-on for vCenter:
   a. Collects vCenter-specific logs and metrics
   b. Gathers performance data through the vCenter API
   c. Collects vCenter Server events, tasks, and alarms
   d. Can collect some forwarded ESXi logs that vCenter has received (if configured to do so)
2. Splunk Add-on for VMware ESXi:
   a. Collects ESXi host-specific logs directly from each host
   b. Captures detailed host-level events that may not all be forwarded to vCenter
   c. Provides more granular host-level monitoring
   d. Essential for troubleshooting host-specific issues

While vCenter does collect many ESXi logs, it doesn't necessarily collect everything. Some detailed ESXi logs remain local to the hosts and aren't forwarded to vCenter, especially debug-level logs and certain system events. Collecting directly from the ESXi hosts gives you more complete visibility.

## Collection Methods for the vCenter Add-on

Yes, the Splunk Add-on for vCenter typically uses both collection methods:

1. Syslog collection:
   a. For operational logs and events from vCenter
   b. Requires configuring vCenter to forward logs via syslog
2. API access:
   a. For performance metrics, inventory, and task/event data
   b. Requires a vCenter user account with appropriate permissions
   c. Uses REST API calls to gather data

This dual-collection approach gives you both operational logs and rich performance/configuration data.

## Recommended Setup

For a complete VMware monitoring solution with vCenter:

1. If complete visibility is important:
   a. Install both add-ons
   b. Configure syslog from both vCenter and all ESXi hosts
   c. Set up API collection from vCenter
2. If you have resource constraints or simpler needs:
   a. Install the vCenter add-on only
   b. Ensure vCenter is configured to collect as many ESXi logs as possible
   c. You'll miss some host-specific details but will have good overall visibility
3. If you have a very large environment:
   a. Install both add-ons
   b. Consider selective monitoring of critical ESXi hosts only
   c. Use the vCenter add-on for broad monitoring and the ESXi add-on for deep dives into important hosts

The biggest advantage of using both add-ons is the additional context and detail you get from direct ESXi host monitoring, which is especially valuable for troubleshooting host-specific issues that might not be fully visible through vCenter alone.

Hope this helps clarify your VMware monitoring options in Splunk!

Please give karma for support. Happy Splunking!
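As a quick sanity check once either add-on is collecting, a hedged search like the following shows which VMware-related sourcetypes are actually arriving and when each last reported. It assumes the add-ons write sourcetypes beginning with "vmware" (for example vmware:perf:*, vmware:inv:*, vmware:events); adjust the filter to match what your inputs actually produce:

```
| tstats count latest(_time) as last_event where index=* sourcetype=vmware* by index sourcetype
| eval last_event=strftime(last_event, "%Y-%m-%d %H:%M:%S")
| sort - count
```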
Maybe I can make a new data set which is the original data set minus duplicate identical log lines. Are there any tutorials for this? I am a newbie super user, just for some reports. I hate this role.
Hi @asah,

No, the traditional Splunk Deployment Server cannot be used to manage Splunk OpenTelemetry (OTel) Collectors running in Kubernetes clusters. Here's why, and what alternatives you should consider:

## Why Deployment Server Won't Work

1. **Different Architecture**: Splunk Deployment Server is designed to manage Splunk-specific components like Universal Forwarders and Heavy Forwarders, which use Splunk's proprietary configuration system. The OpenTelemetry Collector uses a completely different configuration approach.
2. **Kubernetes-Native Components**: OTel Collectors running in Kubernetes are typically deployed as Kubernetes resources (Deployments, DaemonSets, etc.) and follow Kubernetes configuration patterns using ConfigMaps or Secrets.
3. **Configuration Format**: OTel Collectors use YAML configurations with a specific schema that's different from Splunk's .conf files.

## Recommended Approaches for Managing OTel Collectors in Kubernetes

### 1. GitOps Workflow (Recommended)

Use a GitOps approach with tools like:
- Flux or ArgoCD for configuration management
- Store your OTel configurations in a Git repository
- Use Kubernetes ConfigMaps to mount configurations into your collectors

### 2. Helm Charts

The Splunk OpenTelemetry Collector Helm chart provides a manageable way to deploy and configure collectors:

```bash
helm repo add splunk-otel-collector-chart https://signalfx.github.io/splunk-otel-collector-chart
helm install my-splunk-otel splunk-otel-collector-chart/splunk-otel-collector \
  --set gateway.enabled=true \
  --set clusterName=my-cluster
```

You can create custom values.yaml files for different environments and manage them in your version control system.

### 3. Kubernetes Operator

For more sophisticated management, consider using an operator pattern. While there isn't an official OTel operator from Splunk yet, you could implement your own custom operator or use community-developed options.

### 4. Configuration Management Tools

Use standard configuration management tools like:
- Ansible
- Terraform
- Puppet/Chef

These can apply configuration changes across your Kubernetes clusters in a controlled manner.

## Practical Example

Here's a simplified workflow for managing OTel configurations in Kubernetes:

1. Store your base collector config in a Git repo:

```yaml
# otel-collector-config.yaml
receivers:
  filelog:
    include: [/var/log/containers/*.log]
processors:
  batch:
    timeout: 1s
exporters:
  splunk_hec:
    token: "${SPLUNK_HEC_TOKEN}"
    endpoint: "https://your-splunk-cloud-instance.splunkcloud.com:8088"
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [splunk_hec]
```

2. Create a ConfigMap in Kubernetes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: splunk
data:
  collector.yaml: |
    receivers:
      filelog:
        include: [/var/log/containers/*.log]
    # Rest of config...
```

3. Mount the ConfigMap in your OTel Collector deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          volumeMounts:
            - name: config
              mountPath: /etc/otel/config.yaml
              subPath: collector.yaml
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
```

This approach lets you manage configurations in a Kubernetes-native way, with proper version control and rollout strategies.
For more information, I recommend checking the official documentation:

- [Splunk OpenTelemetry Collector for Kubernetes](https://github.com/signalfx/splunk-otel-collector-chart)
- [OpenTelemetry Collector Configuration](https://opentelemetry.io/docs/collector/configuration/)

Please give karma for support. Happy Splunking!
Hi @Cbr1sg,

Based on the thread and my experience with this issue, the Microsoft Teams Add-on for Splunk has a known deficiency in handling 404 errors properly. Here's what's happening:

1. When the add-on encounters call IDs that no longer exist in Microsoft Teams (returning 404 errors), it fails to remove these IDs from the webhook directory.
2. This causes a build-up of unprocessable call IDs, leading to:
   a. Continuous error messages in the logs
   b. Eventually a "401 Unauthorized" error when too many files accumulate (~60K files)
   c. The add-on completely stops working until restarted

The most reliable fix I've found is the following procedure, which needs to be performed periodically (some users report every few days):

## Solution Steps

1. Disable all inputs in this order:
   a. Call Record input
   b. User Report input (if configured)
   c. Subscription input
   d. Webhook input
2. Clean the KV store to reset the checkpointer:

```
splunk clean kvstore -app TA_MS_Teams -collection TA_MS_Teams_checkpointer
```

   Note: you need to run this command on the machine where the add-on is installed (usually a heavy forwarder).
3. Re-enable the inputs in this specific order:
   a. Webhook input
   b. Subscription input (this will recreate the subscription)
   c. Call Record input
   d. User Report input (if used)
4. Additional steps for a persistent solution:
   a. If you're comfortable with scripting, you can create a scheduled task (cron job) to run these steps nightly
   b. For a more advanced solution, you could create an alert that triggers when "404 Not Found" errors appear in the logs (see the search sketch below)

## Scripted Solution Example

Here's a bash script that you could schedule to run nightly:

```bash
#!/bin/bash

# Path to Splunk binary
SPLUNK_BIN="/opt/splunk/bin/splunk"

# Disable inputs
$SPLUNK_BIN disable input TA_MS_Teams://call_record
$SPLUNK_BIN disable input TA_MS_Teams://user_report
$SPLUNK_BIN disable input TA_MS_Teams://subscription
$SPLUNK_BIN disable input TA_MS_Teams://webhook

# Wait for processes to stop
sleep 10

# Clean KVStore
$SPLUNK_BIN clean kvstore -app TA_MS_Teams -collection TA_MS_Teams_checkpointer

# Re-enable inputs in correct order
$SPLUNK_BIN enable input TA_MS_Teams://webhook
sleep 5
$SPLUNK_BIN enable input TA_MS_Teams://subscription
sleep 10
$SPLUNK_BIN enable input TA_MS_Teams://call_record
$SPLUNK_BIN enable input TA_MS_Teams://user_report

echo "Microsoft Teams Add-on inputs reset completed at $(date)"
```

Note that while this is a functional workaround, the root issue is in the add-on's code not properly handling 404 errors. As mentioned by others in the thread, the add-on should ideally be updated to remove call IDs from the webhook directory when they return 404 errors.

If you're experiencing this issue frequently, I also recommend opening a support case with Splunk to encourage the development team to address this in a future update of the add-on.

Please give karma for support. Happy Splunking!
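For the alert mentioned in step 4b, a hedged starting point is to watch the add-on's own log files in the _internal index. This assumes the add-on writes its logs under $SPLUNK_HOME/var/log/splunk so they land in _internal; adjust the source filter to the file names you actually see on your heavy forwarder:

```
index=_internal source=*ta_ms_teams* ("404" OR "Not Found")
| stats count latest(_time) as latest_error by source
| eval latest_error=strftime(latest_error, "%Y-%m-%d %H:%M:%S")
```

Schedule it (for example every 15 minutes), trigger when count is greater than zero, and optionally wire the alert action to the reset script above.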
Hi @pramod,

I've worked with the Proofpoint ET Splunk TA in Splunk Cloud, and there's a specific way to handle the authentication for this app.

For configuring Proofpoint ET Intelligence in Splunk Cloud, you need to understand that there's a difference between the "authorization code" and the "Oink code":

1. The API key is what you get from your ET Intelligence subscription.
2. The "authorization code" field in the TA configuration actually requires your ET Intelligence subscription key (sometimes also called the "download key"), NOT the Oink code. This is a common point of confusion.
3. If you're seeing "None" for the authorization code, it's likely because that field hasn't been properly populated in your account settings on the Proofpoint ET Intelligence portal.

Here's how to properly configure it:

1. Log in to your ET Intelligence account at https://threatintel.proofpoint.com/
2. Navigate to "Account Settings" (usually in the top-right profile menu)
3. Make sure both your API key and subscription key (download key) are available - if your subscription key shows "None", contact Proofpoint support to have it properly provisioned
4. In Splunk Cloud:
   a. Install the Proofpoint ET Splunk TA through the Splunk Cloud self-service app installation
   b. Open the app configuration
   c. Enter your API key in the "API Key" field
   d. Enter your subscription key (download key) in the "Authorization Code" field (NOT the Oink code)
   e. Save the configuration
5. If you're still getting errors, check the following (a search sketch for 5a follows this post):
   a. Look at the _internal index for any API connection errors
   b. Verify your Splunk Cloud instance has proper outbound connectivity to the Proofpoint ET Intelligence API endpoints
   c. Confirm with Proofpoint that your subscription is active and properly configured

If you're still having issues after trying these steps, you may need to:

1. Submit a support ticket with Proofpoint to verify your account credentials
2. Work with Splunk Cloud support to ensure the app is properly installed and has the right permissions

Please give karma for support. Happy Splunking!
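For step 5a, here is a hedged example of checking _internal for the TA's API errors. The source filter is an assumption (the add-on's log file name may differ in your environment), so broaden or adjust it as needed:

```
index=_internal source=*proofpoint* (ERROR OR WARN)
| stats count latest(_raw) as latest_message by source
| sort - count
```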
Hi @clumicao,

Currently, Mission Control does not have the same capability as Enterprise Security to create and share saved filters across users. In Mission Control, filters are saved locally to your browser and user profile, making them user-specific rather than shareable across a team.

There are a few workarounds you can use:

1. Document your most useful filters in a shared team document so others can manually recreate them
2. Use Mission Control's Export/Import feature to share filter configurations:
   a. After creating a filter, click the "Export" option in the filter panel
   b. This downloads a JSON configuration file
   c. Share this file with team members
   d. Other users can import this file using the "Import" option in their filter panel

Note that even with the import method, each user would need to import the filter individually, and any updates to the filter would require re-sharing and re-importing.

This has been a requested feature, and I recommend submitting it through the Splunk Ideas portal if it would be valuable for your team. The Splunk Mission Control team regularly enhances the product based on user feedback.

Please give karma for support. Happy Splunking!
For optimizing the AppDynamics Events Service Cluster startup process with self-healing capabilities, I recommend implementing systemd service units with proper configurations. This approach handles all of your scenarios: graceful shutdown, unplanned downtime, accidental process termination, and OOM situations. Here's a comprehensive solution:

1. Create systemd service units for both the Events Service and Elasticsearch.

For the Events Service (events-service.service):

```
[Unit]
Description=AppDynamics Events Service
After=network.target elasticsearch.service
Requires=elasticsearch.service

[Service]
Type=forking
User=appdynamics
Group=appdynamics
WorkingDirectory=/opt/appdynamics/events-service
ExecStart=/opt/appdynamics/events-service/bin/events-service.sh start
ExecStop=/opt/appdynamics/events-service/bin/events-service.sh stop
PIDFile=/opt/appdynamics/events-service/pid.txt
TimeoutStartSec=300
TimeoutStopSec=120

# Restart settings for self-healing
Restart=always
RestartSec=60

# OOM handling
OOMScoreAdjust=-900

# Health check script
ExecStartPost=/opt/appdynamics/scripts/events-service-health-check.sh

[Install]
WantedBy=multi-user.target
```

For Elasticsearch (elasticsearch.service):

```
[Unit]
Description=Elasticsearch for AppDynamics
After=network.target

[Service]
Type=forking
User=appdynamics
Group=appdynamics
WorkingDirectory=/opt/appdynamics/events-service/elasticsearch
ExecStart=/opt/appdynamics/events-service/elasticsearch/bin/elasticsearch -d -p pid
ExecStop=/bin/kill -SIGTERM $MAINPID
PIDFile=/opt/appdynamics/events-service/elasticsearch/pid

# Restart settings for self-healing
Restart=always
RestartSec=60

# Resource limits for OOM prevention
LimitNOFILE=65536
LimitNPROC=4096
LimitMEMLOCK=infinity
LimitAS=infinity

# OOM handling
OOMScoreAdjust=-800

[Install]
WantedBy=multi-user.target
```

2. Create a health check script (events-service-health-check.sh):

```bash
#!/bin/bash
# Health check for Events Service
EVENT_SERVICE_PORT=9080
MAX_RETRIES=3
RETRY_INTERVAL=20

check_events_service() {
    for i in $(seq 1 $MAX_RETRIES); do
        if curl -s http://localhost:$EVENT_SERVICE_PORT/healthcheck > /dev/null; then
            echo "Events Service is running properly."
            return 0
        else
            echo "Attempt $i: Events Service health check failed. Waiting $RETRY_INTERVAL seconds..."
            sleep $RETRY_INTERVAL
        fi
    done
    echo "Events Service failed health check after $MAX_RETRIES attempts."
    return 1
}

check_elasticsearch() {
    for i in $(seq 1 $MAX_RETRIES); do
        if curl -s http://localhost:9200/_cluster/health | grep -q '"status":"green"\|"status":"yellow"'; then
            echo "Elasticsearch is running properly."
            return 0
        else
            echo "Attempt $i: Elasticsearch health check failed. Waiting $RETRY_INTERVAL seconds..."
            sleep $RETRY_INTERVAL
        fi
    done
    echo "Elasticsearch failed health check after $MAX_RETRIES attempts."
    return 1
}

main() {
    # Wait for initial startup
    sleep 30
    if ! check_elasticsearch; then
        echo "Restarting Elasticsearch due to failed health check..."
        systemctl restart elasticsearch.service
    fi
    if ! check_events_service; then
        echo "Restarting Events Service due to failed health check..."
        systemctl restart events-service.service
    fi
}

main
```

3. Set up a watchdog script for OOM monitoring (run as a cron job every 5 minutes):

```bash
#!/bin/bash
LOG_FILE="/var/log/appdynamics/oom-watchdog.log"
ES_HEAP_THRESHOLD=90
ES_SERVICE="elasticsearch.service"
EVENTS_HEAP_THRESHOLD=90
EVENTS_SERVICE="events-service.service"

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> $LOG_FILE
}

check_elasticsearch_memory() {
    # Read the configured -Xmx value and the current heap usage (integer MB)
    ES_MEMORY_PCT=$(ps -C java -o cmd= | grep elasticsearch | grep -o "Xmx[0-9]*[mMgG]" | head -1)
    ES_CURRENT_HEAP=$(jstat -gc $(pgrep -f elasticsearch) 2>&1 | tail -n 1 | awk '{printf "%d", ($3+$4+$5+$6+$7+$8)/1024}')
    if [[ $ES_MEMORY_PCT == *"g"* || $ES_MEMORY_PCT == *"G"* ]]; then
        ES_MAX_HEAP=${ES_MEMORY_PCT//[^0-9]/}
        ES_MAX_HEAP=$((ES_MAX_HEAP * 1024))
    else
        ES_MAX_HEAP=${ES_MEMORY_PCT//[^0-9]/}
    fi
    ES_HEAP_PCT=$((ES_CURRENT_HEAP * 100 / ES_MAX_HEAP))
    if [ $ES_HEAP_PCT -gt $ES_HEAP_THRESHOLD ]; then
        log_message "Elasticsearch heap usage at ${ES_HEAP_PCT}% - exceeds threshold of ${ES_HEAP_THRESHOLD}%. Restarting service."
        systemctl restart $ES_SERVICE
        return 0
    fi
    return 1
}

check_events_service_memory() {
    EVENTS_MEMORY_PCT=$(ps -C java -o cmd= | grep events-service | grep -o "Xmx[0-9]*[mMgG]" | head -1)
    EVENTS_CURRENT_HEAP=$(jstat -gc $(pgrep -f "events-service" | grep -v grep | head -1) 2>&1 | tail -n 1 | awk '{printf "%d", ($3+$4+$5+$6+$7+$8)/1024}')
    if [[ $EVENTS_MEMORY_PCT == *"g"* || $EVENTS_MEMORY_PCT == *"G"* ]]; then
        EVENTS_MAX_HEAP=${EVENTS_MEMORY_PCT//[^0-9]/}
        EVENTS_MAX_HEAP=$((EVENTS_MAX_HEAP * 1024))
    else
        EVENTS_MAX_HEAP=${EVENTS_MEMORY_PCT//[^0-9]/}
    fi
    EVENTS_HEAP_PCT=$((EVENTS_CURRENT_HEAP * 100 / EVENTS_MAX_HEAP))
    if [ $EVENTS_HEAP_PCT -gt $EVENTS_HEAP_THRESHOLD ]; then
        log_message "Events Service heap usage at ${EVENTS_HEAP_PCT}% - exceeds threshold of ${EVENTS_HEAP_THRESHOLD}%. Restarting service."
        systemctl restart $EVENTS_SERVICE
        return 0
    fi
    return 1
}

# Check if services are running
if ! systemctl is-active --quiet $ES_SERVICE; then
    log_message "Elasticsearch service is not running. Attempting to start."
    systemctl start $ES_SERVICE
fi
if ! systemctl is-active --quiet $EVENTS_SERVICE; then
    log_message "Events Service is not running. Attempting to start."
    systemctl start $EVENTS_SERVICE
fi

# Check memory usage
check_elasticsearch_memory
check_events_service_memory
```

4. Enable and start the services:

```bash
# Make scripts executable
chmod +x /opt/appdynamics/scripts/events-service-health-check.sh
chmod +x /opt/appdynamics/scripts/oom-watchdog.sh

# Place service files
cp elasticsearch.service /etc/systemd/system/
cp events-service.service /etc/systemd/system/

# Reload systemd, enable and start services
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl enable events-service.service
systemctl start elasticsearch.service
systemctl start events-service.service

# Add the OOM watchdog to cron
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/appdynamics/scripts/oom-watchdog.sh") | crontab -
```

This setup addresses all of your scenarios:

1. Graceful shutdown: the systemd ExecStop commands ensure proper shutdown sequences
2. Unplanned downtime: Restart=always ensures the services restart after VM reboots
3. Accidental process kill: again, Restart=always handles this automatically
4. OOM situations: a combination of OOMScoreAdjust, resource limits, and the custom watchdog script

You may need to adjust paths and user/group settings to match your environment. This implementation provides comprehensive self-healing for the AppDynamics Events Service and Elasticsearch.
While rebuilding the forwarder database might sometimes help if it becomes corrupted or contains too many orphaned entries, the question worth looking into is how your UFs are deployed and configured. Are you sure they aren't sharing a GUID and hostname?
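If you want to check the shared-GUID theory quickly, a sketch like this over the indexers' metrics logs lists any GUID that reports more than one hostname (hostname and guid are standard fields on tcpin_connections metrics events):

```
index=_internal source=*metrics.log* group=tcpin_connections
| stats dc(hostname) as distinct_hostnames values(hostname) as hostnames by guid
| where distinct_hostnames > 1
```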
Hi guys, I got the same issue. The proposed solution to clear the KV store helped me once (I gave karma for it), but it was not an acceptable long-term solution for me. I wrote a little patch, based on version 2.0.0 of the app, to solve the issue; I just posted it here. I hope this can help you too. B.
Wow @ephemeric thanks for the help!! -.-
One additional comment. As noted in the previous post, Splunk supports loginType=splunk, and there is always at least the admin account, which cannot be anything other than local. BUT if there is a WAF in front of your Splunk login, there could be WAF rules that deny adding this additional loginType parameter to the URL. If that happens, you need to discuss it with the security staff so that they allow the additional parameter, e.g. for some specific addresses.
Hi,

I got the same issue. I wrote a small patch for the teams_subscription.py binary to solve it. It is based on release 2.0.0. The patch is attached as TA_MS_Teams-bruno.patch.txt.

To use it, just save the file as TA_MS_Teams-bruno.patch in the $SPLUNK_HOME/etc/apps directory and apply it using the following command in the TA_MS_Teams directory:

```
pelai@xps MINGW64 /d/src/TA_MS_Teams
$ patch -p1 < ../TA_MS_Teams-bruno.patch.txt
patching file bin/teams_subscription.py
```

It is possible to revert the patch at any time by running patch with the -R parameter.

I hope this can help.
B.
Hi @kiran_panchavat

$10/GB? Where are you seeing that? That would be nice. (Edit: I see you've now removed this and replaced it with other content.)

Google suggests "Splunk typically costs between $1,800-$2,500 per GB/day for an annual license," but this is probably based on public pricing resources without any partner/volume discounts etc.

For what it's worth, I agree with @isoutamo that Splunk Cloud would be a good option here. I thought the smallest tier was 50GB, but if it's only 5GB then the annual cost is probably less than the cost of someone building, running and maintaining a cluster on-prem!

I did a .conf talk in 2020 about moving to Cloud and the "cost" ("the effort, loss, or sacrifice necessary to achieve or obtain something") - TL;DR: Cloud was cheaper. https://conf.splunk.com/files/2020/slides/PLA1180C.pdf#page=37

The Free version won't suffice for the PoC because it doesn't have the required features such as clustering, authentication, etc. Get in touch with Sales and they can help arrange a PoC license if they think you'll be looking to purchase a full license: https://www.splunk.com/en_us/talk-to-sales/pricing.html?expertCode=sales

Did this answer help you? If so, please consider:
- Adding karma to show it was useful
- Marking it as the solution if it resolved your issue
- Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing.
I haven't tested it, but I believe what you others have found. Still, as Splunk currently has some GUI tools like IA (Ingest Actions) etc., there is a small possibility that this is the planned way for it to work? As @livehybrid said, there are best practices for how those The 8 should be set based on those other values. You should probably create a support case for this, and then we will get an official answer on whether this is a bug or a planned feature.
Since 9.2, the reason for this is that the DS needs local indexes to get information about its deployment clients (DCs). Best practice says that you should forward all logs to your indexers instead of keeping them on a local node like the DS. You can fix this by also storing those indexes locally. Here are the instructions for how to do it: https://docs.splunk.com/Documentation/Splunk/9.2.0/Updating/Upgradepre-9.2deploymentservers
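Once you have followed those instructions, a quick hedged check that the deployment-server data is actually being kept locally on the DS (the index names below are the ones the 9.2+ DS documentation introduces; verify them against the linked page):

```
| tstats count where index=_dsphonehome OR index=_dsclient OR index=_dsappevent by index
```

If all three return zero events, the DS is still forwarding everything away without indexing its own copies.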
The rebuild of the forwarder assets should happen automatically on the schedule defined in CMC -> Forwarders -> Forwarder Monitoring Setup: Data Collection Interval. If that interval has already passed since you added this UF, you can see its logs in the _internal index, and the problem continues, I propose that you create a support ticket so they can figure out why the forwarder asset hasn't updated as expected. Of course, if less time than that interval has elapsed, you could update it manually, or decrease the interval and rebuild it again. The preferred update interval depends on how quickly you need new UFs to appear in the list and how many UFs you have.
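While you wait for the asset table to rebuild, you can check directly whether the UF is phoning home to the indexers. A sketch over the receiving indexers' metrics data (fwdType, hostname and guid are standard fields on tcpin_connections events):

```
index=_internal source=*metrics.log* group=tcpin_connections fwdType=uf
| stats latest(_time) as last_seen by hostname guid
| eval minutes_since=round((now()-last_seen)/60, 1)
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| sort - minutes_since
```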
You could also use tstats with the prestats=t parameter, like:

```
| tstats prestats=t count where index=my_index by _time span=1h
| timechart span=1h count
| timewrap 1w
```

https://docs.splunk.com/Documentation/Splunk/9.4.2/SearchReference/Tstats