All Topics


We have a deployment server which deploys apps (which contain configs) to a search head cluster (3 SHs). I am not sure whether the DS distributes apps directly to the SH members, or whether it sends them to the deployer and the deployer then distributes the apps to the SH members. Please clarify. We have created a role in a DS app which restricts access to a specific index. When we try to push it, that role is not reflected on the SH members. But when we check the deployer, the app is present under shcluster/apps and the role is updated there; it is just not showing in the SH UI. What is the problem? Do we need to manually push the config from the deployer to the SH members every time? We have deployer_push_mode=merge_to_default configured on the deployer... does that mean distribution is automated? If not, how do we push config from the deployer to the SH members through Splunk Web? We don't have access to the backend server to run CLI commands.
I've got to be close, but I'm having issues trying to figure out how to get a distinct count of user sessions to show up in a bar chart with a trendline. I'd like to see a distinct count of users for the last year by month and have a trendline added.

<My Search>
| stats dc(userSesnId) as moving_avg
| timechart span=30d dc(userSesnId) as count_of_user_sessions
| trendline sma4(moving_avg) as "Moving Average"
| rename count_of_user_sessions AS "Distinct Count of User Sessions"
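A minimal sketch of a rework, assuming userSesnId is the session field: the initial stats collapses the events to a single row, so the later timechart has nothing left to count. Letting timechart do the distinct count per month and then running trendline on its output keeps both series (untested against your data, so treat it as a starting point):

<My Search>
| timechart span=1mon dc(userSesnId) as user_sessions
| trendline sma4(user_sessions) as "Moving Average"
| rename user_sessions as "Distinct Count of User Sessions"

sma4 here averages over four monthly buckets; adjust the window to taste.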
  Hello Team, Currently, in our application forms, clicking on a link opens it in the same tab. This behavior causes the form to reset, requiring users to refill all the previously entered details. While we have advised users to manually open these links in a new tab as a workaround, we are looking for a more seamless solution to address this issue. We have explored available options to enforce links to open in a new tab by default but have not been successful in implementing it. We would greatly appreciate your guidance on whether this functionality can be implemented and, if so, the steps required to achieve it.
Working on a dashboard in Dashboard Studio to display data in two different tables using a single dropdown. The issue I have is that all my data is keyed by the "username" field, but I want the dropdown to display "Lastname, Firstname" for better readability.

The first table pulls records from a lookup table with user demographics and such. The second table pulls the corresponding Windows log data tracking various user activity. In my dropdown, I am currently using the lookup table and an eval to join "user_last" and "user_first" into a "fullname" variable so the dropdown displays "Lastname, Firstname". I then used "fullname" as the pass-on token for my first table. However, for my second table I need "username" as the token, because the data I am querying only has "username" in the logs, not the user's first or last name like my first table.

My question is: can I set my dropdown to display "user_last, user_first" but set the token value to "username"? Or can I assign multiple tokens in an SPL query in Dashboard Studio to use in the respective tables? Or can I do both, for the sake of knowledge? Here is what I am working with, and I appreciate any assistance with this.

Lookup table:
Name: system_users.csv
Fields: username, name_last, name_first....

Dashboard dropdown field values:
Data source name: lookup_users

SPL query:
| inputlookup bpn_system_users.csv
| eval fullname= name_last.", ".name_first
| table fullname
| sort fullname

Source code:
{
  "type": "ds.search",
  "options": {
    "queryParameters": {
      "earliest": "$SearchTimeLine.earliest$",
      "latest": "$SearchTimeLine.latest$"
    },
    "query": " | inputlookup system_users.csv\n | eval fullname= name_last.\", \".name_first\n | table fullname\n | sort fullname"
  },
  "name": "lookup_users"
}
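A minimal sketch of one approach, assuming the field names above: have the dropdown's data source return both the display string and the username. If your Dashboard Studio version lets you map separate label and value fields for dropdown options, the menu can show fullname while the token itself carries username, which both tables can then use (the first table can look the username back up against system_users.csv for the demographic fields):

| inputlookup system_users.csv
| eval fullname = name_last.", ".name_first
| table fullname username
| sort fullname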
Hello everyone, I am hoping someone can help me out, as I have exhausted everything I can think of and cannot seem to get anything to work. Essentially, I am looking to pull results to get a total based on an ID. The issue is that each ID will have between 1 and 4 events associated with it, and these events are related to the status. I only want results for IDs that are Open or Escalated, but my search is pulling all of the events, even for IDs whose status has since changed to Closed or another status. I want to exclude all events for IDs whose status has changed to anything other than Open or Escalated. The other trouble is that this "status" event occurs in the metadata of the whole transaction. I have the majority of my query built out, but where I am struggling is removing the initial Open and Escalated events for the alerts whose status was later changed. The field the status changes in is under "logs", specifically "logs{}.action".
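A minimal sketch of the usual pattern for this, with placeholder index and field names (id stands for whatever your ID field is called, and the status is assumed to be readable from logs{}.action): compute the most recent status per ID first, then keep only the IDs whose latest status is still Open or Escalated.

index=your_index sourcetype=your_sourcetype
| eval status='logs{}.action'
| stats latest(status) as last_status by id
| search last_status="Open" OR last_status="Escalated"

If logs{}.action is multivalued per event, the eval/latest handling may need tweaking; if you also need the underlying events afterwards, you can feed the surviving IDs back into the main search as a subsearch filter.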
Hello Splunkers, I need some help understanding the minimum specs required for a Splunk Enterprise installation used purely as a heavy forwarder, where it will only receive logs from one source over syslog and forward them to the indexers. Can I just use 2 CPUs, 8 GB RAM, and storage based on an estimate of the log file sizes? I'm asking this because the official guide says the minimum is 12 GB RAM and a 4-core CPU. Please advise if you can. Thanking you in advance, Moh....
I am using StatsD to send metrics to a receiver, but I am encountering an issue where timing metrics (|ms) are not being captured, even though counter metrics (|c) work fine in Splunk Observability Cloud.

Example of a working metric. The following command works and is processed correctly by the StatsD receiver:

echo "test_Latency:42|c|#key:val" | nc -u -w1 localhost 8127

Example of a non-working metric. However, this command does not result in any output or processing:

echo "test_Latency:0.082231|ms" | nc -u -w1 localhost 8127

Current StatsD configuration. Here is the configuration I am using for the receiver, following the doc at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/statsdreceiver:

receivers:
  statsd:
    endpoint: "localhost:8127"
    aggregation_interval: 30s
    enable_metric_type: true
    is_monotonic_counter: false
    timer_histogram_mapping:
      - statsd_type: "histogram"
        observer_type: "gauge"
      - statsd_type: "timing"
        observer_type: "histogram"
        histogram:
          max_size: 100
      - statsd_type: "distribution"
        observer_type: "summary"
        summary:
          percentiles: [0, 10, 50, 90, 95, 100]

Why are timing metrics (|ms) not being captured while counters (|c) are working? Can you please help check this? The statsdreceiver README says it supports timer metrics: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/statsdreceiver/README.md#timer Any help or suggestions would be greatly appreciated. Thank you.
After upgrading Splunk to 9.4.0 and Splunk DB Connect to 3.18.1, all inputs have the error: "Checkpoint not found. The input in rising mode is expected to contain a checkpoint." None of them are pulling in data. Looking over the logs, I see:

2025-01-10 12:16:00.298 +0000 Trace-Id=1d3654ac-86c1-445f-97c6-6919b3f6eb8c [Scheduled-Job-Executor-116] ERROR org.easybatch.core.job.BatchJob - Unable to open record reader
com.splunk.dbx.server.exception.ReadCheckpointFailException: Error(s) occur when reading checkpoint.
    at com.splunk.dbx.server.dbinput.task.DbInputCheckpointManager.load(DbInputCheckpointManager.java:71)
    at com.splunk.dbx.server.dbinput.task.DbInputTask.loadCheckpoint(DbInputTask.java:133)
    at com.splunk.dbx.server.dbinput.recordreader.DbInputRecordReader.executeQuery(DbInputRecordReader.java:82)
    at com.splunk.dbx.server.dbinput.recordreader.DbInputRecordReader.open(DbInputRecordReader.java:55)
    at org.easybatch.core.job.BatchJob.openReader(BatchJob.java:140)
    at org.easybatch.core.job.BatchJob.call(BatchJob.java:97)
    at com.splunk.dbx.server.api.service.conf.impl.InputServiceImpl.runTask(InputServiceImpl.java:321)
    at com.splunk.dbx.server.api.resource.InputResource.lambda$runInput$1(InputResource.java:183)
    at com.splunk.dbx.logging.MdcTaskDecorator.run(MdcTaskDecorator.java:23)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

I'm unable to edit the config and update the checkpoint value. Even though Execute Query works, when I try to save the update it gives: "Error(s) occur when reading checkpoint." Has anybody else successfully upgraded to 9.4.0 and 3.18.1?
Trying to check and set values conditionally, but the query below is giving an error.

Error:
Error in 'eval' command: Fields cannot be assigned a boolean result. Instead, try if([bool expr], [expr], [expr]). The search job has failed due to an error. You may be able view the job in the

Query:
index="uhcportals-prod-logs" sourcetype=kubernetes container_name="myuhc-sso" logger="com.uhg.myuhc.log.SplunkLog" message.ssoType="Inbound"
| eval ssoType = if(message.incomingRequest.inboundSsoType == "5-KEY", message.incomingRequest.deepLink, message.incomingRequest.inboundSsoType== "HYBRID", message.incomingRequest.inboundSsoType)
| stats distinct_count("message.ssoAttributes.EEID") as Count by ssoType, "message.backendCalls{}.responseCode"
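A minimal sketch of what a working eval might look like, assuming the intent is "5-KEY maps to the deep link, anything else keeps the inbound SSO type": if() only takes three arguments, so multi-branch logic needs case(), and field names containing dots have to be wrapped in single quotes so they are treated as field references rather than bare strings:

| eval ssoType = case(
    'message.incomingRequest.inboundSsoType' == "5-KEY", 'message.incomingRequest.deepLink',
    'message.incomingRequest.inboundSsoType' == "HYBRID", 'message.incomingRequest.inboundSsoType',
    true(), 'message.incomingRequest.inboundSsoType')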
01-09-2025 17:01:37.725 -0500 WARN  TcpOutputProc [4940 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=sbdcrib.splunkcloud.com inside output group default-autolb-group from host_src=CRBCITDHCP-01 has been blocked for blocked_seconds=1800. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
01-09-2025 17:30:30.169 -0500 INFO  PeriodicHealthReporter - feature="TCPOutAutoLB-0" color=red indicator="s2s_connections" due_to_threshold_value=70 measured_value=100 reason="More than 70% of forwarding destinations have failed.  Ensure your hosts and ports in outputs.conf are correct.  Also ensure that the indexers are all running, and that any SSL certificates being used for forwarding are correct." node_type=indicator node_path=splunkd.data_forwarding.splunk-2-splunk_forwarding.tcpoutautolb-0.s2s_connections
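These two messages point at the receiving side (the Splunk Cloud indexers, the network path, or SSL), as the health reason text itself suggests. A minimal sketch of an internal-log search that can help show when and how often the output is blocking, assuming the forwarder's _internal data is still reachable from a search head (the host value is simply taken from the log excerpt above and may need adjusting):

index=_internal host=CRBCITDHCP-01 sourcetype=splunkd component=TcpOutputProc (blocked OR paused)
| timechart span=15m count by log_level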
What are some reasons why a Linux UF will get quarantined by the deployment manager:8089? 
I have two log messages, "%ROUTING-LDP-5-NSR_SYNC_START" and "%ROUTING-LDP-5-NBR_CHANGE", which usually accompany each other whenever there is a peer flapping, so "%ROUTING-LDP-5-NBR_CHANGE" is followed by "%ROUTING-LDP-5-NSR_SYNC_START" almost every time. I am trying to find the cases where a device only produces "%ROUTING-LDP-5-NSR_SYNC_START" without "%ROUTING-LDP-5-NBR_CHANGE". I am using transaction but have not been able to figure it out.

index = test ("%ROUTING-LDP-5-NSR_SYNC_START" OR "%ROUTING-LDP-5-NBR_CHANGE")
| transaction maxspan=5m startswith="%ROUTING-LDP-5-NSR_SYNC_START" endswith="%ROUTING-LDP-5-NBR_CHANGE"
| search eventcount=1 startswith="%ROUTING-LDP-5-NSR_SYNC_START"
| stats count by host
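A minimal sketch of a transaction-free alternative, under the assumption that "only produces NSR_SYNC_START" means a host/time window containing the sync-start message but no neighbor-change message: count each message type per host per window and keep the windows where the second count is zero.

index=test ("%ROUTING-LDP-5-NSR_SYNC_START" OR "%ROUTING-LDP-5-NBR_CHANGE")
| bin _time span=5m
| stats count(eval(match(_raw, "ROUTING-LDP-5-NSR_SYNC_START"))) as sync_start count(eval(match(_raw, "ROUTING-LDP-5-NBR_CHANGE"))) as nbr_change by host _time
| where sync_start > 0 AND nbr_change = 0
| stats count by host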
Hey, so my company is working on creating a visual in SharePoint by integrating iframes from a report. This is working fine, but the question I had was: will the embedded link stop working if the user that created it leaves the org and the account is disabled or deleted? Thank you in advance for any help! #Iframes #reports #embed
I am trying to query AWS Config data in Splunk to identify the names of all S3 buckets in AWS. Is there a way to write an SPL search that will list out the S3 bucket names from this data?
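A minimal sketch, assuming the data arrives via the Splunk Add-on for AWS as sourcetype aws:config and that the configuration items expose resourceType and resourceId fields (for S3, the resource ID is typically the bucket name); the index and field names here are assumptions to adjust against what your events actually contain:

index=aws sourcetype="aws:config" resourceType="AWS::S3::Bucket"
| stats latest(_time) as last_seen by resourceId
| sort resourceId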
This is a new version of the licensing model that consumes licenses based on vCPUs. It is available for both on-premises and SaaS. Utilization is 1 license unit per CPU core. It does not matter how many agents are running on a server, how many applications/containers these agents are monitoring, how much data these agents are collecting/reporting, or how many transactions these agents are creating. Licenses are consumed based on the number of CPUs available on the server/host.

The Basics

What are the minimum versions required for the controller and APM/database/server agents to properly count vCPUs?
These are the AppDynamics agent versions needed to make a customer fully IBL compliant:
Controller: v21.2+ (for the database agent to default to 4 vCPU instead of 12 vCPU: v23.8+ (cSaaS) / v23.7+ (on-prem))
Machine Agent: 20.12+
.NET Agent: 20.12+
DB Agent: minimum 21.2.0 (latest 21.4.0 recommended); for MySQL/PostgreSQL RDS database IBL support, the minimum is 22.6.0
For accurate license counting, the machine agent needs to be deployed, or hardware monitoring needs to be enabled in the case of database monitoring. The machine agent version should be greater than 20.12. The machine agent calculates the number of CPUs available on the monitored server/host.

How do I migrate from Agent Based Licensing to Infrastructure Based Licensing?
Migration from Agent Based Licensing to Infrastructure Based Licensing is handled by licensing-help@appdynamics.com. On conversion, all license rules are maxed out to the account value by default, keeping app/server scope restrictions as is. For example: LicenseRuleA through LicenseRuleZ with 2 APM units each and accountLevelApm=100 units will, on conversion, be set to LicenseRuleA-Z with 400 units each and accountLevelHBL=400 units. (400 is just an example number here; the final conversion is made by the sales team.)

What is the definition of vCPU and how do I verify that it is correct?
In the case of a physical machine, the number of logical cores or processors is considered the vCPU count. For planning purposes, you can use the following table to find the CPU core count in case the machine agent is not available/running:

Bare metal servers: logical CPU cores = # of processors. Captured via Windows (Task Manager, PowerShell) or Linux (nproc or lscpu).
Virtual machines: logical CPU cores (accounting for hyperthreading).
Cloud providers: logical CPU cores = vCPU. Captured via AWS (EC2 instances), Azure (Azure VMs), GCP (standard machine types).

How to check (screenshots omitted):
Windows: Task Manager, System Information, or wmic
Linux: nproc or lscpu
Mac OS: sysctl -a | grep machdep.cpu.*_count  OR  sysctl -n hw.logicalcpu

What are packages?
Each agent that consumes a license will be part of a single package. Packages are provisioned at the account level and distributed within license rules (limited packages are supported by license rules). Packages fall under ENTERPRISE, PREMIUM, and INFRASTRUCTURE.

SAP Enterprise (SAP_ENTERPRISE)
What it offers: Monitor all your SAP servers, network, and SAP apps and get business insights on them using AppDynamics agents.
Agent list (as seen on the Connected Agents page): APM Any Language: agent-type=sap-agent; Network Visibility: agent-type=netviz; plus everything under AppDynamics Infrastructure Monitoring.

Enterprise (ENTERPRISE)
What it offers: Monitor all your servers, network, databases, and apps and get business insights on them using AppDynamics agents.
Agent list: Transaction Analytics: agent-type=transaction-analytics; plus everything under AppDynamics Premium.

Premium (PREMIUM)
What it offers: Monitor all your servers, network, databases, and apps using AppDynamics agents.
Agent list: APM Any Language: agent-type=apm, java, dot-net, native-sdk, nodejs, php, python, golang-sdk, wmb-agent, native-web-server; Network Visibility: agent-type=netviz; Database Visibility: agent-type=db_agent, db_collector; plus everything under AppDynamics Infrastructure Monitoring.

AppDynamics Infrastructure Monitoring (INFRA)
What it offers: Monitor all your servers using AppDynamics agents.
Agent list: Server Visibility: agent-type=sim-machine-agent; Machine Agent: agent-type=machine-agent; Cluster Agent: agent-type=cluster-agent; .NET Machine Agent: agent-type=dot-net-machine-agent

For other packages please check https://docs.appdynamics.com/appd/23.x/latest/en/appdynamics-licensing/license-entitlements-and-restrictions

Do I need individual packages for my account?
Only Transaction Analytics requires ENTERPRISE. If you do not have the ENTERPRISE package, the transaction analytics agent cannot report even if licenses are available at the PREMIUM package. All INFRA agents can report against the PREMIUM or ENTERPRISE packages. All PREMIUM agents can report against the ENTERPRISE package.

What happens when my package-level consumption is full? (Redirection)
Valid only if account-level limits are not maxed out: if agents report against the INFRA package and the INFRA license pool is full but PREMIUM is free, new license consumption will be redirected to PREMIUM. If agents report against the PREMIUM package and the PREMIUM license pool is full but ENTERPRISE is free, new license consumption will be redirected to ENTERPRISE. For transaction analytics agents, if ENTERPRISE is full, the controller cannot switch back to unconsumed PREMIUM or unconsumed INFRA. This swapping takes place at the license rule level.
Valid only if account-level limits are maxed out: the above redirection will not take place if account-level limits are maxed out, even if a few license rule units are unconsumed.

Can I force agents to report against a package?
Yes, you can manage restrictions via license rules -> Server Scope / Application Scope. You will have to provide only ENTERPRISE, or only PREMIUM, within a single license rule; otherwise, default redirection is respected. One agent can report against one package only.

Which packages are supported under license rules?
As of controller version 23.12.x, only Premium, Enterprise, Enterprise SAP, and Infrastructure Monitoring packages are supported under license rules. Consumption and redirection work the same as account-level switching.

What happens if I do not have a machine agent to report vCPUs, or I do not have hardware monitoring enabled?
If the machine agent is not running or has been deleted from the server, or the agents are unable to find the number of CPUs, the license units will be calculated based on the fallback mechanism:
APM agent: 4 CPU ~ 4 license units by default
DB collector: 4 or 12 CPU ~ 4 or 12 license units by default

Why is my vCPU count reported incorrectly?
An inaccurate vCPU count does not mean AppDynamics is consuming the wrong licenses; it means users are not providing AppDynamics with a way to calculate licenses properly. Most common reasons:
Machine agent not installed: (a) any agent goes into fallback mode if there is no machine agent; (b) the database agent goes into fallback mode = 4 or 12 default vCPUs even if the host has 1 vCPU; (c) the APM agent goes into fallback mode = 4 default vCPUs even if the host has 1 vCPU.
Managed database (AWS, GCP, Azure) + machine agent mismatch: (a) the machine agent cannot be installed on managed/cloud database services; (b) hardware metrics should be enabled, which reports the vCPU count. If this is not enabled, a default of 4 or 12 licenses is consumed as fallback.
UniqueHostId mismatch: (a) there is a mismatch in host mapping, so each uniqueHostId will be considered a different host even if they reside on the same physical machine; (b) consider an example of 2 Java agents + 1 machine agent on the same machine. With the mismatch, 3 agents will show up as individual rows in the host table and each would consume a 4 vCPU license = 4*3 = 12 vCPU towards total license consumption, but the expectation is a total of 4, not 12.

Can the two licensing models (agent based and host based) co-exist on the same license?
No. A given license can only be on one of the two models.

If Infrastructure Based Licensing (IBL) is enabled for a customer, can it be reverted to the legacy Agent Based Licensing (ABL) model later?
No, we cannot revert.

Can you share a couple of scenarios?
A 4 vCPU host running 1 sim-agent, 3 app agents, and 1 netviz agent ~ final consumption is 4 vCPU licenses in total.
A 4 vCPU host running 1 machine agent, 3 app agents, and 1 netviz agent ~ final consumption is 4 vCPU licenses in total.
If the machine agent is not updated/not reporting after the initial vCPUs were reported: there is a timeout after which APM agents default to the fallback mechanism.
If the machine adds vCPUs (vertical scale) while keeping the hostname the same: on scaling up or down, accurate vCPUs are reported to the controller within 10 minutes. A temporary spike/dip in license usage is expected if an agent restarts within 5 minutes.
Database agent scenarios: if there are DB agents, or DB and machine agents, on a host (identified by unique host ID), then the license units used ("vCPU") are capped at 4 (4 or less, if the MA reports fewer vCPUs, for example). If there is any other agent type than DB/MA (e.g. an app agent), the capping does not happen and license units are calculated as usual. In the fallback case it is 4 LUs for all DBs on the host + 4 LUs per any other agent reporting; in the non-fallback case (licensing knows the vCPUs) the reported vCPU count is used (if both the DB agent and MA report vCPUs, licensing trusts the MA more). Examples: 2 vCPU DB + 3 vCPU DB = 3; similarly 2 vCPU DB + 8 vCPU machine agent = 4 max; 5 vCPU DB + MA = 4 max; 100 vCPU DB + MA = 4 max; 2 vCPU DB only = 4; 100 vCPU DB = 4.
Hey guys, I was wondering if anyone had any idea how to optimize this query to minimize the subsearches. My brain hurts just looking at it honestly, so to all the SPL pros: please lend a hand if possible.

index=efg* *
| search EVENT_TYPE=FG_EVENTATTR AND ((NAME=ConsumerName AND VALUE=OneStream) OR NAME=ProducerFilename OR NAME=OneStreamSubmissionID OR NAME=ConsumerFileSize OR NAME=RouteID)
| search
| where trim(VALUE)!=""
| eval keyValuePair=mvzip(NAME,VALUE,"=")
| eval efgTime=min(MODIFYTS)
```We need to convert EDT/EST timestamps to UTC time.```
| eval EST_time=strptime(efgTime,"%Y-%m-%d %H:%M:%S.%N")
```IMPORTANT STEP: During EDT you add 14400 to convert to UTC; during EST you add 18000. (We need to automate this step in the code.)```
| eval tempTime = EST_time
| eval UTC_time=strftime(tempTime, "%Y-%m-%d %H:%M:%S.%1N")
| stats values(*) as * by ARRIVEDFILE_KEY
| eval temptime3=min(UTC_time)
| eval keyValuePair=mvappend("EFG_Delivery_Time=".temptime3, keyValuePair)
| eval keyValuePair=mvsort(keyValuePair)
```Let's extract our values now.```
| eval tempStr_1 = mvfilter(LIKE(keyValuePair, "%ConsumerFileSize=%"))
| eval tempStr_2 = mvfilter(LIKE(keyValuePair, "%EFG_Delivery_Time=%"))
| eval tempStr_3 = mvfilter(LIKE(keyValuePair, "%OneStreamSubmissionID=%"))
| eval tempStr_4 = mvfilter(LIKE(keyValuePair, "%ProducerFilename=%"))
| eval tempStr_5 = mvfilter(LIKE(keyValuePair, "%RouteID=%"))
```Now, let's assign the values to the right field name.```
| eval "File Size"=ltrim(tempStr_1,"ConsumerFileSize=")
| eval "EFG Delivery Time"=ltrim(tempStr_2,"EFG_Delivery_Time=")
| eval "Submission ID"=substr(tempStr_3, -38)
| eval "Source File Name"=ltrim(tempStr_4,"ProducerFilename=")
| eval "Route ID"=ltrim(tempStr_5,"RouteID=")
```Bring it all together! (Join EFG data to the data in the OS lookup table.```
| search keyValuePair="*OneStreamSubmissionID*"
| rename "Submission ID" as Submission_ID
| rename "Source File Name" as Source_File_Name
| join type=left max=0 Source_File_Name
    [ search index=asvsdp* source=Watcher_Delivery_Status sourcetype=c1_json event_code=SINK_DELIVERY_COMPLETION (sink_name=onelake-delta-table-sink OR sink_name=onelake-table-sink OR onelake-direct-sink)
    | eval test0=session_id
    | eval test1=substr(test0, 6)
    | eval o=len(test1)
    | eval Quick_Check=substr(test1, o-33, o)
    | eval p=if(like(Quick_Check, "%-%"), 35, 33)
    | eval File_Name_From_Session_ID=substr(test1, 1, o-p)
    | rename File_Name_From_Session_ID as Source_File_Name
    ```| lookup DFS-EFG-SDP-lookup_table_03.csv local=true Source_File_Name AS Source_File_Name OUTPUT Submission_ID, OS_time, BAP, Status```
    | join type=left max=0 Source_File_Name
        [ search index=asvexternalfilegateway_summary *
        | table Source_File_Name, Submission_ID, Processed_time, OS_time, BAP, Status ]
    | table event_code, event_timestamp, session_id, sink_name, _time, Source_File_Name, Submission_ID, OS_time, BAP, Status
    | search "Source_File_Name" IN (*OS.AIS.COF.DataOne.PROD*, *fgmulti_985440_GHR.COF.PROD.USPS.CARD*, *COF-DFS*) ]
```| lookup DFS-EFG-SDP-lookup_table_03.csv Submission_ID AS Submission_ID OUTPUT Processed_time, OS_time, BAP, Status```
| join type=left max=0 Submission_ID
    [ search index=asvexternalfilegateway_summary *
    | table Submission_ID, Processed_time, OS_time, BAP, Status ]
| eval "Delivery Status"=if(event_code="SINK_DELIVERY_COMPLETION","DELIVERED","FAILED")
| eval BAP = upper(BAP)
```| rename Processed_time as "OL Delivery Time" | eval "OL Delivery Time"=if('Delivery Status'="FAILED","Failed at OneStream",'OL Delivery Time')```
| rename OS_time as "OS Delivery Time"
```Display consolidated data in tabular format.```
| eval "OL Delivery Time"=strftime(event_timestamp/1000, "%Y-%m-%d %H:%M:%S.%3N")
``` Convert OS timestamp from UTC EST/EDT ```
| eval OS_TC='OS Delivery Time'
| eval OS_UTC_time=strptime(OS_TC,"%Y-%m-%d %H:%M:%S.%3N")
```IMPORTANT STEP: During EDT you add 14400 to convert to UTC; during EST you add 18000. (We need to automate this step in the code.)```
| eval tempTime_2 = OS_UTC_time - 18000
```| eval tempTime = EST_time```
| eval "OS Delivery Time"=strftime(tempTime_2, "%Y-%m-%d %H:%M:%S.%3N")
``` Convert OL timestamp from UTC EST/EDT ```
| eval OL_UTC_time=strptime('OL Delivery Time',"%Y-%m-%d %H:%M:%S.%3N")
```IMPORTANT STEP: During EDT you add 14400 to convert to UTC; during EST you add 18000. (We need to automate this step in the code.)```
| eval tempTime_3 = OL_UTC_time - 18000
```| eval tempTime = EST_time```
| eval "OL Delivery Time"=strftime(tempTime_3, "%Y-%m-%d %H:%M:%S.%3N")
| rename Source_File_Name as "Source File Name"
| rename Submission_ID as "Submission ID"
| fields BAP "Route ID" "Source File Name" "File Size" "EFG Delivery Time" "OS Delivery Time" "OL Delivery Time" "Delivery Status" "Submission ID"
``` | search Source_File_Name IN (*COF-DFS*)```
| append
    [ search index=efg* source=efg_prod_summary sourcetype=stash STATUS_MESSAGE=Failed ConsumerName=OneStream
    | eval BAP=upper("badiscoverdatasupport")
    | eval "Delivery Status"="FAILED", "Submission ID"="--"
    | rename RouteID as "Route ID", SourceFilename as "Source File Name", FILE_SIZE as "File Size", ArrivalTime as "EFG Delivery Time"
    | table BAP "Route ID" "Source File Name" "File Size" "EFG Delivery Time" "OS Delivery Time" "OL Delivery Time" "Delivery Status" "Submission ID"
    | search "Source File Name" IN (*OS.AIS.COF.DataOne.PROD*, *fgmulti_985440_GHR.COF.PROD.USPS.CARD*, *COF-DFS*) ]
| sort -"EFG Delivery Time"
| search "Source File Name" IN (*OS.AIS.COF.DataOne.PROD*, *fgmulti_985440_GHR.COF.PROD.USPS.CARD*, *COF-DFS*)
| dedup "Submission ID"
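Not a drop-in rewrite, but a sketch of the general pattern that usually eliminates left joins like these: bring both datasets into one search (with OR in the base search, or with append), normalize the shared key, and roll everything up with stats values(...) by that key. The index, sourcetype, and field names below are lifted from the query above, and the session_id parsing evals would move into the combined pipeline where indicated.

(index=asvsdp* source=Watcher_Delivery_Status sourcetype=c1_json event_code=SINK_DELIVERY_COMPLETION) OR (index=asvexternalfilegateway_summary)
``` derive Source_File_Name from session_id here for the asvsdp* events, as in the subsearch above ```
| eval Source_File_Name=coalesce(Source_File_Name, File_Name_From_Session_ID)
| stats values(Submission_ID) as Submission_ID values(OS_time) as OS_time values(BAP) as BAP values(Status) as Status values(event_code) as event_code values(event_timestamp) as event_timestamp by Source_File_Name

Because stats replaces the joins, there are no subsearch result limits to worry about, which is usually where max=0 joins start to hurt.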
Introduction

In observability monitoring, static thresholds are used to monitor fixed, known values within application environments. If a signal goes above or below a static threshold, or is within or outside a specified range, an alert will fire. Static thresholds are quick to configure and can provide helpful insight into system stats, but there are downsides to using them too frequently. There's a time and a place for static thresholds, and in this post, we'll look at when to use static thresholds, when not to use them, and alternatives to static thresholds.

When to use static thresholds

Static thresholds work well for situations where there are "known knowns" – the predictable cases that you can anticipate – and when there's a static range of "good" and "bad" values. Here's an example of a CPU utilization detector with an alert rule that triggers when CPU is above 90% for a 5-minute duration.

As a side note, adding durations on such alert rules is important to avoid over-alerting on transient spikes. Without a set duration, every CPU spike over 90% would trigger an alert, as we can see when we try to configure the same condition without a duration: the estimated alert count for this alert rule is 11 alerts in 1 hour – aka too much alert noise.

Boolean conditions, like monitoring synthetic test failures, are a great case for static thresholds. Alerting when a synthetic check fails indicates issues with site availability that should be addressed immediately. This detector alerts on a synthetic test when uptime drops below the 90% static threshold.

You can also use static thresholds to monitor Service Level Objectives/Service Level Agreements/Service Level Indicators (SLO/SLA/SLI). If you have an SLO of 99.9% uptime, you'll want to be alerted anytime your availability approaches that threshold. Here's an SLO detector that alerts if latency passes the 99.99999% threshold. While configuring static thresholds on SLOs is possible, within Splunk Observability Cloud it's recommended to manage SLO alerts using error budgets.

Static thresholds are most appropriate when working with fixed and critical metrics with clear failure conditions, e.g. HTTP error response codes spiking, response time increasing for a certain period of time, or error rates above a certain percentage for a certain amount of time. If you're just starting out on your observability journey, using static thresholds is also a great way to capture baseline metrics and gain insight into trends so you can fine-tune and adjust your detectors and alerts.

When not to use static thresholds

As we saw in the CPU detector above, alerting on static thresholds can lead to a lot of alert noise if not used correctly. To create good detectors, we need to alert on actionable signals based on symptoms, not causes (we have a whole post on How to Create Good Detectors).

In dynamic system environments that include autoscaling and fluctuating traffic, static thresholds might not indicate actual system failures. Setting static thresholds on pod CPU in a Kubernetes environment, for example, could indicate increased load, but might not indicate a problem – pods could autoscale to handle the increase just fine. Note: monitoring the CPU static threshold combined with pod queue length could provide the additional context needed to create an actionable detector.

When there are periods of traffic spikes – Black Friday for e-commerce sites, lunchtime for a food delivery site – setting static thresholds can lead to an increase in false alarms. Applications with such variable traffic fluctuations might not benefit from alerting on static thresholds alone. Dynamic resource allocation and variable usage patterns can make using static thresholds in isolation tricky, but thankfully, alternative approaches can help.

Alternatives to static thresholds

For the situations mentioned above, along with those where we might not know all of our application's failure states (unknown-unknowns), alternative detector thresholding or a hybrid approach can work best. Alternatives include but definitely aren't limited to:

Ratio-based thresholds – instead of alerting on 500 MB of memory used, use a threshold of 80% of total memory for a specific duration of time
Combining static thresholds with additional context – high CPU + pod queuing or error rate or latency
Sudden change detection – alerts on sudden changes to specified conditions like number of logins, response time, etc.
Historical anomaly detection to baseline environments and alert on deviations from trends – alerting on latency that deviates from historical trends

Combining out-of-the-box approaches with custom thresholds can help you build a resilient monitoring solution to keep your applications running smoothly and your users happy.

Wrap up

For predictable, critical metrics, static thresholds are a great alerting option. For situations when static thresholds aren't appropriate, there are thankfully additional solutions that can help. To start, check out Splunk Observability Cloud's out-of-the-box alert conditions here and explore what might work best for your unique application environment. Don't yet have Splunk Observability Cloud? Try it free for 14 days!
Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data insights, key use cases, and tips on managing Splunk more efficiently. We also host Getting Started Guides for a range of Splunk products, a library of Product Tips, and Data Descriptor articles that help you see everything that's possible with data sources and data types in Splunk. This month, we're spotlighting articles that feature instructional videos from the Splunk How-To YouTube channel, created by the experts at Splunk Education. These videos make it easier than ever to level up your skills, streamline your workflows, and take full advantage of Splunk software capabilities. In addition to these highlighted articles, we've published a range of new content covering everything from optimizing end-user experiences to accelerating Kubernetes implementations. Read on to find out more.

Expert Tips from Splunk Education

Have you explored the Splunk How-To YouTube channel? This great resource is packed with video tutorials that simplify complex concepts to help you get the most out of Splunk, created and curated by the experts on our Splunk Education team. Here at Lantern, we include these topics in our library so our users don't miss out on these vital tips. This month, we've published a batch of new articles that include hands-on guidance for mastering Splunk Enterprise 9.x, leveraging Enterprise Security 8.0 workflows, and more. Each article features an engaging video tutorial and a breakdown of what you can expect to watch. Here's the full list:

Installing Splunk Enterprise 9.x on Windows - Follow these step-by-step instructions to deploy Splunk Enterprise 9.x on Windows systems with best practices.
Installing Splunk Enterprise 9.x on Linux - Follow this guide to deploy Splunk Enterprise 9.x in Linux environments.
Using Enterprise Security 8.0 workflows - Learn how to streamline investigations and utilize workflows effectively in Enterprise Security 8.0.
Using risk-based alerting and detection in Enterprise Security 8.0 - Enhance your security posture with risk-based alerting and detection capabilities.
Enabling auto-refresh on the Analyst queue in Enterprise Security - Discover how to enable auto-refresh for the Analyst Queue to optimize investigation efficiency.
Searching investigation artifacts with the Analyst queue in Enterprise Security 8.0 - Learn how to effectively search investigation artifacts using the Analyst Queue in Enterprise Security 8.0.
Using SPL2 for efficient data querying - Explore the powerful features of SPL2 for precise and efficient data querying.

We hope these videos inspire you to take your Splunk practices to the next level. Explore the articles, watch the videos, and let us know in the comments below if there are any topics you'd like to see featured next!

Observability in Action

Effective observability is the key to ensuring seamless operations, reducing downtime, and optimizing performance across IT and business environments. This month, we've published several new Lantern articles that explore the latest in observability solutions and strategies to help you unlock actionable insights with Splunk. Accelerating an implementation of Kubernetes in Splunk Observability Cloud is a complete guide to kickstarting your Kubernetes journey in Splunk Observability Cloud. This guide offers best practices for performing a smooth implementation to monitor your containerized environments.
Accelerating ITSI event management explores how IT Service Intelligence (ITSI) can enhance event management processes with this practical guide, designed to help you identify, respond to, and resolve incidents more quickly. If you're an AEM user, don't miss Monitoring Adobe Experience Manager as a Cloud Service, which explains how you can optimize end-user experiences with proactive response strategies. Finally, Using observability-related content in Splunk Cloud Platform shares how you can utilize observability-related content in Splunk Cloud Platform to maximize visibility and performance in cloud environments. These articles demonstrate the power of Splunk's observability solutions in streamlining your operations and driving the business outcomes that matter most to you. Click through to read them, and let us know what you think!

Everything Else That's New

Here's everything else we've published over the month:

Using Edge Processor to mask or truncate cardholder data for PCI DSS compliance
Using Edge Processor to filter out cardholder data for PCI DSS compliance
Using the Splunk App for PCI Compliance
Nagios
Adobe

We hope you've found this update helpful. Thanks for reading!

- Kaye Chapman, Senior Lantern Content Specialist for Splunk Lantern
Hi all, do any of you run into issues where the bundle replication keeps timing out and splunkd.log references increasing the sendRcvTimeout parameter? In a previous ticket with support, they supplied a "golden configuration" that says this value should be around 180. Based on https://docs.splunk.com/Documentation/Splunk/9.4.0/Admin/Distsearchconf, under the 'classic' REPLICATION-SPECIFIC SETTINGS:

connectionTimeout = <integer>
* The maximum amount of time to wait, in seconds, before a search head's initial connection to a peer times out.
* Default: 60

sendRcvTimeout = <integer>
* The maximum amount of time to wait, in seconds, when a search head is sending a full replication to a peer.
* Default: 60

Should these two values be adjusted and kept in sync? I am considering adding another 30 seconds to each. Or, if there is something else I should verify first, it would be helpful to get some direction here.
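Before raising the timeouts, it can help to see how often replication is actually struggling. A minimal sketch of an internal-log search for that, assuming the search head logs bundle replication under the DistributedBundleReplicationManager component (the component name and time span are assumptions to adjust for your version):

index=_internal sourcetype=splunkd component=DistributedBundleReplicationManager (log_level=WARN OR log_level=ERROR)
| timechart span=1h count by log_level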