It’s 3 pm on a Friday, and your application is running slower than it should be. Support tickets are coming in and you want to solve the issue fast – for your users and for your weekend. With so many capabilities for monitoring, troubleshooting, and performance analysis, which tool will give you the insight you need right now?
Thankfully, Splunk Observability Cloud quickly gives you the complete visibility you need to optimize performance. AlwaysOn Profiling helps identify bottlenecks before they become incidents or support tickets. And now, Call Graphs – the battle-tested feature from Splunk AppDynamics – helps troubleshoot in-progress performance issues for faster resolution. But which one do you use and when? Let’s talk it out.
AlwaysOn Profiling: Your Early Warning System
AlwaysOn Profiling continuously samples your application’s call stacks, taking snapshots every 10 seconds (by default) for CPU and periodically for memory. These get aggregated into flame graphs where the width of each box shows how much time your code spends in each method.
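To make the aggregation idea concrete, here is a minimal sketch of how sampled call stacks can be turned into per-method shares, the quantity a flame graph draws as box width. It is a simplification and not how Splunk implements it: real flame graphs merge samples by stack path, while this version counts by method name only. The function `frame_shares` and all method names are hypothetical.

```python
from collections import Counter


def frame_shares(samples):
    """Return the fraction of samples in which each method is on the stack.

    Each sample is one captured call stack, outermost frame first.
    """
    total = len(samples)
    if total == 0:
        return {}
    counts = Counter()
    for stack in samples:
        # Count a method once per sample, even if it appears recursively.
        for method in set(stack):
            counts[method] += 1
    return {method: n / total for method, n in counts.items()}


# Hypothetical samples; the method names are illustrative only.
samples = [
    ["main", "handle_request", "serialize_json"],
    ["main", "handle_request", "serialize_json"],
    ["main", "handle_request", "query_db"],
    ["main", "handle_request", "render"],
    ["main", "idle"],
]

shares = frame_shares(samples)
for method, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{method:16s} {share:.0%}")
```

With these made-up samples, `serialize_json` is on the stack in two of five snapshots, so it would take up 40% of the graph's width.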
AlwaysOn capabilities are perfect for:
Real-World Example
Your application has been gradually getting slower. No single transaction is terrible, but everything just seems a bit sluggish. You open profiling and immediately see 40% of CPU time is spent in a JSON serialization method that someone “optimized” last month. Mystery solved in minutes, not hours.
Performance Impact
AlwaysOn Profiling is not on by default and must be explicitly enabled. This intentional setup step ensures you’re aware of the additional monitoring.
While AlwaysOn Profiling is designed for continuous operation with low overhead (typically single-digit percentage impact), it’s still sampling your application’s call stacks every 10 seconds. Test it in your own development environment to understand the specific impact before launching it in production.
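One simple way to quantify that impact is to run the same load test with and without profiling and compare median latencies. The sketch below assumes you have already collected the two sets of timings; the helper `overhead_percent` and the numbers are hypothetical, not measurements of Splunk's profiler.

```python
import statistics


def overhead_percent(baseline_ms, profiled_ms):
    """Relative increase in median latency, as a percentage of the baseline."""
    base = statistics.median(baseline_ms)
    prof = statistics.median(profiled_ms)
    return (prof - base) / base * 100.0


# Hypothetical request latencies (ms) from two runs of the same load test.
baseline_ms = [100, 102, 98, 101, 99]
profiled_ms = [103, 105, 101, 104, 102]

overhead = overhead_percent(baseline_ms, profiled_ms)
print(f"Overhead: {overhead:.1f}%")
```

Medians are used here instead of means so a single slow outlier does not dominate the comparison; for a real decision you would want many more requests than five.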
Call Graphs: Your Incident Detective
Call Graphs show you the exact method-by-method execution path of individual requests. Unlike profiling’s aggregate view, this is transaction-specific detail. You see the exact sequence of method calls, how long each took, and which external calls were made.
Call Graph capabilities are perfect for:
Real-World Example
A customer reports checkout took 45 seconds. You find their trace, pull up the Call Graph, and there it is: a third-party payment API is taking 43 seconds to respond. You see the exact method that made the call. Ticket routed to the fixing team, case closed.
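The scenario above can be sketched as a small call tree. This is an illustrative model of the data a call graph conveys, not Splunk's data format: the `Call` class, the `slowest_path` helper, and every method name are hypothetical, and only the 45-second and 43-second figures come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    total_s: float  # wall-clock time, including child calls
    children: list = field(default_factory=list)

    def self_s(self):
        """Time spent in this method itself, excluding its children."""
        return self.total_s - sum(c.total_s for c in self.children)


def slowest_path(call):
    """Follow the most expensive child at each level, from root to leaf."""
    path = [call.name]
    while call.children:
        call = max(call.children, key=lambda c: c.total_s)
        path.append(call.name)
    return path


# Hypothetical trace of the slow checkout request.
checkout = Call("CheckoutController.submit", 45.0, [
    Call("CartService.validate", 0.5),
    Call("PaymentService.charge", 43.5, [
        Call("PaymentGatewayClient.post", 43.0),
    ]),
    Call("OrderRepository.save", 0.8),
])

path = slowest_path(checkout)
print(" -> ".join(path))
print(f"Self time of root: {checkout.self_s():.1f}s")
```

Walking the most expensive child at each level leads straight to the external payment call, which is the same conclusion the engineer in the example reaches by reading the graph.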
Performance Impact
Like AlwaysOn Profiling, Call Graphs require explicit configuration in your APM setup.
Call Graphs capture detailed method-level timing for each traced request, which can add overhead if applied to 100% of transactions. The default 5% sampling rate balances diagnostic capability against performance impact: it typically captures enough data for troubleshooting while keeping overhead minimal.
Once configured, you can tune the method execution threshold (by default, methods under 10 ms are filtered out) and exclude certain packages. Both settings reduce overhead in your specific environment and keep the focus on the bottlenecks that are meaningful to you.
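To get a feel for what a 5% sampling rate means in practice, here is a minimal sketch of one common way to sample: hash the trace ID and capture the request only if the hash falls in the lowest 5% of the range. How Splunk actually selects requests is not described here, so treat `should_capture` and the trace ID format as assumptions.

```python
import hashlib


def should_capture(trace_id: str, rate: float = 0.05) -> bool:
    """Deterministically select roughly `rate` of traces by hashing the ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


# Hypothetical trace IDs; count how many would get a call graph.
captured = sum(should_capture(f"trace-{i}") for i in range(100_000))
print(f"Captured {captured} of 100000 traces ({captured / 1000:.1f}%)")
```

Hashing makes the decision repeatable for a given trace ID, which a random draw would not be. Either way, the large majority of requests skip the detailed capture, and that is where the overhead saving comes from.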
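The effect of those two settings can be sketched as a simple filter over recorded method timings. This is an illustration of the idea only: `filter_calls`, its parameters, and the package names are hypothetical and do not correspond to actual Splunk configuration keys.

```python
def filter_calls(calls, min_ms=10.0, excluded_prefixes=()):
    """Keep calls at or above the threshold that are not in an excluded package."""
    prefixes = tuple(excluded_prefixes)
    return [
        (method, ms)
        for method, ms in calls
        if ms >= min_ms and not method.startswith(prefixes)
    ]


# Hypothetical (method, duration in ms) pairs from one traced request.
calls = [
    ("com.example.shop.CheckoutController.submit", 120.0),
    ("com.example.shop.CartService.total", 4.0),
    ("org.hypothetical.logging.Logger.info", 15.0),
    ("com.example.shop.PaymentClient.charge", 95.0),
]

kept = filter_calls(calls, excluded_prefixes=["org.hypothetical.logging."])
for method, ms in kept:
    print(f"{ms:6.1f} ms  {method}")
```

In this made-up request, the 4 ms call falls under the threshold and the logging call sits in an excluded package, so only the two calls worth investigating remain.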
Which Tool When?
Scenario | Your Go-To Tool | Why?
“Our app has been getting slower over the past week” | AlwaysOn Profiling | Shows trends over time |
“This customer can’t complete checkout” | Call Graphs | Traces exact transaction |
“We need to optimize before Black Friday” | AlwaysOn Profiling | Proactively identifies bottlenecks |
“The CEO just hit a 30-second page load” | Call Graphs | Immediate transaction detail |
“Did our deployment slow things down?” | AlwaysOn Profiling | Compares across deployments |
Better Together
Even though these two tools might sound similar, they really work better in conjunction. When investigating an issue, here’s a typical flow that takes advantage of the capabilities provided by both AlwaysOn Profiling and Call Graphs:
Wrap Up
While they might sound similar on paper, the two tools play different roles. AlwaysOn Profiling is proactive: it prevents problems by continuously monitoring performance trends and catching regressions early. Call Graphs are reactive: they solve problems by showing exactly what happened in slow transactions. You need both to prevent fires before they start and to put them out quickly when they do.
Ready to level up your performance game? Try Splunk Observability Cloud (including AlwaysOn Profiling and Call Graphs) free for 14 days.