Whilst leaving a Splunk 6 search page open tailing incoming syslogs (with the default * search query), I realised it wasn't tailing in real time. I investigated the timeframe options and noticed 'All time (realtime)'; I tried to select it but found no results displayed. What I did get was a progress spinner in the left-hand corner below the search box and an empty search results area!
I left the page open for an hour or so and when I returned I could see a small "Invalid SID" error message below the search box (with no results).
I've only been compiling syslogs from a dozen or so devices for approximately one week, and Splunk's running on a dedicated Linux VM with a huge amount of CPU and RAM - it surprised me that realtime result display doesn't work. The install is essentially OOTB default on 64-bit Ubuntu (from the .deb), aside from the addition of the Modular SNMP app (currently not functional).
As far as I know, "All time (real-time)" does not backfill results from historical data. It essentially looks only at data arriving from the moment the search is started. The key difference from windowed real-time searches is that results are not discarded when using the "All time (real-time)" time range.
As an alternative you could probably use a huge windowed real-time range, such as earliest=rt-100y latest=rt, but this is probably not desirable performance-wise.
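For illustration, a fully spelled-out windowed real-time search might look something like the line below; sourcetype=syslog is only a placeholder for whatever your syslog data actually uses:
sourcetype=syslog earliest=rt-100y latest=rt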
If that's indeed how the search logic works, it explains my concern - but why are no search results showing, even with just a single wildcard search term? There's a steady stream of debug-level syslogs coming in, and these can be seen in historical searches; I can't understand why nothing's showing in real time.
While the dedicated CPU and RAM are nice, the real question is "what is the speed of your disk?" If you are running on a virtual machine, that could be a significant problem.
But you say "Wait! I am looking at real-time! It shouldn't be looking at the disk!" Yes, BUT when you ask for "All time (realtime)" Splunk will try to back-fill the time range.
Let me give an example that might be clearer: if you choose the timerange "real time - 5 minute window", Splunk will first locate the data for the past 5 minutes from disk and fill the window. Going forward, Splunk then displays matching data as it arrives.
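In time-modifier terms, that sort of 5-minute window would look roughly like this (the sourcetype is a placeholder, purely for illustration):
sourcetype=syslog earliest=rt-5m latest=rt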
So you actually asked Splunk to retrieve EVERYTHING (*) that it has over ALL TIME and then look at the data as it arrives. Even in the biggest Splunk systems, on dedicated hardware, this is probably a performance-killing search.
I suggest some alternatives:
To see how much data is arriving (without viewing the actual data), you can look at the Splunk internal logs. There are several interesting searches. Here is a sample:
index=_internal component=Metrics group=per_index_thruput
| eval index=series | timechart span=5m sum(kb) by index
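If you'd rather see the breakdown by sourcetype instead of by index, a very similar search against the per_sourcetype_thruput metrics group should also work (same metrics.log data, just a different grouping):
index=_internal component=Metrics group=per_sourcetype_thruput
| timechart span=5m sum(kb) by series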
The Splunk-On-Splunk app is a great tool (and free) that will also help you examine how much data is flowing into Splunk.
To look at the data that is currently arriving, I suggest that you don't search for *. Try looking at a subset of the data. And if you must look at the data in real time, choose a 30-second window.
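As a rough sketch of what that might look like, the search below restricts to a single host over a 30-second real-time window; the sourcetype and host values are placeholders, not anything from your setup:
sourcetype=syslog host=192.0.2.10 earliest=rt-30s latest=rt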
And ta for the SoS app; that also looks very interesting, and I'm now off to install it. I'm mindful of how Splunk will scale and how we're going to need to structure a Splunk deployment as we build up the cluster (we're not yet fully decided on using just Splunk for aggregation and indexing), so SoS looks like it'll be very useful.
At the moment it's one of a handful of VMs running on a fairly powerful setup (10GigE from backplanes, GigE virtual interfaces with an EMC VNX5500 backing it all up) so I'm confident I'm not taxing the storage I/O 🙂
Your power search looks useful; definitely bookmarking it for future use. As expected, there's not much logging going on just at the moment (not a lot pointed at the IP yet): _internal is still hovering between 83 and 95, and the half-hourly _audit intervals are showing 1.49707. As these are all KB, I'm not overly concerned yet 😉 (And I've only been logging for a week...!)