"Lost Log" Search?

Communicator

So we know about lost forwarders, but what about lost logs? I recently discovered that some of my Windows systems were no longer forwarding Windows system and/or security logs. As I understand it, if a system shuts down unexpectedly, the marker files in %SPLUNK_HOME%\var\lib\splunk\persistentstorage\WinEventLog can become corrupted. When this happens, Splunk simply refuses to read the logs and writes a message to splunkd.log: Failed to initialize checkpoint for Windows Event Log channel 'Security'

This is listed as fixed in the release notes for 4.1.4, but until all of the forwarders can be upgraded, can anyone suggest a “lost log” search?

Thanks.

1 Solution

Splunk Employee

There's a clever answer for the general problem, I'm sure, but I'm not very clever at the search language.

My suggestion would be:

index=_internal "Failed to initialize checkpoint for Windows Event Log" | stats count by host

followed by clickthroughs to investigate.

I might even create a field extraction to pull out the channel name, and split out by that.
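For that field extraction, a sketch using an inline rex (assuming the message format quoted in the question, where the channel name appears in single quotes):

index=_internal "Failed to initialize checkpoint for Windows Event Log" | rex "channel '(?<channel>[^']+)'" | stats count by host, channel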


Splunk Employee

Typically Splunk cares more about consuming what's generated in real time than worrying about what isn't coming in. However, we do capture data about things that occur inside Splunk. I can see a use case where a customer might want to know if a log is NOT emitting data, as that might indicate a failure of the application that's generating the log (no log = no process running). I can see monitoring the "non-capture" of log data on an exception basis rather than as the rule. So what you might want to do is ask Splunk: "Let me know any (or specific) sources where the most recent event isn't in the last 5 minutes," for example.

  • | metadata type=sources | eval gap_minutes=round((now()-recentTime)/60) | eval currenttime=now() | sort -recentTime | convert ctime(currenttime) ctime(recentTime) | fields + currenttime, recentTime, source, gap_minutes

It says: give me metadata about captured sources, calculate the gap between each source's recentTime and now() in minutes, rounded to the nearest minute (up or down depending on value), then sort by recentTime descending, then convert the epoch times to human-readable form, then display only the fields I care about.

One of the great things about Splunk is that you have access to the same language we use to speak to the engine. So what would we do if I wanted to see "sources that haven't reported any events"? Add "| search gap_minutes>5". Then save it, schedule it for every 5 minutes, and if any results come back.. voilà.. email.
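Putting those pieces together, the scheduled alert search might look like this (the 5-minute threshold is just an example; tune it to how chatty each source is):

| metadata type=sources | eval gap_minutes=round((now()-recentTime)/60) | search gap_minutes>5 | convert ctime(recentTime) | fields source, recentTime, gap_minutes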

One thing I ran into is that I had negative numbers in my gap. That means I have "future events" -- gotta deal with that, as I didn't know they were there -- likely a timezone issue. That said, any source with a positive number has a gap between now and the last time we saw an event.

That "| metadata" search command can also be used with type=hosts and type=sourcetypes.

Communicator

Closer to what I was hoping for, but there is no way to determine which host the missing source is on, and with so many identical source names it would be a very rare log that would even show up.

| metadata type=hosts,sources | search source=WinEvent* | eval gap_days=round((now()-recentTime)/60/60/24) | . . .

would be ideal, but it doesn't seem possible -- at least with metadata.
For now the best answer seems to be forwarding the splunkd logs to the indexer and running Josh's search.
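As a possible alternative (assuming a Splunk version that ships the tstats command, which is not available in the 4.1 line discussed here), something like the following could report the most recent event per host/source pair directly from the index metadata:

| tstats latest(_time) as recentTime where index=* by host, source | eval gap_days=round((now()-recentTime)/60/60/24) | sort -gap_days

Because tstats groups by both host and source, identical source names on different hosts no longer mask each other.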

Thanks.


Communicator

Splunkd logs are not forwarded to the indexer, but I may have to re-evaluate that decision.
