Getting Data In

Is there a way to find out whether the log files have gone stale without going on the server?

ddrillic
Ultra Champion

My customer uses the following to monitor their hundreds of forwarders -

| metadata type=hosts index=<customer index> index=os index=perfmon 
| eval host=lower(host) 
| eval _time=recentTime 
| sort host, _time 
| stats latest(_time) as recentTime by host 
| eval LAST=strftime(recentTime,"%a %m/%d/%Y-%T %Z(%z)"), DAYS_AGO=round((recentTime-now())/86400,0)

When the most recent event from a certain host is several days old (a DAYS_AGO of -5, for example), they come to me and ask me to bounce the server.
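The negative sign comes from the eval itself: DAYS_AGO is recentTime minus now(), so older data gives a more negative number. Here is a minimal Python sketch of that arithmetic. The epoch values are made up, and the function name days_ago is hypothetical; it only mirrors the eval expression above.

```python
SECONDS_PER_DAY = 86400

def days_ago(recent_time: float, now: float) -> int:
    """Mirror of round((recentTime-now())/86400, 0) from the search above."""
    # Older events have a smaller epoch value than `now`, so the result is <= 0.
    return round((recent_time - now) / SECONDS_PER_DAY)

# Made-up epoch timestamps, for illustration only.
now = 1_700_000_000
five_days_old = now - 5 * SECONDS_PER_DAY
print(days_ago(five_days_old, now))  # -5
```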

When I look at _internal all looks fine -

| metadata type=hosts index=_internal
| eval host=lower(host) 
| eval _time=recentTime 
| sort host, _time 
| stats latest(_time) as recentTime by host 
| eval LAST=strftime(recentTime,"%a %m/%d/%Y-%T %Z(%z)"), DAYS_AGO=round((recentTime-now())/86400,0)

When we go on the server, we see that the monitored files are stale on the file system.

Besides going on the server to look at the file system, is there a simpler way for me or the client to find out that the files are stale?

adonio
Ultra Champion

try this one out:

| tstats max(_time) as data_time where index=* by host | appendcols [| tstats max(_time) as internal_time where index=_* by host ]
| eval now=now()
| eval data_secondes_ago = now-data_time
| eval internal_data_seconds_ago = now-internal_time
| eval data_internal_gap = internal_time-data_time
| eval data_internal_gap_abs = abs(data_internal_gap)
| eval data_latency_true = if(data_secondes_ago>600, "1", "0") 
| eval internal_data_latency_true = if(internal_data_seconds_ago>600, "1", "0") 
| eval host_status = case(data_latency_true == 0 AND internal_data_latency_true == 0, "D. All Good", data_latency_true == 1 AND internal_data_latency_true == 1, "A. Check Server Down", data_latency_true == 1 AND internal_data_latency_true == 0, "B. No Data - Check Applications and Inputs", data_latency_true == 0 AND internal_data_latency_true == 1, "C. No internal data - Check disk size on host")
| eval now_human = strftime(now, "%c")
| eval data_time_human = strftime(data_time, "%c")
| eval internal_time_human = strftime(internal_time, "%c")
| sort host_status

hope it finds a new home 🙂

DalJeanis
Legend

@adonio - You can't depend on all hosts being present in both lists, so appendcols will occasionally screw up the alignment. It is better to use one of these two constructions for the aggregation:

 | tstats max(_time) as data_time where index=* by host 
 | append [| tstats max(_time) as internal_time where index=_* by host ]
 | stats max(*) as * by host

OR

 | tstats max(_time) as internal_time where index=_* by host
 | join type=left host [ | tstats max(_time) as data_time where index=* by host]

Notice that I've flipped the order of the two searches for the join. There will presumably always be an _internal record if there is any regular record (unless you use an extremely fine time range), but the reverse is not always true. Either way, the stats version is probably the preferred option, since it avoids the question of directionality completely.
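To make the alignment problem concrete, here is a small Python model of the two merge strategies. It is only a sketch under stated assumptions: the rows are made up, the function names merge_by_position and merge_by_host are hypothetical, and the positional function models only the row-by-row pairing that causes the misalignment. What Splunk does with surplus subsearch rows is not modeled.

```python
def merge_by_position(data_rows, internal_rows):
    """Rough model of appendcols: row i of the subsearch is pasted onto row i
    of the main results, and the main results keep their own `host` value."""
    merged = []
    for i, row in enumerate(data_rows):
        combined = dict(row)
        if i < len(internal_rows):
            combined["internal_time"] = internal_rows[i]["internal_time"]
        merged.append(combined)
    return merged

def merge_by_host(data_rows, internal_rows):
    """Rough model of append + stats max(*) as * by host: rows are grouped by
    host, so a host missing from one list simply lacks that field."""
    by_host = {}
    for row in data_rows + internal_rows:
        entry = by_host.setdefault(row["host"], {"host": row["host"]})
        for field, value in row.items():
            if field != "host":
                entry[field] = max(value, entry.get(field, value))
    return [by_host[h] for h in sorted(by_host)]

# Made-up results: host "b" has internal data but no regular data.
data_rows = [{"host": "a", "data_time": 100}, {"host": "c", "data_time": 300}]
internal_rows = [
    {"host": "a", "internal_time": 110},
    {"host": "b", "internal_time": 210},
    {"host": "c", "internal_time": 310},
]

# Positional pairing gives host "c" the internal_time that belongs to "b".
print(merge_by_position(data_rows, internal_rows))
# Keyed grouping keeps every value with its own host.
print(merge_by_host(data_rows, internal_rows))
```

In the positional result, host "c" ends up with internal_time 210, which belongs to host "b". The keyed result keeps 310 with "c" and shows "b" with no data_time at all.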

adonio
Ultra Champion

Thank you @DalJeanis for this important feedback, and for pointing out the possible misalignment!
The ... | stats <function>(*) as * by <field> construction is super useful.
I think I prefer that approach, but join will work too.
There are times when you will see "real data" but no "internal data". One case is low disk space on the machine where the forwarder is installed: Splunk will not generate its internal data but will keep on monitoring and sending "real / live data".
@ddrillic, you are welcome to change the integer in the following eval statements to suit your specific needs, as 600 (seconds) is an example only:

| eval data_latency_true = if(data_secondes_ago>600, "1", "0")
| eval internal_data_latency_true = if(internal_data_seconds_ago>600, "1", "0")
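As a rough illustration of how that threshold drives host_status, here is a Python sketch of the same four-way decision. The function host_status and its threshold parameter are hypothetical names for this example, and the inputs are made-up values in seconds. It is a model of the logic, not a replacement for the search.

```python
def host_status(data_seconds_ago, internal_seconds_ago, threshold=600):
    """Return the status label for one host, given the seconds since its last
    regular event and the seconds since its last internal event."""
    data_late = data_seconds_ago > threshold
    internal_late = internal_seconds_ago > threshold
    if data_late and internal_late:
        return "A. Check Server Down"
    if data_late:
        return "B. No Data - Check Applications and Inputs"
    if internal_late:
        return "C. No internal data - Check disk size on host"
    return "D. All Good"

# The situation from the question: _internal is fresh, regular data is 5 days old.
print(host_status(5 * 86400, 30))
# The same host judged against a one-week threshold (hypothetical value).
print(host_status(5 * 86400, 30, threshold=7 * 86400))
```

With the default of 600 seconds, the first call lands in category B, which is the stale-file case. With a one-week threshold the same host is reported as D, so the threshold should be chosen to match how often each source is expected to write.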

ddrillic
Ultra Champion

Thank you @adonio and @DalJeanis - much appreciated.
