I have a search with a timechart command to fill a single value dashboard entry based on a count that comes from a DB2 query. Sometimes when the timechart is run, the first or the last value is missing, which throws off the trend on the dashboard. It seems to be intermittent and I haven't figured out how to fix it yet. Any help would be appreciated.
This is my search for the timechart:
index=abc sourcetype="sql" source="UNDERWTG" QUEUE_COUNT | timechart avg(QUEUE_COUNT) span=15m by source
This is the result I get:
_time UNDERWTG
2017-01-06 15:00:00
2017-01-06 15:15:00 43.000000
2017-01-06 15:30:00 42.000000
2017-01-06 15:45:00 42.000000
2017-01-06 16:00:00 42.000000
There is no data for the 15:00 timeframe.
When I run the search without the timechart this is the data:
1/6/17 4:02:45.588 PM
2017-01-06 16:02:45 QUEUE_COUNT="42"
host = SNDSPKHVY001 source = UNDERWTG sourcetype = sql
1/6/17 3:47:47.427 PM
2017-01-06 15:47:47 QUEUE_COUNT="42"
host = SNDSPKHVY001 source = UNDERWTG sourcetype = sql
1/6/17 3:32:45.373 PM
2017-01-06 15:32:45 QUEUE_COUNT="42"
host = SNDSPKHVY001 source = UNDERWTG sourcetype = sql
1/6/17 3:17:45.409 PM
2017-01-06 15:17:45 QUEUE_COUNT="43"
host = SNDSPKHVY001 source = UNDERWTG sourcetype = sql
1/6/17 3:02:45.401 PM
2017-01-06 15:02:45 QUEUE_COUNT="43"
host = SNDSPKHVY001 source = UNDERWTG sourcetype = sql
1/6/17 2:47:47.263 PM
2017-01-06 14:47:47 QUEUE_COUNT="43"
host = SNDSPKHVY001 source = UNDERWTG sourcetype = sql
The input is produced approximately every fifteen minutes, but not precisely. It's just after the quarter-hour, at plus-2 minutes and 45 to 47 seconds. That means the time gap between events is sometimes as low as 14:57 minutes and sometimes as large as 15:03 minutes.
Based on the data you posted, if you change your span to 20 minutes, there will always be at least one event in it, and sometime two. If you change it to 30 minutes, there will usually be two, but sometimes one or three. But that would be an odd way to show data that is basically every fifteen minutes.
One reasonable option would be to "spread" the data forward or backward over the fifteen minute span. In the following code, I take each record and copy it 5 minutes prior and 5 minutes later. This way, you can use a 15 minute span, take the average, and you will always have at least two, and sometimes four, records being averaged.
index=abc sourcetype="sql" source="UNDERWTG" QUEUE_COUNT
| fields _time QUEUE_COUNT
| eval mytimes=mvappend(_time,relative_time(_time,"-5m"),relative_time(_time,"+5m"))
| mvexpand mytimes
| search mytimes < now()
| eval _time=mytimes
| timechart avg(QUEUE_COUNT) span=15m
in the middle, you will notice the search where I killed the one potential record that might have been future-time.
I would check if the QUEUE_COUNT field values are properly recognized as numbers for avg calculations, so I would try this so see if it improves anything.
index=abc sourcetype="sql" source="UNDERWTG" QUEUE_COUNT=* | eval QUEUE_COUNT=coalesce(tonumber(QUEUE_COUNT),0) | timechart avg(QUEUE_COUNT) span=15m by source
I had an oddity where I needed eval method after the timechart as I was using it in a single number panel in dashboard studio and if you have no current time span value then it will display an odd "Missing property" error instead. I'm not sure if this would have been a problem in 2017 when @somesoni2 provided this answer, but it worked for me in this situation.
index=abc sourcetype=def "search filtering details"
| timechart span=15m count AS count
| eval count=coalesce(tonumber(count),0)
It's time dependent whether there is missing data or not. Sometimes its all there. I've had no trouble with the averaging.
The input is produced approximately every fifteen minutes, but not precisely. It's just after the quarter-hour, at plus-2 minutes and 45 to 47 seconds. That means the time gap between events is sometimes as low as 14:57 minutes and sometimes as large as 15:03 minutes.
Based on the data you posted, if you change your span to 20 minutes, there will always be at least one event in it, and sometime two. If you change it to 30 minutes, there will usually be two, but sometimes one or three. But that would be an odd way to show data that is basically every fifteen minutes.
One reasonable option would be to "spread" the data forward or backward over the fifteen minute span. In the following code, I take each record and copy it 5 minutes prior and 5 minutes later. This way, you can use a 15 minute span, take the average, and you will always have at least two, and sometimes four, records being averaged.
index=abc sourcetype="sql" source="UNDERWTG" QUEUE_COUNT
| fields _time QUEUE_COUNT
| eval mytimes=mvappend(_time,relative_time(_time,"-5m"),relative_time(_time,"+5m"))
| mvexpand mytimes
| search mytimes < now()
| eval _time=mytimes
| timechart avg(QUEUE_COUNT) span=15m
in the middle, you will notice the search where I killed the one potential record that might have been future-time.
I was searching for something unrelated, and came across this old post. For what it's worth, I have the same issue with data ingested every 15 minutes, which is occasionally early or late, which creates missing or extra entries on my timecharts.
I get around it by simply rounding _time to the nearest 15 minutes (900 seconds):
| eval _time=round(_time/900,0)*900
You are quite welcome!
that seems to work perfectly! thank you!
A shot in the dark but can you please check if it is more related to the time chosen to produce data rather than the actual data. For example for below data try choosing earliest as 14:48:00
and latest
as 16:03:00
, then the first row will be blank as there is no data in the first time slot of 14:48:00 to 15:00:00
as this grouping will land in of 14:45:00
even though "visually" the data of 14:47:47 exists
.
2017-01-06 16:02:45 QUEUE_COUNT="42"
2017-01-06 15:47:47 QUEUE_COUNT="42"
2017-01-06 15:32:45 QUEUE_COUNT="42"
2017-01-06 15:17:45 QUEUE_COUNT="43"
2017-01-06 15:02:45 QUEUE_COUNT="43"
2017-01-06 14:47:47 QUEUE_COUNT="43"
Same might be the case why the ending data might be missing when latest time chosen is maybe few minutes before the grouping data.
It's likely related to when the query is run but it's for a panel on a dashboard so I don't have a control over the when I just know it's often broken. I guess I was hoping someone might know how to fix it.
Your events are arriving every 15 minutes if you do not want to miss a time-bucket you can set span=30m instead of span=15m.
Also if you already have fixed source to "UNDERWTG", there is no need to performs stats by source.
Even if I change it to span=30m or span=60m I'm usually missing the first datapoint.
It looks like span=20m might be the fix I need.
Also, thank you for pointing out the redundancy of grouping by source.
I'm still having the issue with a 20m span. so that didn't fix it. I apparently just had to wait for a specific time to run it.
There is no event between 1/6/17 2:47:47.263 PM and 1/6/17 3:02:45.401 PM . So if you're running timechart with average on QUEUE_COUNT, there's no value for QUEUE_COUNT in your events during the above time frame, hence it is showing no values.
This could be the case every time you running the search that timechart misses some values due to missing data/events during a time frame.
the data is produced every 15 minutes so I'm not sure what you mean by missing data during a timeframe