Hi,
I'm having trouble finding information on how to create a scatter plot (over time) of a data set that I'm currently reporting on.
Sample log entry:
%<---snip---
2010-04-19 20:10:04,658 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 67ms
2010-04-19 20:10:06,952 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 83ms
2010-04-19 20:10:18,562 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 76ms
2010-04-19 20:10:22,864 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 200ms
2010-04-19 20:10:24,792 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 74ms
2010-04-19 20:10:26,460 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 80ms
%<---snip---
The data I'm particularly interested in is the last field, a response time in ms. Right now I have a timechart plot with averages, etc., and I would now like to include a scatter plot of the distinct values over time, where "Response time" in milliseconds is on the y-axis and Date/Time is on the x-axis.
Make sense?
Any information is greatly appreciated!
thanks, -mt
You can use
... | timechart values(x)
You should use the timechart command:
http://www.splunk.com/base/Documentation/latest/SearchReference/Timechart
If your field is called numberfield:
my search query | timechart max(numberfield)
You can use max() or min() as there should be only one value per event sampled.
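The searches in this thread assume the value is already available as a field (numberfield, response_time). As a rough illustration of what that extraction involves for the sample log lines in the question, here is a minimal Python sketch that runs outside Splunk. The names LINE_RE and parse_line are hypothetical, and the pattern is only assumed to fit lines shaped like the sample above.

```python
import re
from datetime import datetime

# Assumed line shape (taken from the sample in the question):
#   "<YYYY-MM-DD HH:MM:SS,mmm> <source> INFO <method> took <N>ms"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r".* took (?P<response_time>\d+)ms$"
)

def parse_line(line):
    """Return (timestamp, response_time_ms), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # %f accepts the three millisecond digits after the comma.
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, int(m.group("response_time"))

sample = (
    "2010-04-19 20:10:22,864 ************/****/********/pc/mform_proc "
    "INFO InetMailSession.sendMessage() took 200ms"
)
print(parse_line(sample))
```

Each parsed pair is one scatter point: the timestamp goes on the x-axis and the response time on the y-axis.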
As for a specific use case, let's assume you have a network device that logs throughput to a field called thruput:
host=network_device | timechart max(thruput)
If you have multiple network devices, you can search by sourcetype and split the chart by network device:
sourcetype=network_log_file | timechart max(thruput) by network_device
If you can, I recommend keeping it simple, as in the following:
| timechart max(response_time) min(response_time) avg(response_time)
The reason is that the FlashChart module in the UI truncates results past a certain number of rows, and having Flash pull down that much data at all can make for a clunky experience.
The other reason is that we changed some things that made it possible to do scatter charts where time is NOT the x-axis, and in so doing made it quite difficult to handle the cases where time IS the x-axis. (timechart is your friend.)
Go to the "advanced charting" view, and run a search like:
index=_internal source=*metrics.log group=per_sourcetype_thruput series=splunkd
| rename _time as time | fields time eps
over the last 60 minutes.
(the renaming of _time to time is to dodge a bug where 'scatter' charts with time series data are always blank)
Then change 'chart type' to 'scatter'. That will show you an honest-to-god scatter chart where time is the x-axis and eps is the y-axis.
Unfortunately the values on the time axis are now seconds since 1970.
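For reading those axis values by hand, the conversion from seconds since 1970 back to a readable timestamp is simple. Here is a minimal Python sketch, run outside Splunk. The helper name epoch_to_label is hypothetical, and it formats in UTC as an assumption, so your local timezone may shift the result.

```python
from datetime import datetime, timezone

def epoch_to_label(epoch_seconds):
    """Format seconds since 1970-01-01 00:00:00 UTC as a readable UTC timestamp."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

# 1271707804 corresponds to the first sample log line, if that line is read as UTC.
print(epoch_to_label(1271707804))
```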
(If, on the other hand, you ever want a scatter chart where the x-axis is numeric, this actually works quite well:)
index=_internal source=*metrics.log group=per_sourcetype_thruput series=splunkd
| fields kbps eps
I think the "advanced charting" view that sideview mentions is deprecated? Please correct me if I'm wrong.
| eval time=_time | table time latency
And you need to select scatter in graph options.
That looks like a real time scatter, except that the times are written in epoch time.
Interesting.
Assume you want to track events that are supposed to run every minute of the day. You could sum them by hour (by event type) and should get about 60 events/hr per event type. Then you could represent each hour as a colored scatter point as either green (58-66), yellow (50-58), red (<50) or purple (>66).
If so, you could monitor 20 different events, setting a specific y-axis value for each type so they appear as parallel horizontal rows. Using a scatter time basis that goes back to 1970 isn't very realistic. With scatter, can you specify earliest as '-24h' or 'today'?
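The color banding described above can be sketched as a small classification function. This is a Python illustration of the logic only, not a Splunk feature. The name classify_hourly_count is hypothetical, and because the stated ranges overlap at 58, the sketch assumes 58 counts as green.

```python
def classify_hourly_count(count):
    """Map an hourly event count to a color band.

    Bands from the post: green 58-66, yellow 50-58, red <50, purple >66.
    Assumption: the shared boundary value 58 is treated as green.
    """
    if count > 66:
        return "purple"
    if count >= 58:
        return "green"
    if count >= 50:
        return "yellow"
    return "red"

for n in (45, 55, 60, 70):
    print(n, classify_hourly_count(n))
```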
This will do it:
sourcetype=mydata | timechart bins=4000 list(response_time) as response_time | mvexpand response_time
Using bins=4000 will collapse your time range on the x-axis into up to 4000 equal discrete intervals, so note that this may move your x-axis timestamps. For most charting purposes, though, 4000 bins will look right and will be close enough. You can use up to 50,000 bins if needed.
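To get a feel for how far a timestamp can move, here is a simplified Python model of equal-width binning. It is a sketch under assumptions, not Splunk's actual algorithm: Splunk may pick a different bin width than range divided by bins, and the name bin_start is hypothetical.

```python
def bin_start(t, range_start, range_end, bins):
    """Snap timestamp t (in seconds) to the start of its equal-width bin."""
    width = (range_end - range_start) / bins
    # Clamp so that t == range_end falls into the last bin.
    index = min(int((t - range_start) // width), bins - 1)
    return range_start + index * width

# A 24-hour range split into 4000 bins gives bins of 21.6 seconds,
# so in this model a point moves by less than 21.6 seconds.
day = 24 * 60 * 60
print(bin_start(100, 0, day, 4000))
```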
This is not really a time scatter. The abscissa is not time; it is chronological order.
So for these values:
00:01
00:02
00:10
You would have abscissa values 1, 2, 3.
The graph doesn't show that 00:02 is closer to 00:01 than it is to 00:10.
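The difference between the two kinds of x-axis can be shown with the three values above. This is a small Python illustration only; the helper names are hypothetical.

```python
def minutes_since_midnight(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def gaps(xs):
    """Distances between consecutive x positions."""
    return [b - a for a, b in zip(xs, xs[1:])]

times = ["00:01", "00:02", "00:10"]
ordinal_x = list(range(1, len(times) + 1))         # position in chronological order
time_x = [minutes_since_midnight(t) for t in times]  # position on a true time axis

print(gaps(ordinal_x))  # spacing on an ordinal axis
print(gaps(time_x))     # spacing on a time axis
```

On the ordinal axis the points are evenly spaced, while on a true time axis the third point sits eight minutes after the second.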
NOTE: if you set bins this high and you're using a 'line' chart, make sure that either 'x-axis' > 'display markers' is 'yes', OR that 'Null Values' is set to 'zero' or 'connect'. Otherwise your chart will often look empty and you'll be confused.