Hi, I'm trying to create a chart of results over time, but the chart only plots the first 1000 results. I'm using the following search over 1 day:
index=prod sourcetype="websphere:nativestdouterrlog" | chart avg(exclusiveaccessms) over _time
This returns about 6500 results across the day, and I need to create a line chart from them. I can see all the results in the table, but the chart stops at the 1000th event.
I've also tried using the table and timechart commands, but both have the same problem.
Is there a limit somewhere which I can change to correct this?
Thanks, Ashley
index=prod sourcetype="websphere:nativestdouterrlog" | chart avg(exclusiveaccessms) over _time
will only graph 10,000 rows. But the problem here is that chart by itself does not do any bucketing by time. If you look at the rows, the timestamps are the same timestamps as the events; these rows are really just events. So all that's happening is that the default limit on the chart command is kicking in (and this is a good thing). (If for some reason you really wanted to use the chart command or stats command over _time yourself instead of just using timechart, you'd have to manually bucket the _time values with the bucket command.)
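As a minimal sketch of that manual bucketing, the search below rounds _time into fixed buckets before charting. The 5-minute span is an assumption picked for illustration (it gives 288 rows over one day); use whatever span suits your data.

```spl
index=prod sourcetype="websphere:nativestdouterrlog"
| bucket _time span=5m
| chart avg(exclusiveaccessms) over _time
```

With one row per bucket instead of one row per event, the result stays well under the row limit.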
What you want to run is:
index=prod sourcetype="websphere:nativestdouterrlog" | timechart avg(exclusiveaccessms)
I'm not sure what problem you were running into when you tried it, but that will work fine well past 10,000 rows and up into millions of events and beyond. Granted, timechart will only return a number of rows far less than the number of events, but that's the point: timechart buckets millions of events into aggregated buckets of time and then graphs the aggregate statistics, not the raw events.
It is possible, but not likely, that someone set an obscure limit on timechart or on the whole system somehow. You could check your etc/system/local/limits.conf to see, but this would have been a deliberate action taken by some admin in your deployment.
UPDATE:
I still strongly recommend some form of approach where you bucket the data. Graphing 100,000 rows in the flash chart just isn't a story that ends well.
1) Per my comment below, you might want to explore using max(exclusiveaccessms) in addition to avg(exclusiveaccessms).
2) You might want to explore the span and bins arguments to timechart, because you can use these to increase the granularity of timechart's buckets to whatever you need. For instance:
<your search> | timechart span=1h min(exclusiveaccessms) avg(exclusiveaccessms) max(exclusiveaccessms)
will give you min, avg and max for every hour. And this example:
<your search> | timechart bins=1000 min(exclusiveaccessms) avg(exclusiveaccessms) max(exclusiveaccessms)
will make much more granular buckets (I think the default is around 250 or 300), but it will still determine the exact number of buckets from the time range it's given.
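To make the span arithmetic concrete for the one-day search in the question: a day has 1440 minutes, so span=1m yields 1440 rows (over the 1000-point limit reported in this thread), while span=2m yields 720. A sketch, assuming a 2-minute span is fine-grained enough for your purposes:

```spl
index=prod sourcetype="websphere:nativestdouterrlog"
| timechart span=2m min(exclusiveaccessms) avg(exclusiveaccessms) max(exclusiveaccessms)
```

The same arithmetic applies to other ranges: divide the length of the time range by the span to get the number of rows per series.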
Method to bump the value on a per-chart basis for simple XML on 6.2:
<option name="charting.data.count">9999</option>
I've sort of found a solution. If you are using advanced XML to create the dashboard/chart, there is a param called 'maxResultCount' which tells it how many results can be plotted per series. This sits under the FlashChart module. The strange thing is that the Splunk documentation on this says the default is 250, not 1000.
Be careful though, the documentation says changing it can cause unexpected UI behaviour, and I've noticed that when you use it with a large number your browser starts using heaps of memory (I had it up to 500MB, and it wasn't responding very well).
Here's an example of the module in my XML, you can see where the maxResultCount parameter sits:
<module name="HiddenPostProcess" layoutPanel="panel_row1_col1" group="JVM Heap Usage" autoRun="True">
  <param name="search">table _time T_Total T_Used N_Total N_Used</param>
  <param name="groupLabel">JVM Heap Usage</param>
  <module name="ViewstateAdapter">
    <module name="JobProgressIndicator">
      <module name="EnablePreview">
        <param name="enable">True</param>
        <param name="display">False</param>
        <module name="HiddenChartFormatter">
          <param name="charting.chart">line</param>
          <param name="charting.axisTitleX.text"></param>
          <param name="charting.axisTitleY.text"></param>
          <param name="charting.chart.nullValueMode">connect</param>
          <module name="FlashChart">
            <param name="width">100%</param>
            <param name="height">400px</param>
            <param name="maxResultCount">10000</param>
          </module>
          <module name="ViewRedirectorLink">
            <param name="viewTarget">flashtimeline</param>
          </module>
        </module>
      </module>
    </module>
  </module>
</module>
Indeed, raising this to 10,000 rows is a bad idea, and you're not really getting anything in return; there are not 10,000 pixels in your display, so the herculean effort of pulling down all the data and charting it in Flash is wasted. I still recommend allowing timechart to bucket the times; just use min/max/percentiles to better effect (see my answer here for more details).
No, I didn't need to restart anything. Just update the dashboard through the Manager and it should pick it up straight away. Not sure why it's not working for you.
Hi again, I still don't get the results. When I change the value, the charts still show only 1000 results. 😞
No restart is needed when changing this?
Thanks in advance, Alex
Hi, I will try it and get back with the result. But the heaps of memory kind of scare me.
Thanks for the update.
Hi Ashley, I currently have the same problem. I can't show more than 1000 events on the chart. For instance, if I want to show 30 minutes by second, I get 30 x 60 = 1800 points.
It will only show 1000 events... 😞
Any new ideas on this?
Hi Ashley, I'm still investigating, but it seems the only way to do it is to somehow change that limit of 1000.
But if I find an answer I'll post it here.
Cheers,
Alex
Hi Alex, No I haven't been able to get around this, it's still a problem for me.
I've had to resort to using 'timechart bins=1000' to try to get as close as possible, but even that isn't great because it only really does rounded spans and nothing in between (i.e., it goes 1m, 5m, 30m, 1h, 1d), and won't do anything between 1h and 1d.
My dashboards have a TimeRangePicker on them so the users can select their own timerange, but because of this the scale and look of the chart changes dramatically.
If you find anything, please let me know.
Cheers,
Ashley
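One possible workaround for the span granularity described above, sketched here under assumptions rather than as a tested recipe: compute the bucket width from the selected time range yourself and round _time to it. The addinfo command adds info_min_time and info_max_time for the search's time range; the target of 900 buckets (chosen to stay under the 1000-point limit) and the field names span and max_ms are picked for illustration. This assumes a bounded time range from the TimeRangePicker, since for an all-time search info_max_time may not be a finite number.

```spl
<your search>
| addinfo
| eval span=max(1, ceil((info_max_time - info_min_time) / 900))
| eval _time=floor(_time / span) * span
| stats max(exclusiveaccessms) AS max_ms BY _time
| sort 0 _time
```

For a 30-minute range this works out to a 2-second span and roughly 900 rows. Unlike timechart, stats does not emit rows for empty buckets, so gaps in the data show up as missing points rather than nulls.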
Sounds like you want to graph timechart avg(exclusiveaccessms) max(exclusiveaccessms) then?
Timechart can graph several different stats on the same axis. I commonly find myself doing timechart min(foo) avg(foo) max(foo), which makes for a little visualization, or timechart min(foo) perc33(foo) perc67(foo) max(foo), and so on.
Hi Nick, thanks for your response. I understand that this command is returning all the events, as I don't actually want to average them. I'm trying to make graphs of JVM memory usage and we need to be able to accurately see when there are spikes, so when we show the graph over a longer period (i.e., 7 days) the averaged data becomes invalid.
I don't think the limit is to do with the chart/timechart commands themselves, because I get the same result if I use the table function. When I perform the search I can see all the rows being returned, it's just that the flash chart doesn't display them all.