index=xyz sourcetype="access_combined" |rex "(GET|POST)\s+(?P<URL>[\w+_*\-*\/+\.*]+).*?\s+HTTP"|timechart limit=200 span=1h count(URL) by URL usenull=f useother=f
The need is to extract the URI (excluding params) from Apache-format logs, get a count for each URI, and graph an area chart showing a stacked traffic pattern of how many requests each URI receives.
Now, when I try to draw this chart over an 8 hour span, I seem to be missing data points for certain time periods. For example, 6am to 8am shows no data when the time range is set to -12h@h to 'now'. But when I drill down to 6am to 8am, I get data points.
I tried to see if this was because of the limit keyword: perhaps Splunk cannot match any hits among those 200 URIs for 6am to 8am, but finds them in other timeframes. I think I am wrong in my understanding, but I can't find any other way to explain this. The same thing happens when I take out the limit keyword.
Why would data points show up when you zoom into the timeframe but be omitted when you look at a bigger time range?
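To make the extraction concrete, here is a minimal sketch in Python (not SPL) that applies the same pattern as the rex command to one raw event. The sample log line and the helper name extract_uri are made up for illustration, and Python's re module is not the regex engine Splunk uses, so treat this only as a way to check what the pattern captures.

```python
import re

# Same pattern as the rex command in the search above.
# Python's re module also accepts the (?P<URL>...) named-group syntax.
URI_PATTERN = re.compile(r"(GET|POST)\s+(?P<URL>[\w+_*\-*\/+\.*]+).*?\s+HTTP")


def extract_uri(raw_event):
    """Return the URI path without query params, or None if nothing matches."""
    match = URI_PATTERN.search(raw_event)
    return match.group("URL") if match else None


# Hypothetical access_combined-style event, invented for this example.
sample = (
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /image/get-image?id=42 HTTP/1.1" 200 2326'
)

print(extract_uri(sample))  # prints /image/get-image
```

The character class does not include "?", so the capture stops before the query string. Methods other than GET and POST (HEAD, for example) are not matched at all and would be missing from the chart.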
Wondering if anyone has tried to solve the problem of figuring out your traffic mix and the breakdown of URIs, to categorize static vs dynamic content, or just to understand what kinds of requests tax your webservers the most.
I am looking for ideas, because going the regex route to capture URIs by pattern matches and count them has been unfruitful. The sheer number of unique URIs over a time period seems to run into a hard limit in Splunk on how much of this data a chart can plot, even when specifying the limitOfSearch to something like half a million data points.
I suspect that if you take away the usenull and useother arguments, you'll get the 'missing' data back. usenull and useother are sometimes useful, but with certain data they can make the overall behavior quite confusing.
limit=200 tells timechart to pick the top 200 values overall, and only split by those. Everything else gets rolled up into "OTHER". From there, the useother keyword simply tells timechart whether or not you want the "OTHER" bar to be displayed.
So if a given subset of the time range has relatively low volume compared to the rest of the range, and particularly if the overall top 200 values show no activity whatsoever during that subset of time, then you'll seem to get no data there. What's really happening, of course, is that there's supposed to be a big "OTHER" bar there, but you've told timechart you don't want any "OTHER" bars, so it omits it.
Then, when you zoom in, timechart calculates a different overall top 200 for the split-by logic. Some of the 6am-8am data now does have values in the top 200, and suddenly you see some data there.
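The effect described above can be reproduced with a toy model. This is a rough Python sketch of the top-N-then-split idea only, not Splunk's actual implementation: the function toy_timechart, the event tuples, and the tiny limit of 2 are all invented for the example.

```python
from collections import Counter


def toy_timechart(events, limit, useother):
    """Toy model of 'timechart count by url' with limit and useother.

    events is a list of (hour, url) tuples. The top `limit` urls are picked
    from totals over the WHOLE time range, then each hour is split by them.
    """
    totals = Counter(url for _, url in events)
    top = {url for url, _ in totals.most_common(limit)}
    chart = {hour: Counter() for hour, _ in events}
    for hour, url in events:
        if url in top:
            chart[hour][url] += 1
        elif useother:
            chart[hour]["OTHER"] += 1
        # With useother off, events outside the top N are silently dropped.
    return {hour: dict(counts) for hour, counts in sorted(chart.items())}


# Made-up traffic: busy hours 5 and 7, a quiet hour 6 with only rare urls.
events = (
    [(5, "/a")] * 10
    + [(6, "/x"), (6, "/y"), (6, "/z")]
    + [(7, "/b")] * 8
)

wide = toy_timechart(events, limit=2, useother=False)
wide_other = toy_timechart(events, limit=2, useother=True)
zoomed = toy_timechart([e for e in events if e[0] == 6], limit=2, useother=False)

print(wide[6])        # {} : hour 6 looks empty over the wide range
print(wide_other[6])  # {'OTHER': 3} : the traffic was there all along
print(sum(zoomed[6].values()))  # 2 : zoomed in, hour 6 urls make the top 2
```

Over the wide range, "/a" and "/b" take both top slots, so hour 6 has nothing left to plot unless OTHER is shown. Once the range is narrowed to hour 6 alone, the top 2 is recomputed from that hour's own urls and data appears.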
Also, note that Splunk field names are case sensitive. So it's a little strange that you're extracting a field as 'url' in your rex command but then charting another field called "URL". Normally that rex command would not be necessary at all, since url is extracted by Splunk's default access_combined extractions just fine. And if all those nice default access fields (like clientip, status, method, referer) aren't showing up by default, maybe there's something custom in the data that's throwing the regex off slightly. If that's the case, it's quite easy to fix, and easier than using rex everywhere.
The url vs URL mismatch came from not using code tags when I posted the question. I have fixed the formatting now.
When I remove usenull=f and useother=f, the problem persists.
I verified that a tag /image/get-image exists in all timeframes and actually makes up over 30% of all our requests. So as far as I can see, the number of unique URLs in the data should not be causing the gaps. And since span=1h, there should be a data point for every hourly time frame.
Thank you for the insights.