Solved: Search command: bucket

sranga · ‎06-03-2010

Hi

We have a summary indexed search that puts events into buckets for a day. We then use that to get the top 5 values for a given day. This is the search we have:

... | bucket span=1d _time | sitop 5 field1 by _time

What we notice is that there are two buckets created within a single day. One has a 12:00 AM value and the other has a 5:00 PM value. We just need all of the events to be grouped under one value. A sample result is given below.

_time                    field1   count  percent
5/28/10 12:00:00.000 AM  value1   3406   26.442046
5/28/10 12:00:00.000 AM  value2   2506   19.455011
5/28/10 12:00:00.000 AM  value3   1034   8.027327
5/28/10 12:00:00.000 AM  value4   617    4.790001
5/28/10 12:00:00.000 AM  value5   609    4.727894
5/28/10 5:00:00.000 PM   value6   61     21.478873
5/28/10 5:00:00.000 PM   value7   39     13.732394
5/28/10 5:00:00.000 PM   value8   33     11.619718
5/28/10 5:00:00.000 PM   value9   25     8.802817
5/28/10 5:00:00.000 PM   value10  21     7.394366

Are we missing something? Thanks for your help.

Ranga

sideview · ‎06-03-2010

Interesting. The only thing I can think of is that you're using distributed search and not all of the servers are in the same timezone. Since the bucketing logic will (I think) be applied on the remote peer, the peer would calculate the day boundaries in its own timezone. The central search head would then get back all the buckets and apply its own day boundaries, and you might end up with weird results like this.
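To make the day-boundary idea concrete, here is a rough Python sketch (not Splunk code). The two timezones are assumptions picked only for illustration, since the thread never says which zones the servers use: a search head at a fixed UTC-7 offset and a peer on UTC. The helper `day_bucket` is hypothetical; it snaps a timestamp to midnight in one zone and then displays the result in another.

```python
from datetime import datetime, timedelta, timezone

# Assumed zones, for illustration only: the thread never names them.
SEARCH_HEAD_TZ = timezone(timedelta(hours=-7))  # hypothetical search head, UTC-7
PEER_TZ = timezone.utc                          # hypothetical remote peer, UTC


def day_bucket(ts, bucket_tz, display_tz):
    """Snap ts to midnight in bucket_tz, then express it in display_tz."""
    local = ts.astimezone(bucket_tz)
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight.astimezone(display_tz)


# One event, at 6:30 PM on 5/28/10 as seen from the search head.
event = datetime(2010, 5, 28, 18, 30, tzinfo=SEARCH_HEAD_TZ)

# Bucketed with the search head's own day boundary.
print(day_bucket(event, SEARCH_HEAD_TZ, SEARCH_HEAD_TZ).isoformat())
# 2010-05-28T00:00:00-07:00

# Bucketed with the peer's day boundary, then shown in search-head time.
print(day_bucket(event, PEER_TZ, SEARCH_HEAD_TZ).isoformat())
# 2010-05-28T17:00:00-07:00
```

With a 7-hour offset, the peer's midnight shows up as 5:00 PM on the search head, so the same calendar day can end up with both a 12:00 AM and a 5:00 PM bucket, which resembles the sample output in the question.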

steveyz · ‎06-04-2010

This is a bug. The best workaround right now would be to insert the localop command before the bucket command, as Lowell suggested.

Lowell · ‎06-03-2010

Side note: It doesn't look like your limit (5) is honored when you are using sitop instead of top.

sideview · ‎06-04-2010

Oh nice. Yeah, that's probably it then. Interestingly enough, this isn't supposed to be an issue: the distributed search code actually sends serialized timezone info over the wire specifically so that the bucketing is performed consistently. Sounds like it's worth a bug report and a support case.

sranga · ‎06-03-2010

Thanks. We do have a distributed search setup and there is a timezone mismatch between the two servers. This is most likely causing the issue we are seeing.

sideview · ‎06-03-2010

Well, if you haven't set up Splunk on any other machine and configured searches to distribute between the N machines, then there's no distributed search, in which case my idea is a red herring. To check, go to "Manager > Distributed Search > Search Peers" and see if there are any peered servers.

Lowell · ‎06-03-2010

You can use localop to "prevent subsequent commands from being executed on remote peers." Not sure if that's your issue or not, but I guess this could help you find out.

sranga · ‎06-03-2010

Thanks. Is there any way to check whether distributed search is used, or should I contact the admin to get this information? If distributed search is used, can we prevent it through a configuration or a command?
