
Bucket/Timechart and Dedup

jbp4444
Path Finder

I'm trying to plot total load average vs. number of processors in a cluster (i.e., how loaded the system is). The following basically works:

numproc OR loadshort | transaction xid | bucket span=10m _time | timechart span=10m sum(numproc) sum(loadshort)

However, we occasionally see multiple data items posted in a given 10-minute window, e.g. {host,load1,numproc,time1} and {host,load2,numproc,time2} both land in the same time bucket. The search above then adds that host's numproc value twice, which obscures the real load on the system.

Is there a way to dedup the data after it's been bucketed? Or, put another way, to dedup the data within a single bucket?
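To see why the totals come out too high, here is a small Python simulation of the original search (an illustration, not Splunk itself). It assumes three hypothetical events, two of them from host `a` in the same 10-minute window, and floors each timestamp to a 600-second boundary as an approximation of `bucket span=10m _time`:

```python
from collections import defaultdict

SPAN = 600  # seconds; approximates span=10m

# Hypothetical sample events: (host, epoch_seconds, numproc, loadshort).
# Host "a" posts twice inside the same 10-minute window.
events = [
    ("a", 1000, 8, 2.0),
    ("a", 1100, 8, 3.0),
    ("b", 1050, 4, 1.0),
]

def bucket(ts, span=SPAN):
    """Floor a timestamp to the start of its bucket (approximates `bucket span=10m _time`)."""
    return ts - ts % span

def naive_sums(rows):
    """Sum numproc and loadshort per bucket with no dedup, like the original timechart."""
    totals = defaultdict(lambda: [0, 0.0])
    for _host, ts, numproc, load in rows:
        t = totals[bucket(ts)]
        t[0] += numproc
        t[1] += load
    return {k: tuple(v) for k, v in totals.items()}

print(naive_sums(events))  # {600: (20, 6.0)}
```

Host `a` contributes its numproc of 8 twice, so the bucket reports 20 processors for a cluster that only has 12.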


jbp4444
Path Finder

Looks like dedup might work after all -- I didn't realize you could dedup based on more than one field:

numproc OR loadshort | transaction xid | bucket span=10m _time | dedup host _time | timechart span=10m sum(loadshort) sum(numproc)

Since bucket discretizes the timestamps, any {host,_time} pairs that fall in the same 10-minute bucket become duplicates, and dedup keeps only one event per pair.
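A Python sketch of what the accepted search does to each bucket, again a simulation rather than Splunk's implementation. It assumes the same hypothetical events, listed newest first (Splunk's default result order), and models `dedup host _time` as keeping the first event seen for each (host, bucket) pair:

```python
from collections import defaultdict

SPAN = 600  # seconds; approximates span=10m

# Hypothetical events, newest first: (host, epoch_seconds, numproc, loadshort)
events = [
    ("a", 1100, 8, 3.0),
    ("b", 1050, 4, 1.0),
    ("a", 1000, 8, 2.0),
]

def bucket(ts, span=SPAN):
    """Floor a timestamp to its 10-minute bucket (approximates `bucket span=10m _time`)."""
    return ts - ts % span

def dedup_host_time(rows):
    """Keep the first row seen for each (host, bucketed time) pair, modelling `dedup host _time`."""
    seen = set()
    kept = []
    for host, ts, numproc, load in rows:
        key = (host, bucket(ts))
        if key not in seen:
            seen.add(key)
            kept.append((host, bucket(ts), numproc, load))
    return kept

def timechart_sums(rows):
    """Sum loadshort and numproc per bucket, modelling `timechart sum(loadshort) sum(numproc)`."""
    totals = defaultdict(lambda: [0.0, 0])
    for _host, t, numproc, load in rows:
        totals[t][0] += load
        totals[t][1] += numproc
    return {k: tuple(v) for k, v in totals.items()}

print(timechart_sums(dedup_host_time(events)))  # {600: (4.0, 12)}
```

Host `a`'s older event is dropped, so the bucket reports the expected 12 processors, and loadshort is counted once per host.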


jbp4444
Path Finder

Thanks for the quick reply, gkanapathy -- that's definitely in the right direction. But there are multiple hosts producing the {numproc,loadshort} data, and I want to sum each item across all of those machines. My understanding is that first() would return only one value, from one host.

Maybe some combination of 'first ... by host' followed by a separate summation command?
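The "first by host, then sum" idea can be sketched as a two-stage aggregation. This is an illustrative Python simulation with hypothetical events (newest first), not a Splunk search: stage 1 keeps one numproc/loadshort pair per host per bucket, and stage 2 sums those pairs across hosts:

```python
from collections import defaultdict

SPAN = 600  # seconds; approximates span=10m

# Hypothetical events, newest first: (host, epoch_seconds, numproc, loadshort)
events = [
    ("a", 1100, 8, 3.0),
    ("b", 1050, 4, 1.0),
    ("a", 1000, 8, 2.0),
]

def bucket(ts, span=SPAN):
    """Floor a timestamp to its 10-minute bucket."""
    return ts - ts % span

def first_by_host_then_sum(rows):
    # Stage 1: one (numproc, loadshort) pair per (bucket, host), keeping the first row seen.
    per_host = {}
    for host, ts, numproc, load in rows:
        per_host.setdefault((bucket(ts), host), (numproc, load))
    # Stage 2: sum the per-host values across all hosts in each bucket.
    totals = defaultdict(lambda: [0, 0.0])
    for (t, _host), (numproc, load) in per_host.items():
        totals[t][0] += numproc
        totals[t][1] += load
    return {k: tuple(v) for k, v in totals.items()}

print(first_by_host_then_sum(events))  # {600: (12, 4.0)}
```

For this sample data it yields the same totals as the dedup approach, since both keep exactly one event per host per bucket.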


gkanapathy
Splunk Employee

Might be easiest to just use first() instead, which will give you the most recent numproc in each bucket. I'll assume numproc doesn't change, though if it did, you might just use avg():

numproc OR loadshort | transaction xid | bucket span=10m _time | timechart span=10m first(numproc) sum(loadshort)
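To illustrate the concern raised about first(), here is a Python simulation of `timechart first(numproc) sum(loadshort)` on a single bucket. The events are hypothetical and listed newest first, and first() is modelled as taking the numproc of the first event seen in each bucket:

```python
SPAN = 600  # seconds; approximates span=10m

# Hypothetical events, newest first: (host, epoch_seconds, numproc, loadshort)
events = [
    ("a", 1100, 8, 3.0),
    ("b", 1050, 4, 1.0),
    ("a", 1000, 8, 2.0),
]

def bucket(ts, span=SPAN):
    """Floor a timestamp to its 10-minute bucket."""
    return ts - ts % span

def first_numproc_sum_load(rows):
    """Per bucket: first numproc seen (models first(numproc)) and total loadshort (sum(loadshort))."""
    out = {}
    for _host, ts, numproc, load in rows:
        t = bucket(ts)
        if t not in out:
            out[t] = [numproc, 0.0]  # numproc fixed by the first event in the bucket
        out[t][1] += load            # every event's loadshort is added
    return {k: tuple(v) for k, v in out.items()}

print(first_numproc_sum_load(events))  # {600: (8, 6.0)}
```

Here numproc comes from host `a` alone (8 rather than the cluster's 12), while sum(loadshort) still adds both of host `a`'s samples.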