
How to get dc and then sum of field?

Sathiya123
Explorer
<search>
| eval vm_unit=case(vmSize="Standard_F16s_v2",2,vmSize="Standard_F8s_v2",1,vmSize="Standard_F4s",0.5,vmSize="Standard_F2s_v2",0.25)
| timechart span=1h dc(vm_name) sum(vm_unit) as USED_VMS

Looking for the sum of vm_unit for distinct VMs by the hour, but it considers all the VMs instead of only the distinct ones.


ITWhisperer
SplunkTrust

Is this what you were after?

<search>
| eval vm_unit=case(vmSize="Standard_F16s_v2",2,vmSize="Standard_F8s_v2",1,vmSize="Standard_F4s",0.5,vmSize="Standard_F2s_v2",0.25)
| bin _time span=1h
| stats values(vm_unit) as vm_unit by _time vm_name
| stats dc(vm_name) as vm_count sum(vm_unit) as USED_VMS by _time

gcusello
SplunkTrust

Hi @Sathiya123,

If you want the sum of vm_unit for each VM, the solution from @woodcock is the correct one.

If instead (as it seems from your example) you want both the sum of vm_unit and the count of distinct VMs for each time unit, you could use stats instead of timechart, because timechart displays only one value for each time unit, something like this:

<search>
| eval vm_unit=case(vmSize="Standard_F16s_v2",2,vmSize="Standard_F8s_v2",1,vmSize="Standard_F4s",0.5,vmSize="Standard_F2s_v2",0.25)
| bin span=1h _time
| stats dc(vm_name) AS vm_name_count sum(vm_unit) AS USED_VMS BY _time

Ciao.

Giuseppe


Sathiya123
Explorer

Thanks, @gcusello for replying.

Your query provides the same output as mine.

The issue here is that events got duplicated in our Splunk index for some reason.

In a given hour, there should not be two events for the same vm_name.

To work around the duplicates I am using dc(vm_name), thinking that sum(vm_unit) would also skip the duplicate entries. But in my case sum(vm_unit) includes the duplicates.

For example, suppose every one of my VM entries is duplicated (each appears twice). Then I get:

_time count(vm_name) sum(vm_unit) ==> _time 120 200

My expectation is:

_time dc(vm_name) sum(vm_unit) ==> _time 60 100

Any help is highly appreciated.

 


PickleRick
SplunkTrust

If you do a single stats command with two different aggregation functions, they are performed separately. So dc() counts distinct values and sum() sums _all_ values. That's how it works.
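
A minimal run-anywhere sketch of this behaviour on fabricated events (the vm_name:vm_unit pairs below are made up and generated with makeresults, not read from an index):

| makeresults
| eval data="MONGO_VM:1 NODE_VM:2 MONGO_VM:1"
| makemv data
| mvexpand data
| rex field=data "(?<vm_name>[^:]+):(?<vm_unit>.+)"
| stats dc(vm_name) as distinct_vms sum(vm_unit) as total_units

Here dc(vm_name) returns 2, while sum(vm_unit) returns 4 (1+2+1), because the duplicated MONGO_VM event is summed again.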


Sathiya123
Explorer

@PickleRick Thanks for your reply.

I agree with your statement.

But how do I form the Splunk query to get my expected results?


PickleRick
SplunkTrust

To be fully honest, I'm not sure what you want to achieve and what your data looks like. I'll assume that your events have vm_name and vm_unit fields, that vm_unit is always the same for a given vm_name, and that you want to count the number of distinct vm_names and sum the corresponding vm_units. Is that right?

<your_base_search>
| bin _time span=1h
| stats values(vm_unit) as vm_unit by _time vm_name
| stats dc(vm_name) sum(vm_unit) by _time

You could probably also use dedup, but I don't like that command and don't use it much, so I don't remember the syntax 😄
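
To sanity-check this two-pass approach without real data, here is a sketch on the same kind of fabricated events as in the earlier sketch (makeresults builds them; the names and units are made up):

| makeresults
| eval data="MONGO_VM:1 NODE_VM:2 MONGO_VM:1"
| makemv data
| mvexpand data
| rex field=data "(?<vm_name>[^:]+):(?<vm_unit>.+)"
| bin _time span=1h
| stats values(vm_unit) as vm_unit by _time vm_name
| stats dc(vm_name) as vm_count sum(vm_unit) as USED_VMS by _time

The first stats collapses the duplicated MONGO_VM event into one row per vm_name per hour, so the second stats returns vm_count=2 and USED_VMS=3 rather than 4.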

ITWhisperer
SplunkTrust

This made me smile - the accepted solution looks remarkably similar to my solution posted 3 hours prior to this 😁

PickleRick
SplunkTrust

True. I must say I didn't read into your answer because I didn't understand the question at that moment XD


Sathiya123
Explorer

@ITWhisperer Apologies, somehow I missed your reply; otherwise I would have replied on that thread.

Thanks again. I've given a thumbs up to your reply.


Sathiya123
Explorer

Thanks, @PickleRick for your reply.

I think your query works. Let me confirm.

Basically, I want to sum(vm_unit) on an hourly basis, ignoring duplicate entries within each hour.

 


PickleRick
SplunkTrust

That's where your wording is a bit confusing. Because - as far as I understand - you can have several vm_names with the same vm_unit. Otherwise it doesn't quite make sense to sum unrelated values only when they're unique.

 


Sathiya123
Explorer

For example, the events look like this:

TIME             | VM_NAME  | VM_UNIT
2023-03-08 01:05 | MONGO_VM | 1
2023-03-08 01:10 | NODE_VM  | 2
2023-03-08 01:05 | MONGO_VM | 1

The Splunk query should return SUM(VM_UNIT):

TIME             | SUM(VM_UNIT)
2023-03-08 01:00 | 3

The 3rd event has to be ignored since it's a duplicate of the 1st event. Hope it's clear.
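
As a sketch that reproduces exactly this example with fabricated timestamps (makeresults plus eval/case; nothing here comes from a real index), using the two-pass stats approach suggested earlier in the thread:

| makeresults count=3
| streamstats count as n
| eval VM_NAME=case(n=1,"MONGO_VM", n=2,"NODE_VM", n=3,"MONGO_VM")
| eval VM_UNIT=case(n=1,1, n=2,2, n=3,1)
| eval _time=strptime(case(n=1,"2023-03-08 01:05", n=2,"2023-03-08 01:10", n=3,"2023-03-08 01:05"), "%Y-%m-%d %H:%M")
| bin _time span=1h
| stats values(VM_UNIT) as VM_UNIT by _time VM_NAME
| stats sum(VM_UNIT) as total_vm_unit by _time

This returns a single row for 2023-03-08 01:00 with total_vm_unit=3; the duplicate MONGO_VM event is folded away by the first stats.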


PickleRick
SplunkTrust

That's what I thought. More or less. In your example both events have the same time. I suppose in real data the timestamp can differ.


Sathiya123
Explorer

@PickleRick Thanks a ton! It worked!

I got my query working; putting it here so that it will be helpful to others.

<search>
| eval vm_unit=case(vmSize="Standard_F16s_v2",2,vmSize="Standard_F8s_v2",1,vmSize="Standard_F4s",0.5,vmSize="Standard_F2s_v2",0.25)
| bin _time span=1h
| stats values(vm_unit) as vm_unit values(location) as location by _time id
| timechart span=1h usenull=true sum(vm_unit) as vm_count by location
| fillnull value=0
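
For anyone who wants to try this final pattern without the original index, here is a sketch on fabricated events; the id and location values (vm1, vm2, eastus, westus) are placeholders, and from the bin command onward the pipeline matches the query above:

| makeresults
| eval data="vm1:1:eastus vm2:2:westus vm1:1:eastus"
| makemv data
| mvexpand data
| rex field=data "(?<id>[^:]+):(?<vm_unit>[^:]+):(?<location>.+)"
| bin _time span=1h
| stats values(vm_unit) as vm_unit values(location) as location by _time id
| timechart span=1h usenull=true sum(vm_unit) as vm_count by location
| fillnull value=0

The duplicated vm1 event collapses in the stats step, so the timechart shows 1 for eastus and 2 for westus in that hour.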


woodcock
Esteemed Legend

Like this:

... | timechart span=1h sum(vm_unit) AS USED_VMS BY vm_name


Sathiya123
Explorer

Thanks, @woodcock  for your reply!

What I am expecting:

(image: Sathiya123_0-1680847554023.png)

But what I am getting:

(image: Sathiya123_1-1680847650186.png)

 

 


woodcock
Esteemed Legend

Don't use pictures; use text.  Your images are not showing.


PickleRick
SplunkTrust

There must be something wrong with your browser. The pictures are OK.
