Splunk Search

Sample search to analyze data using stats, dedup, and bucketing

asarolkar
Builder

We have a sample report that we generate every day; it contains two fields:

i) OrgID (which is duplicated in the events from time to time), and

ii) OrganizationPatientCount (each OrgID has one of these)






We are trying to graph the second attribute over a 30 day timeframe like this:

sourcetype="metrics" OrgID=* | dedup OrgId | bucket _time span=1d | stats sum(OrganizationPatientCount ) by _time

This seems to work, except that it gives me JUST TODAY's OrganizationPatientCount.

We need to use stats in such a way that it avoids duplicates AND gives me the patient count for every day in that 30-day period.




I tried looking up how bucketing actually works when used WITH stats and I read through this:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/bucket

but that wasn't particularly helpful.




Note that if I don't specify buckets explicitly at all, stats used with sum in this way gives me results for EVERY DAY in the 30-day interval - except that the counts are completely incorrect.

Any ideas?

1 Solution

gkanapathy
Splunk Employee

It seems to me you should dedup only after bucketing by _time. I don't really know what your data means, but it seems like something similar to:

sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID _time | stats sum(OrganizationPatientCount) by _time

or

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(lastOPC) by _time

By the way, field names are case-sensitive, and maybe you just had a typo, but dedup OrgId won't work if your field is named OrgID.
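To see why the order matters, here is a minimal Python sketch of the two pipelines (the event data and daily counts are made up for illustration; Splunk's actual dedup keeps the most recent event, but the ordering issue is the same):

```python
# Hypothetical events: (day, org_id, patient_count) - one reading per org per day,
# sometimes duplicated within a day, mimicking the report described above.
events = [
    ("day1", "org1", 10), ("day1", "org1", 10), ("day1", "org2", 5),
    ("day2", "org1", 12), ("day2", "org2", 6), ("day2", "org2", 6),
]

# Dedup on OrgID alone (before bucketing): keeps only ONE event per org across
# ALL days, so summing by day yields a total for a single day - the behavior
# reported in the question.
seen_orgs = set()
dedup_first = [e for e in events
               if e[1] not in seen_orgs and not seen_orgs.add(e[1])]
by_day_wrong = {}
for day, org, count in dedup_first:
    by_day_wrong[day] = by_day_wrong.get(day, 0) + count

# Bucket by day, THEN dedup on (OrgID, day): keeps one event per org per day,
# so every day gets a correct total.
seen_pairs = set()
dedup_per_day = [e for e in events
                 if (e[0], e[1]) not in seen_pairs
                 and not seen_pairs.add((e[0], e[1]))]
by_day_right = {}
for day, org, count in dedup_per_day:
    by_day_right[day] = by_day_right.get(day, 0) + count

print(by_day_wrong)   # {'day1': 15}
print(by_day_right)   # {'day1': 15, 'day2': 18}
```

In SPL terms, `bucket _time span=1d` coarsens _time to the day before `dedup OrgID _time` runs, so the dedup key is effectively the (org, day) pair rather than the org alone.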


asarolkar
Builder

So it turns out that a variation of your query worked for me!

sourcetype="metrics" OrgID=* | bucket span=1d _time | dedup OrgID _time | stats sum(OrganizationPatientCount) by _time

Life Lesson learnt : the best things in life are free... and elegant 🙂



helge
Builder

In the first example, you need to remove the "by" in the dedup, i.e. "dedup OrgID _time"


asarolkar
Builder

I tried option #1 and it came back to me with no results (oddly enough)

I tried the following version of option #2 (with a few changes to fix the typos):

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOrganizationPatientCount by _time,OrgID | stats sum(lastOrganizationPatientCount) by _time

but that did not work out either - there was no data. I am assuming #2 is REALLY what you wanted me to test.

Also, with option #2 there was no dedup.

Also, I examined all content in my queries for case-sensitivity, and everything checks out.
