Splunk Search

Sample search to look over data using stats, dedup and bucketing

asarolkar
Builder

We have a sample report that we generate every day - it contains two fields:

i) OrgID (which is duplicated in the events from time to time), and

ii) OrganizationPatientCount (each OrgID has one of these)






We are trying to graph the second attribute over a 30-day timeframe like this:

sourcetype="metrics" OrgID=* | dedup OrgId | bucket _time span=1d | stats sum(OrganizationPatientCount ) by _time

Which seems to be working except that it gives me JUST TODAY's OrganizationPatientCount.

We need to use stats in such a way that it avoids duplicates AND gives me the patient count for every day in that 30-day time period.
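To make the symptom concrete, here is a minimal makeresults sketch with fabricated events (the field names match ours, the values are made up):

| makeresults count=6
| streamstats count as n
| eval _time=relative_time(now(), "-" . n . "d@d"), OrgID="org_" . (n % 2), OrganizationPatientCount=100 + n
| dedup OrgID
| bucket _time span=1d
| stats sum(OrganizationPatientCount) by _time

Even though the fabricated data spans six days, only one event per OrgID survives the dedup, so the output covers just the most recent couple of days - the same collapse we see on the real data.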




I tried looking up how bucketing actually works when used WITH stats and I read through this:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/bucket

but that wasn't particularly helpful.




Note that if I don't specify buckets explicitly at all, stats used with sum in this way gives me results for EVERY DAY in the 30-day time interval - except that it gives me a completely incorrect count.

Any ideas?

1 Solution

gkanapathy
Splunk Employee

Seems to me you should dedup only after bucketing by _time. I don't really know what your data means, but it seems like you want something similar to:

sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID  _time | stats sum(OrganizationPatientCount ) by _time

or

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(OPC) by _time

By the way, field names are case-sensitive, and maybe you just had a typo, but dedup OrgId won't work if your field is named OrgID.
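The ordering matters because dedup OrgID on its own keeps only one event per organization across the entire search window, whereas dedup OrgID _time after the bucket keeps one event per organization per day. If you'd rather skip dedup altogether, a rough equivalent (assuming taking the maximum count per org per day is acceptable for your data) would be:

sourcetype=metrics OrgID=* | bucket _time span=1d | stats max(OrganizationPatientCount) as OPC by _time OrgID | stats sum(OPC) by _time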


asarolkar
Builder

So it turns out that a variation of your query worked out for me!!

sourcetype="metrics" OrgID=* | bucket span=1d _time | dedup OrgID _time | stats sum(OrganizationPatientCount) by _time

Life lesson learnt: the best things in life are free... and elegant 🙂
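Since our goal was to chart this over 30 days, the same search also feeds timechart directly (earliest=-30d@d here is just one way of pinning the 30-day range):

sourcetype="metrics" OrgID=* earliest=-30d@d | bucket _time span=1d | dedup OrgID _time | timechart span=1d sum(OrganizationPatientCount) as TotalPatientCount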



helge
Builder

In the first example, you need to remove the "by" in the dedup, i.e. "dedup OrgID _time"


asarolkar
Builder

I tried option #1 and it came back with no results (oddly enough).

I tried the following version of option #2 (I made a few changes for the typos):

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOrganizationPatientCount by _time,OrgID | stats sum(lastOrganizationPatientCount) by _time

but that did not work out either - there was no data. I am assuming #2 is REALLY what you wanted me to test.

Also, with option #2, there was no dedup.

Also, I checked all the field names in my queries for case sensitivity and everything checks out.
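For reference, this is the intermediate stage I can run on its own to check whether lastOrganizationPatientCount comes back per OrgID and per day at all:

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOrganizationPatientCount by _time,OrgID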
