We have a sample report that we generate everyday -it contains two fields:
i) OrgID (which is duplicated in the events from time to time)
ii) OrganizationPatientCount (each OrgID has one of these)
We are trying to graph the second attribute over a 30 day timeframe like this:
sourcetype="metrics" OrgID=* | dedup OrgId | bucket _time span=1d | stats sum(OrganizationPatientCount ) by _time
Which seems to be working except that it gives me JUST TODAY's OrganizationPatientCount.
We need to use stats in such a way that it avoids duplicates AND it can give me patient count for everyday in that 30 day time period.
I tried looking up how bucketing actually works when used WITH stats and I read through this:
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/bucket
but that wasnt particularly helpful.
Note that if I dont specify buckets explicitly at all, stats() used with sum() in this way gives me results for EVERY DAY in a 30 day time-interval - except that it gives me a completely incorrect count.
Any ideas ?
Seems to me you should dedup only after bucketing by _time
. I don't know really what your data means, but seems like something similar to:
sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID _time | stats sum(OrganizationPatientCount ) by _time
or
sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(OPC) by _time
By the way, field names are case-sensitive, and maybe you just had a typo, but dedup OrgId
won't work if your field is named OrgID
.
So it turns out that a variation of your query worked out for me !!
sourcetype="metrics" OrgID=* | bucket span=1d _time | dedup OrgID _time | stats sum(OrganizationPatientCount) by _time
Life Lesson learnt : the best things in life are free... and elegant 🙂
Seems to me you should dedup only after bucketing by _time
. I don't know really what your data means, but seems like something similar to:
sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID _time | stats sum(OrganizationPatientCount ) by _time
or
sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(OPC) by _time
By the way, field names are case-sensitive, and maybe you just had a typo, but dedup OrgId
won't work if your field is named OrgID
.
In the first example, you need to remove the "by" in the dedup, i.e. "dedup OrgID _time"
I tried option #1 and it came back to me with no results (oddly enough)
I tried the following version of option #2 (I made a few changes for the typos)
sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOrganizationPatientCount by _time,OrgID | stats sum(lastOrganizationPatientCount ) by _time
but that did not work out either - there was no data. I am assuming #2 is REALLY what you wanted me to test
Also with option#2, there was no dedup..
Also, I examined all content from my queries for case-sensitivity and everything checks out