Splunk Search

Sample search to look over data using stats, dedup and bucketing

asarolkar
Builder

We have a sample report that we generate every day - it contains two fields:

i) OrgID (which is duplicated in the events from time to time)

ii) OrganizationPatientCount (each OrgID has one of these)






We are trying to graph the second attribute over a 30-day timeframe like this:

sourcetype="metrics" OrgID=* | dedup OrgId | bucket _time span=1d | stats sum(OrganizationPatientCount ) by _time

This seems to be working, except that it gives me JUST TODAY's OrganizationPatientCount.

We need to use stats in such a way that it avoids duplicates AND gives me a patient count for every day in that 30-day time period.




I tried looking up how bucketing actually works when used WITH stats and I read through this:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/bucket

but that wasn't particularly helpful.




Note that if I don't specify buckets explicitly at all, stats used with sum() in this way gives me results for EVERY DAY in the 30-day time interval - except that it gives me a completely incorrect count.

Any ideas?

1 Solution

gkanapathy
Splunk Employee

Seems to me you should dedup only after bucketing by _time - dedup OrgID on its own keeps just the single most recent event per OrgID across the whole search window, which is why you only get today's count. I don't really know what your data means, but it seems like something similar to:

sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID  _time | stats sum(OrganizationPatientCount ) by _time

or

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(OPC) by _time

By the way, field names are case-sensitive, and maybe you just had a typo, but dedup OrgId won't work if your field is named OrgID.
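
If you go with the second form, note that the outer sum has to refer to the alias created by the first stats. A cleaned-up sketch (untested, and the alias names are just illustrative) would be:

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(lastOPC) as TotalPatientCount by _time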


asarolkar
Builder

So it turns out that a variation of your query worked out for me!

sourcetype="metrics" OrgID=* | bucket span=1d _time | dedup OrgID _time | stats sum(OrganizationPatientCount) by _time

Life lesson learnt: the best things in life are free... and elegant 🙂
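
In case anyone else wants this rendered directly as a chart, a timechart variant of the same search should also work (untested as written - the DailyPatientCount alias is just for labelling):

sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID _time | timechart span=1d sum(OrganizationPatientCount) as DailyPatientCount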



helge
Builder

In the first example, you need to remove the "by" in the dedup, i.e. "dedup OrgID _time"


asarolkar
Builder

I tried option #1 and it came back with no results (oddly enough).

I tried the following version of option #2 (I made a few changes to fix the typos):

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOrganizationPatientCount by _time,OrgID | stats sum(lastOrganizationPatientCount) by _time

but that did not work out either - there was no data. I am assuming #2 is REALLY what you wanted me to test.

Also, with option #2, there was no dedup.

Also, I checked all the field names in my queries for case-sensitivity, and everything checks out.
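
For anyone debugging the same kind of "no results" problem, one sanity check is to confirm that both fields are actually extracted across the whole window - something along these lines (just a sketch, assuming a 30-day range and illustrative alias names):

sourcetype=metrics earliest=-30d@d | bucket _time span=1d | stats count as totalEvents, count(OrgID) as eventsWithOrgID, count(OrganizationPatientCount) as eventsWithOPC by _time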
