Splunk Search

Sample search to look over data using stats, dedup and bucketing

asarolkar
Builder

We have a sample report that we generate every day - it contains two fields:

i) OrgID (which is duplicated in the events from time to time)

ii) OrganizationPatientCount (each OrgID has one of these)






We are trying to graph the second attribute over a 30-day timeframe like this:

sourcetype="metrics" OrgID=* | dedup OrgId | bucket _time span=1d | stats sum(OrganizationPatientCount ) by _time

This seems to be working, except that it gives me JUST TODAY's OrganizationPatientCount.

We need to use stats in such a way that it avoids duplicates AND gives me a patient count for every day in that 30-day time period.




I tried looking up how bucketing actually works when used WITH stats and I read through this:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/bucket

but that wasn't particularly helpful.




Note that if I don't specify buckets explicitly at all, stats used with sum() in this way gives me results for EVERY DAY in the 30-day time interval - except that it gives me a completely incorrect count.

Any ideas?

1 Solution

gkanapathy
Splunk Employee

Seems to me you should dedup only after bucketing by _time - dedup OrgID on its own keeps just the single most recent event per OrgID across the whole search window, which is why you only get today's count. I don't really know what your data means, but it seems like something similar to:

sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID  _time | stats sum(OrganizationPatientCount ) by _time

or

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(OPC) by _time

By the way, field names are case-sensitive, and maybe you just had a typo, but dedup OrgId won't work if your field is named OrgID.
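
If you go with the second form, note that the outer sum has to refer to the alias created by the first stats. A cleaned-up sketch (untested, and the alias names are just illustrative) would be:

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOPC by _time,OrgID | stats sum(lastOPC) as TotalPatientCount by _time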


asarolkar
Builder

So it turns out that a variation of your query worked out for me!

sourcetype="metrics" OrgID=* | bucket span=1d _time | dedup OrgID _time | stats sum(OrganizationPatientCount) by _time

Life lesson learnt: the best things in life are free... and elegant 🙂
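
In case anyone else wants this rendered directly as a chart, a timechart variant of the same search should also work (untested as written - the DailyPatientCount alias is just for labelling):

sourcetype="metrics" OrgID=* | bucket _time span=1d | dedup OrgID _time | timechart span=1d sum(OrganizationPatientCount) as DailyPatientCount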



helge
Builder

In the first example, you need to remove the "by" in the dedup, i.e. "dedup OrgID _time"


asarolkar
Builder

I tried option #1 and it came back with no results (oddly enough).

I tried the following version of option #2 (I made a few changes to fix the typos):

sourcetype=metrics OrgID=* | bucket _time span=1d | stats latest(OrganizationPatientCount) as lastOrganizationPatientCount by _time,OrgID | stats sum(lastOrganizationPatientCount) by _time

but that did not work out either - there was no data. I am assuming #2 is REALLY what you wanted me to test.

Also, with option #2, there was no dedup.

Also, I checked all the field names in my queries for case-sensitivity, and everything checks out.
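
For anyone debugging the same kind of "no results" problem, one sanity check is to confirm that both fields are actually extracted across the whole window - something along these lines (just a sketch, assuming a 30-day range and illustrative alias names):

sourcetype=metrics earliest=-30d@d | bucket _time span=1d | stats count as totalEvents, count(OrgID) as eventsWithOrgID, count(OrganizationPatientCount) as eventsWithOPC by _time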
