Hi all,
I am trying to get all transactions for defined day in the past that have amount larger than the average transaction amount for the month before.
Example:
Show all transactions from one day (18.06.2014) that are larger than average transaction in 5th month.
Data is in .xml file and defined fields are: Date, Time, CC(number), Amount
I know how to get average transaction for a month:
My search earliest=-1mon@mon latest=-0mon@mon | stats avg(Amount) AS Average
How do I get result for one day in the past while comparing it to the month before?
I guess it should be a subsearch… Please help.
To do this efficiently requires some work, and perhaps a lot of documentation reading. I'll cover the general approach and trust you to do your homework, etc.
To be able to do this efficiently, you should use one of Splunk's available data summarization tools. For the purpose of my explanation, I'm going to use a summary index. A summary index is an index that stores within pre-summarized data. Other tools, like accelerated data models, are also available.
Thinking about how to get your average for the previous month - what do you need to store? You could store a summary with the average by month, updating it once a month. This might be good enough. But, let's remember that an average (arithmetic mean) is defined as sum(xxx) / count(xxx). So, we could run a search like:
earliest=-1d@d latest=@d sourcetype=foo
| sitimechart span=1d count(Amount) as count, sum(Amount) as total
and configure it to store into a summary index on a daily basis. Now, for each day, we have a single row with the count and total amount for that day. We have the basis now for computing an average amount over any set of days in our history. Suppose we want to look at the prior month:
earliest=-1mon@mon latest=@mon index=mysummaryindex
| timechart span=1d count(Amount) as count, sum(Amount) as total
| eval average=total/count
Now you can use this in a subsearch, something like:
earliest=-1mon@mon latest=@mon index=mydata Amount >
[ search earliest=-1mon@mon latest=@mon index=mysummaryindex
| timechart span=1d count(Amount) as count, sum(Amount) as total
| eval average=total/count | rename average as search | format "" "" "" "" "" ""
]
This is perhaps orders of magnitude faster than looking back at the original data a whole month at a time. We need to pull at worst 31 records from the summary index and do a little math upon them. Because of how we set up the summary, we can also do filtering to get the average from "week days" or "Mondays", or "days that are not part of a holiday weekend", or "days where Saturn is in alignment with Libra"...
Some resources for summary indexing:
http://www.splunk.com/view/SP-CAAACZW
http://docs.splunk.com/Documentation/Splunk/6.2.2/Knowledge/Aboutsummaryindexing
Just as a comment, I hope the "CC" in your data isn't credit card #'s... Please don't Splunk credit card #'s 🙂