I'm looking at a very large set of data that separates transactions by product. I've performed some relatively straightforward stats commands on the aggregate data set, but now I'd like to extract a similar set of data as it relates to unique accounts.
For example, I want to look at stats related to products, but across unique accounts rather than accounts as a total to give insight into how specific accounts behave. For the purposes of this, let y be product and x be accountHash.
In a splunk query, I could extract the distinct account numbers from the data set by product doing the following:
index=index source=source sourcetype=sourcetype product=productValue
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats DC(accountHash) as uniqueAccounts by productDescription
What if I wanted to look at say, stats count as Volume, avg(transactionValue), etc. across unique accounts? Can I then aggregate the total by productDescription? I know that I could do something like this:
index=index source=source sourcetype=sourcetype product=productValue
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats count as Volume, avg(transactionValue) as avgTranValue by accountHash, product
But this would give me a dataset with too many rows to be meaningful. Is it possible to create statistics by unique accountHash values, and then tie those to a product? I don't need to see the individual accountValues, but I'd like to compare statistics across the aggregate total, which would likely skew the statistics towards accounts that use their accounts the most.
Could I do something like
| stats by accountHash
And then another stats command that gives me product results across distinct accounts? If the question isn't clear, let me know and I will try to rephrase.
I am not sure I get the requirement, but you can do stats after stats, although you will only retain the fields after the first stats from the aggregations and by clause.
Often using stats values(X) as X by Y and then doing subsequent stats with the multivalue (X)
Alternatively, you can use eventstats to calculate stats by any set of splits - with eventstats it simply enriches the event data and you don't lose any of the other fields, so you can do things like
| eventsats count as Volume avg(transactionValue) by accountHash
| stats ... by product
Can you be a bit clearer about what your desired output is, perhaps with a pseudo example?