Splunk Search

How to use stats to find extract values by y as they correspond to distinct values of x?

ft_kd02
Path Finder

I'm looking at a very large set of data that separates transactions by product. I've performed some relatively straightforward stats commands on the aggregate data set, but now I'd like to extract a similar set of data as it relates to unique accounts. 

For example, I want to look at stats related to products, but across unique accounts rather than accounts as a total to give insight into how specific accounts behave. For the purposes of this, let y be product and x be accountHash. 

In a splunk query, I could extract the distinct account numbers from the data set by product doing the following: 

index=index source=source sourcetype=sourcetype product=productValue 
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats  DC(accountHash) as uniqueAccounts by productDescription

What if I wanted to look at say, stats count as Volume, avg(transactionValue), etc. across unique accounts? Can I then aggregate the total by productDescription? I know that I could do something like this:

index=index source=source sourcetype=sourcetype product=productValue 
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats  count as Volume, avg(transactionValue) as avgTranValue by accountHash, product

But this would give me a dataset with too many rows to be meaningful. Is it possible to create statistics by unique accountHash values, and then tie those to a product? I don't need to see the individual accountValues, but I'd like to compare statistics across the aggregate total, which would likely skew the statistics towards accounts that use their accounts the most. 

Could I do something like 

| stats by accountHash

And then another stats command that gives me product results across distinct accounts? If the question isn't clear, let me know and I will try to rephrase.

 

Labels (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

I am not sure I get the requirement, but you can do stats after stats, although you will only retain the fields after the first stats from the aggregations and by clause.

Often using stats values(X) as X by Y and then doing subsequent stats with the multivalue (X)

Alternatively, you can use eventstats to calculate stats by any set of splits - with eventstats it simply enriches the event data and you don't lose any of the other fields, so you can do things like

| eventsats count as Volume avg(transactionValue) by accountHash
| stats ... by product

Can you be a bit clearer about what your desired output is, perhaps with a pseudo example?

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Beyond Detection: How Splunk and Cisco Integrated Security Platforms Transform ...

Financial services organizations face an impossible equation: maintain 99.9% uptime for mission-critical ...

Customer success is front and center at .conf25

Hi Splunkers, If you are not able to be at .conf25 in person, you can still learn about all the latest news ...

.conf25 Global Broadcast: Don’t Miss a Moment

Hello Splunkers, .conf25 is only a click away.  Not able to make it to .conf25 in person? No worries, you can ...