Splunk Search

How to use stats to find extract values by y as they correspond to distinct values of x?

ft_kd02
Path Finder

I'm looking at a very large set of data that separates transactions by product. I've performed some relatively straightforward stats commands on the aggregate data set, but now I'd like to extract a similar set of data as it relates to unique accounts. 

For example, I want to look at stats related to products, but across unique accounts rather than accounts as a total to give insight into how specific accounts behave. For the purposes of this, let y be product and x be accountHash. 

In a splunk query, I could extract the distinct account numbers from the data set by product doing the following: 

index=index source=source sourcetype=sourcetype product=productValue 
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats  DC(accountHash) as uniqueAccounts by productDescription

What if I wanted to look at say, stats count as Volume, avg(transactionValue), etc. across unique accounts? Can I then aggregate the total by productDescription? I know that I could do something like this:

index=index source=source sourcetype=sourcetype product=productValue 
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats  count as Volume, avg(transactionValue) as avgTranValue by accountHash, product

But this would give me a dataset with too many rows to be meaningful. Is it possible to create statistics by unique accountHash values, and then tie those to a product? I don't need to see the individual accountValues, but I'd like to compare statistics across the aggregate total, which would likely skew the statistics towards accounts that use their accounts the most. 

Could I do something like 

| stats by accountHash

And then another stats command that gives me product results across distinct accounts? If the question isn't clear, let me know and I will try to rephrase.

 

Labels (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

I am not sure I get the requirement, but you can do stats after stats, although you will only retain the fields after the first stats from the aggregations and by clause.

Often using stats values(X) as X by Y and then doing subsequent stats with the multivalue (X)

Alternatively, you can use eventstats to calculate stats by any set of splits - with eventstats it simply enriches the event data and you don't lose any of the other fields, so you can do things like

| eventsats count as Volume avg(transactionValue) by accountHash
| stats ... by product

Can you be a bit clearer about what your desired output is, perhaps with a pseudo example?

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Boston may be buzzing this September with Splunk University and .conf25, but you don’t have to pack a bag to ...

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Unlock What’s Next: The Splunk Cloud Platform at .conf25

In just a few days, Boston will be buzzing as the Splunk team and thousands of community members come together ...