How to use stats to find extract values by y as th...

ft_kd02 · ‎02-23-2023

I'm looking at a very large set of data that separates transactions by product. I've performed some relatively straightforward stats commands on the aggregate data set, but now I'd like to extract a similar set of data as it relates to unique accounts.

For example, I want to look at stats related to products, but across unique accounts rather than accounts as a total to give insight into how specific accounts behave. For the purposes of this, let y be product and x be accountHash.

In a splunk query, I could extract the distinct account numbers from the data set by product doing the following:

index=index source=source sourcetype=sourcetype product=productValue
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats DC(accountHash) as uniqueAccounts by productDescription

What if I wanted to look at say, stats count as Volume, avg(transactionValue), etc. across unique accounts? Can I then aggregate the total by productDescription? I know that I could do something like this:

index=index source=source sourcetype=sourcetype product=productValue
| fields product, accountHash
| lookup productCode AS product OUTPUT productDescription
| stats count as Volume, avg(transactionValue) as avgTranValue by accountHash, product

But this would give me a dataset with too many rows to be meaningful. Is it possible to create statistics by unique accountHash values, and then tie those to a product? I don't need to see the individual accountValues, but I'd like to compare statistics across the aggregate total, which would likely skew the statistics towards accounts that use their accounts the most.

Could I do something like

| stats by accountHash

And then another stats command that gives me product results across distinct accounts? If the question isn't clear, let me know and I will try to rephrase.

bowesmana · ‎02-23-2023

I am not sure I get the requirement, but you can do stats after stats, although you will only retain the fields after the first stats from the aggregations and by clause.

Often using stats values(X) as X by Y and then doing subsequent stats with the multivalue (X)

Alternatively, you can use eventstats to calculate stats by any set of splits - with eventstats it simply enriches the event data and you don't lose any of the other fields, so you can do things like

| eventsats count as Volume avg(transactionValue) by accountHash
| stats ... by product

Can you be a bit clearer about what your desired output is, perhaps with a pseudo example?

How to use stats to find extract values by y as they correspond to distinct values of x?

stats

Building Reliable Asset and Identity Frameworks in Splunk ES

Cloud Monitoring Console - Unlocking Greater Visibility in SVC Usage Reporting

Automatic Discovery Part 3: Practical Use Cases

Are you a member of the Splunk Community?

How to use stats to find extract values by y as they correspond to distinct values of x?

stats

Building Reliable Asset and Identity Frameworks in Splunk ES

Cloud Monitoring Console - Unlocking Greater Visibility in SVC Usage Reporting

Automatic Discovery Part 3: Practical Use Cases