Splunk Search

How to get percentages of value pairs

behudelson
Path Finder

I have a very large set of retail data. The significant fields for this query are store_no, transaction_amt, zip, ethnicity.

I am attempting to get a report for a single store the percentage of sales amounts by ethnicity for each zip code.

So what I am aiming for is a result set like this:

                        46208      46219       46222
"A: African American"   .25         .23         .40
"B: Asian"              .06         .10         .14
"C: Hispanic"           .21         .22         .35
"D: White"              .48         .45         .11

So far I have been able to get the total sales for each ethnicity at a store, but I don't know how to report out on the zip code values.

Here is what I have so far:

source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg"  
| where (store_no = 243) 
| search ethnicity!="E: Unknown"
| stats 
  sum(transaction_amt) as TotalSales 
  sum(eval(if(ethnicity="A: African American", transaction_amt, null()))) as TotalAfricanAmericanSales
  sum(eval(if(ethnicity="B: Asian", transaction_amt, null()))) as TotalAsianSales
  sum(eval(if(ethnicity="C: Hispanic", transaction_amt, null()))) as TotalHispanicSales
  sum(eval(if(ethnicity="D: White", transaction_amt, null()))) as TotalWhiteSales  
  count as TotalCount
  dc(zip) as dcZip
  by store_no

| eval PercentAfricanAmericanSales=round(TotalAfricanAmericanSales/TotalSales, 6) * 100
| eval PercentAsianSales=round(TotalAsianSales/TotalSales, 6) * 100 
| eval PercentHispanicSales=round(TotalHispanicSales/TotalSales, 6) * 100 
| eval PercentWhiteSales=round(TotalWhiteSales/TotalSales, 6) * 100

| table TotalSales, TotalAfricanAmericanSales, PercentAfricanAmericanSales, TotalAsianSales, PercentAsianSales, TotalHispanicSales, PercentHispanicSales, TotalWhiteSales, PercentWhiteSales, TotalCount, dcZip

Thank you and please let me know if I can clarify what I am asking.

Tags (1)
0 Karma
1 Solution

@behudelson,

1) Your first should where and search should be merged in the base search for filtering required results upfront.
2) If your final output requires transaction amount by zip and ethnicity, store number is not required for stats.

Since we need statistics by zip code and then output by ethnicity, I have used eventstats and foreach to calculate percent by zip and then did a transpose to reverse axis in the end.

 source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg" store_no="243" ethnicity!="E: Unknown"
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"

Following is a run anywhere search to test out (makeresults with append command mock data as per the question):

|  makeresults
|  eval ethnicity="A: African American", zip=46208, transaction_amt=2000
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46219, transaction_amt=1500]
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46208, transaction_amt=1100]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46219, transaction_amt=1300]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46208, transaction_amt=1200]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46219, transaction_amt=2000]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46222, transaction_amt=2300]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46208, transaction_amt=1800]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46219, transaction_amt=1570]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46222, transaction_amt=1350]
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

@behudelson,

1) Your first should where and search should be merged in the base search for filtering required results upfront.
2) If your final output requires transaction amount by zip and ethnicity, store number is not required for stats.

Since we need statistics by zip code and then output by ethnicity, I have used eventstats and foreach to calculate percent by zip and then did a transpose to reverse axis in the end.

 source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg" store_no="243" ethnicity!="E: Unknown"
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"

Following is a run anywhere search to test out (makeresults with append command mock data as per the question):

|  makeresults
|  eval ethnicity="A: African American", zip=46208, transaction_amt=2000
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46219, transaction_amt=1500]
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46208, transaction_amt=1100]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46219, transaction_amt=1300]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46208, transaction_amt=1200]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46219, transaction_amt=2000]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46222, transaction_amt=2300]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46208, transaction_amt=1800]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46219, transaction_amt=1570]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46222, transaction_amt=1350]
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

behudelson
Path Finder

Hi @niketnilay, I will study this and follow up if I have questions. Thank you for such a quick and detailed response - very cool!

0 Karma

@behudelson, second search is a run anywhere search. From the | stats command onward you can try out one command at a time to see and understand how each one of them behave.

You should also refer to Splunk Docs for each command used for reference. For example, transpose command has a default limit of 5. You will definitely have more zip codes than that, so you may have to use transpose with required limit.

 |  transpose 5 header_field="zip" column_name="ethnicity"

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose

Do let us know if you have any questions and accept the Answer once your issue is resolved 🙂

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma
Get Updates on the Splunk Community!

Introducing Ingest Actions: Filter, Mask, Route, Repeat

WATCH NOW Ingest Actions (IA) is the best new way to easily filter, mask and route your data in Splunk® ...

Splunk Forwarders and Forced Time Based Load Balancing

Splunk customers use universal forwarders to collect and send data to Splunk. A universal forwarder can send ...

NEW! Log Views in Splunk Observability Dashboards Gives Context From a Single Page

Today, Splunk Observability releases log views, a new feature for users to add their logs data from Splunk Log ...