How can I obtain the percentage of zero values in a lookup table? I have tried the following command without success:
| inputlookup input.csv | fieldsummary | fields field | map search="| inputlookup input.csv | stats count(eval($$field$$ == 0)) as count_zeros | eval field=$$field$$" | eval zeros(%) = round((count_zeros*100)/10000,2) | fields zeros(%)
A sample of the dataset "input.csv" is the following:
pickup_community_area fare trip_start_month trip_start_hour trip_start_day trip_start_timestamp pickup_latitude pickup_longitude dropoff_latitude dropoff_longitude trip_miles pickup_census_tract dropoff_census_tract payment_type company trip_seconds dropoff_community_area tips
60 27.05 10 2 3 1380593700 41.836.150.155 -87.648.787.952 12.6 Cash Taxi Affiliation Services 1380 0.0
10 5.85 10 1 2 1382319000 41.985.015.101 -87.804.532.006 0.0 Cash Taxi Affiliation Services 180 0.0
14 16.65 5 7 5 1369897200 41.968.069 -87.721.559.063 0.0 Cash Dispatch Taxi Affiliation 1080 0.0
13 16.45 11 12 3 1446554700 41.983.636.307 -87.723.583.185 6.9 Cash 780 0.0
16 32.05 12 1 1 1417916700 41.953.582.125 -8.772.345.239 15.4 Cash 1200 0.0
30 38.45 10 10 5 1444301100 41.839.086.906 -87.714.003.807 14.6 Cash 2580 0.0
11 14.65 1 1 3 1358213400 41.978.829.526 -87.771.166.703 5.81 Cash 1080 0.0
33 3.25 5 17 1 1368985500 41.849.246.754 -87.624.135.298 0.0 Cash Taxi Affiliation Services 0 0.0
19 47.65 6 15 4 1372258800 41.927.260.956 -87.765.501.609 0.0 Cash Taxi Affiliation Services 3480 0.0
There are 18 columns and I am only obtaining results for 10 columns of the whole dataset.
zeros(%)
0.00
0.00
0.00
0.00
0.00
0.17
0.00
0.00
0.00
0.00
do you want results for each row, each column or all together? Here is an example using foreach that should get you the total percent across the entire lookup.
| inputlookup input.csv
| foreach *
[ eval count = coalesce(count+1,1), zero_count = if(<<FIELD>> = 0,coalesce(zero_count+1,1),zero_count)]
| stats sum(count) as total, sum(zero_count) as zero_total
| eval perc_zero = round(100*(zero_total/total),2)
For each is processed for every row like other commands, but it then iterates through each field in a row based on the filter. In this case, all fields (*). The new count field i created just counts the number of fields. Then the new zero_count field counts the number of fields that equal 0 - the <> notation references the name of the field we are are on while iterating through them all.
After that, i'm just sum'ing those up and doing the percentage math. But you could instead just do the math at each row, or sum by the fields you're interested in, etc.
Like this:
| inputlookup input.csv
| stats [|inputlookup input.csv
| head 1
| eval _search=" "
| foreach * [ eval _search = _search . " count(<<FIELD>>) AS <<FIELD>>_count count(eval(<<FIELD>>==0)) AS <<FIELD>>_zeroes" ]
| rename _search AS search
| return $search ]
| foreach *_count [ eval <<MATCH>>_pct = round((100 * <<MATCH>>_zeroes / <<MATCH>>_count), 2) ]
There seems to be some mistake. I get the following message when I execute it:
Error in 'eval' command: The expression is malformed. An unexpected character is reached at '<<MATCH>>_zeroes / <<MATCH>>_count), 2)'.
do you want results for each row, each column or all together? Here is an example using foreach that should get you the total percent across the entire lookup.
| inputlookup input.csv
| foreach *
[ eval count = coalesce(count+1,1), zero_count = if(<<FIELD>> = 0,coalesce(zero_count+1,1),zero_count)]
| stats sum(count) as total, sum(zero_count) as zero_total
| eval perc_zero = round(100*(zero_total/total),2)
For each is processed for every row like other commands, but it then iterates through each field in a row based on the filter. In this case, all fields (*). The new count field i created just counts the number of fields. Then the new zero_count field counts the number of fields that equal 0 - the <> notation references the name of the field we are are on while iterating through them all.
After that, i'm just sum'ing those up and doing the percentage math. But you could instead just do the math at each row, or sum by the fields you're interested in, etc.
As you said, with this command you can obtain the total percentage across the entire lookup. I am interested in obtaining results for each column. What will be the command for this calculation?
I'm sure woodcock's answer works too, but here is what I came up with
| inputlookup input.csv
| foreach * [ eval <<FIELD>> = if(<<FIELD>>=0,1,0)]
| untable blah column count
| stats count as total sum(count) as zero by column
| addcoltotals labelfield="column" label="Total"
| eval perc_zero = round(100*zero/total,2)
this time we set the field to 1 if it's 0 and 0 if it's non-zero (maybe a little confusing). Then we use untable to get the field names as values into one field called "column", and then we just count to get the total and sum to get the number that are zero and group by the column.
I added the totals at the end too, because why not. I didn't verify that this was accurate, but i think it should work.
It works great. Thank you.