Splunk Search

Percentage of zero values in lookup table

jacubero
Explorer

How can I obtain the percentage of zero values in a lookup table? I have tried the following command without success:

| inputlookup input.csv
| fieldsummary
| fields field
| map search="| inputlookup input.csv | stats count(eval($$field$$ == 0)) as count_zeros | eval field=$$field$$"
| eval zeros(%) = round((count_zeros*100)/10000,2)
| fields zeros(%)

A sample of the dataset "input.csv" is the following:

pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
16,32.05,12,1,1,1417916700,41.953582125,-87.72345239,,,15.4,,,Cash,,1200,,0.0
30,38.45,10,10,5,1444301100,41.839086906,-87.714003807,,,14.6,,,Cash,,2580,,0.0
11,14.65,1,1,3,1358213400,41.978829526,-87.771166703,,,5.81,,,Cash,,1080,,0.0
33,3.25,5,17,1,1368985500,41.849246754,-87.624135298,,,0.0,,,Cash,Taxi Affiliation Services,0,,0.0
19,47.65,6,15,4,1372258800,41.927260956,-87.765501609,,,0.0,,,Cash,Taxi Affiliation Services,3480,,0.0

There are 18 columns in the dataset, but I am only obtaining results for 10 of them:

zeros(%)
0.00
0.00
0.00
0.00
0.00
0.17
0.00
0.00
0.00
0.00
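
One likely factor: the map command runs at most 10 subsearches by default (its maxsearches option defaults to 10), which matches the 10 results above. A minimal sketch with the limit raised - the single-dollar $field$ token is map's usual substitution syntax, and the hard-coded 10000 divisor is carried over from the question and assumed to be the lookup's row count:

| inputlookup input.csv
| fieldsummary
| fields field
| map maxsearches=20 search="| inputlookup input.csv | stats count(eval('$field$' == 0)) as count_zeros | eval field=\"$field$\""
| eval "zeros(%)" = round((count_zeros * 100) / 10000, 2)
| fields field, "zeros(%)"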
1 Solution

maciep
Champion

Do you want results for each row, each column, or all together? Here is an example using foreach that should get you the total percentage across the entire lookup.

| inputlookup input.csv
| foreach *
    [ eval count = coalesce(count+1,1), zero_count = if(<<FIELD>> = 0, coalesce(zero_count+1,1), zero_count) ]
| stats sum(count) as total, sum(zero_count) as zero_total
| eval perc_zero = round(100*(zero_total/total),2)

foreach is processed for every row like other commands, but it then iterates through each field in the row based on the filter - in this case, all fields (*). The new count field I created just counts the number of fields. Then the new zero_count field counts the number of fields that equal 0 - the <<FIELD>> notation references the name of the field we are on while iterating through them all.

After that, I'm just summing those up and doing the percentage math. But you could instead do the math at each row, or sum by the fields you're interested in, etc.
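
A sketch of that per-row variant (row_perc_zero is a hypothetical field name; the foreach bookkeeping is the same as above):

| inputlookup input.csv
| foreach *
    [ eval count = coalesce(count+1,1), zero_count = if(<<FIELD>> = 0, coalesce(zero_count+1,1), zero_count) ]
| eval row_perc_zero = round(100 * coalesce(zero_count,0) / count, 2)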


woodcock
Esteemed Legend

Like this:

| inputlookup input.csv
| stats [|inputlookup input.csv
| head 1
| eval _search=" "
| foreach * [ eval _search = _search . " count(<<FIELD>>) AS <<FIELD>>_count count(eval(<<FIELD>>==0)) AS <<FIELD>>_zeroes" ]
| rename _search AS search
| return $search ]
| foreach *_count [ eval <<MATCH>>_pct = round((100 * <<MATCH>>_zeroes / <<MATCH>>_count), 2) ]
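
The subsearch here builds the long stats argument list from the lookup's header row and injects it with return $search. Assuming, purely for illustration, a lookup with only two columns named fare and tips, the generated stats call would expand to roughly:

| inputlookup input.csv
| stats count(fare) AS fare_count count(eval(fare==0)) AS fare_zeroes count(tips) AS tips_count count(eval(tips==0)) AS tips_zeroes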

jacubero
Explorer

There seems to be a mistake. I get the following message when I execute it:

Error in 'eval' command: The expression is malformed. An unexpected character is reached at '<<MATCH>>_zeroes / <<MATCH>>_count), 2)'.
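
The literal <<MATCH>> in the error text suggests foreach never substituted the token. The documented foreach token for the part of a field name matched by the wildcard is <<MATCHSTR>>, not <<MATCH>>, so a hedged fix for the last line would be:

| foreach *_count [ eval <<MATCHSTR>>_pct = round((100 * <<MATCHSTR>>_zeroes / <<MATCHSTR>>_count), 2) ]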


jacubero
Explorer

As you said, with this command you can obtain the total percentage across the entire lookup. I am interested in obtaining results for each column. What would be the command for that calculation?


maciep
Champion

I'm sure woodcock's answer works too, but here is what I came up with:

| inputlookup input.csv
| foreach * [ eval <<FIELD>> = if(<<FIELD>>=0,1,0)]
| untable blah column count
| stats count as total sum(count) as zero by column
| addcoltotals labelfield="column" label="Total"
| eval perc_zero = round(100*zero/total,2)

This time we set the field to 1 if it's 0, and to 0 if it's non-zero (maybe a little confusing). Then we use untable to get the field names as values in one field called "column", and then we just count to get the total, sum to get the number that are zero, and group by the column.

I added the totals at the end too, because why not. I didn't verify that this is accurate, but I think it should work.
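
If untable is unfamiliar, a quick way to see what it does is to fabricate a single row with makeresults (fare and tips are just illustrative field names):

| makeresults
| eval fare=0, tips=12.5
| untable _time column value

That yields one row per field, with the field name in column and its value in value - the same reshaping the blah/column/count arguments perform above.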

jacubero
Explorer

It works great. Thank you.
