Reporting

How to create a column labeling rows based on conditions?

johnsmith454
Engager

I have tried using answers to similar questions on here, but I'm having a problem where I want to create a column of 4 labels. However, when I try to create these, the labels I make eat into re-labeling the first label I have assigned. For example, I am looking to create a label a column like this:

 

Gene   Feature1   Feature2   Feature3 ...  label
Gene1   1            3         1            most likely
Gene2   0            0         1            probable
Gene3   NA           NA        NA           unknown
Gene4   0            0         0            unlikely

 

However, my data is imported from big data analysis and so my features are not represented here, but the 4 labels are what I'm trying to get. I try to code this with:

 

df$label[(df$Mechanism == 1)|(df$med >= 3) |(df$OMIM == 1)] <- "most likely"

df$label[is.na(df$label) & (df$med <= 2 )|(df$SideeffectFreq>=1) |(df$MGI_Gene==1) |(df$model_Gene==1) |(df$Rank>=1) ] <- "probable"

df$label[(df$Causality == 'least likely')] <- "least likely"

df$label[is.na(df$label)] <- "unknown"

 

When I run the first line to create the "most likely" label, this labels 50 genes (which is what I expected), but running the second line for "probable" re-labels some of the "most likely" genes to only give 34 of them left. I thought using is.na(df$label) or (df$label != 'most likely') would resolve this, but neither do.

Is there a better way to go about creating a labels column like this? I am new to coding so also if anyone can explain why the is.na(df$label) or (df$label != 'most likely') do not work as I expected that would also be really helpful.

Edit: Example where 'most likely' label is taken up:

 

#Input data:
dput(dt)
structure(list(Gene = c("gene1", "gene2", "gene3", "gene4"), 
    F1 = c(1L, 0L, 0L, 1L), F2 = c(3L, 0L, 0L, 1L), F3 = c("1", 
    "1", "1", "least likely"), label = c(NA, NA, NA, 
    NA)), row.names = c(NA, -4L), class = c("data.table", 
"data.frame"))

dt$label[(dt$F1 == 1)|(dt$F2 >= 3) |(dt$F1 == 1)] <- "most likely"
dt$label[(dt$label != 'most likely') & (dt$F1 == 2)|(dt$F2 == 0) |(dt$F1 == 1)] <- "probable"
dt$label[(dt$F1 == 0)|(dt$F2 == 0)] <- "unlikely"
dt$label[(dt$F3 == 'least likely')] <- "unknown"

 

Labels (1)

thambisetty
Super Champion

I think you are looking for eval case statement.

syntax:
| eval newfield=case(condtion1,value to assign to newfield if condition1 matches,condition2, value to assign to newfield if condtion2 matches,...,default,default value)

SPL:

| eval label=case(Feature1=="1" AND Feature2=="3" AND Feature="1","most likely",1=1,"Other")
————————————
If this helps, give a like below.
Get Updates on the Splunk Community!

This Week's Community Digest - Splunk Community Happenings [9.26.22]

Get the latest news and updates from the Splunk Community here! Upcoming User Group Events! &#x1f44f; Check ...

BSides Splunk 2022 - The Call for Papers is now Open!

TLDR; Main Site: https://bsidessplunk.com CFP Site: https://bsidessplunk.com/cfp CFP Opens: December 15th, ...

Sending Metrics to Splunk Enterprise With the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry. The OpenTelemetry project is the second largest ...