Reporting

How to create a column labeling rows based on conditions?

johnsmith454
Engager

I have tried using answers to similar questions on here, but I'm having a problem where I want to create a column of 4 labels. However, when I try to create these, the labels I make eat into re-labeling the first label I have assigned. For example, I am looking to create a label a column like this:

 

Gene   Feature1   Feature2   Feature3 ...  label
Gene1   1            3         1            most likely
Gene2   0            0         1            probable
Gene3   NA           NA        NA           unknown
Gene4   0            0         0            unlikely

 

However, my data is imported from big data analysis and so my features are not represented here, but the 4 labels are what I'm trying to get. I try to code this with:

 

df$label[(df$Mechanism == 1)|(df$med >= 3) |(df$OMIM == 1)] <- "most likely"

df$label[is.na(df$label) & (df$med <= 2 )|(df$SideeffectFreq>=1) |(df$MGI_Gene==1) |(df$model_Gene==1) |(df$Rank>=1) ] <- "probable"

df$label[(df$Causality == 'least likely')] <- "least likely"

df$label[is.na(df$label)] <- "unknown"

 

When I run the first line to create the "most likely" label, this labels 50 genes (which is what I expected), but running the second line for "probable" re-labels some of the "most likely" genes to only give 34 of them left. I thought using is.na(df$label) or (df$label != 'most likely') would resolve this, but neither do.

Is there a better way to go about creating a labels column like this? I am new to coding so also if anyone can explain why the is.na(df$label) or (df$label != 'most likely') do not work as I expected that would also be really helpful.

Edit: Example where 'most likely' label is taken up:

 

#Input data:
dput(dt)
structure(list(Gene = c("gene1", "gene2", "gene3", "gene4"), 
    F1 = c(1L, 0L, 0L, 1L), F2 = c(3L, 0L, 0L, 1L), F3 = c("1", 
    "1", "1", "least likely"), label = c(NA, NA, NA, 
    NA)), row.names = c(NA, -4L), class = c("data.table", 
"data.frame"))

dt$label[(dt$F1 == 1)|(dt$F2 >= 3) |(dt$F1 == 1)] <- "most likely"
dt$label[(dt$label != 'most likely') & (dt$F1 == 2)|(dt$F2 == 0) |(dt$F1 == 1)] <- "probable"
dt$label[(dt$F1 == 0)|(dt$F2 == 0)] <- "unlikely"
dt$label[(dt$F3 == 'least likely')] <- "unknown"

 

Labels (1)

thambisetty
SplunkTrust
SplunkTrust

I think you are looking for eval case statement.

syntax:
| eval newfield=case(condtion1,value to assign to newfield if condition1 matches,condition2, value to assign to newfield if condtion2 matches,...,default,default value)

SPL:

| eval label=case(Feature1=="1" AND Feature2=="3" AND Feature="1","most likely",1=1,"Other")
————————————
If this helps, give a like below.
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...