Reporting

How to create a column labeling rows based on conditions?

johnsmith454
Engager

I have tried using answers to similar questions on here, but I'm having a problem where I want to create a column of 4 labels. However, when I try to create these, the labels I make eat into re-labeling the first label I have assigned. For example, I am looking to create a label a column like this:

 

Gene   Feature1   Feature2   Feature3 ...  label
Gene1   1            3         1            most likely
Gene2   0            0         1            probable
Gene3   NA           NA        NA           unknown
Gene4   0            0         0            unlikely

 

However, my data is imported from big data analysis and so my features are not represented here, but the 4 labels are what I'm trying to get. I try to code this with:

 

df$label[(df$Mechanism == 1)|(df$med >= 3) |(df$OMIM == 1)] <- "most likely"

df$label[is.na(df$label) & (df$med <= 2 )|(df$SideeffectFreq>=1) |(df$MGI_Gene==1) |(df$model_Gene==1) |(df$Rank>=1) ] <- "probable"

df$label[(df$Causality == 'least likely')] <- "least likely"

df$label[is.na(df$label)] <- "unknown"

 

When I run the first line to create the "most likely" label, this labels 50 genes (which is what I expected), but running the second line for "probable" re-labels some of the "most likely" genes to only give 34 of them left. I thought using is.na(df$label) or (df$label != 'most likely') would resolve this, but neither do.

Is there a better way to go about creating a labels column like this? I am new to coding so also if anyone can explain why the is.na(df$label) or (df$label != 'most likely') do not work as I expected that would also be really helpful.

Edit: Example where 'most likely' label is taken up:

 

#Input data:
dput(dt)
structure(list(Gene = c("gene1", "gene2", "gene3", "gene4"), 
    F1 = c(1L, 0L, 0L, 1L), F2 = c(3L, 0L, 0L, 1L), F3 = c("1", 
    "1", "1", "least likely"), label = c(NA, NA, NA, 
    NA)), row.names = c(NA, -4L), class = c("data.table", 
"data.frame"))

dt$label[(dt$F1 == 1)|(dt$F2 >= 3) |(dt$F1 == 1)] <- "most likely"
dt$label[(dt$label != 'most likely') & (dt$F1 == 2)|(dt$F2 == 0) |(dt$F1 == 1)] <- "probable"
dt$label[(dt$F1 == 0)|(dt$F2 == 0)] <- "unlikely"
dt$label[(dt$F3 == 'least likely')] <- "unknown"

 

Labels (1)

thambisetty
SplunkTrust
SplunkTrust

I think you are looking for eval case statement.

syntax:
| eval newfield=case(condtion1,value to assign to newfield if condition1 matches,condition2, value to assign to newfield if condtion2 matches,...,default,default value)

SPL:

| eval label=case(Feature1=="1" AND Feature2=="3" AND Feature="1","most likely",1=1,"Other")
————————————
If this helps, give a like below.
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

What Is Splunk? Here’s What You Can Do with Splunk

Hey Splunk Community, we know you know Splunk. You likely leverage its unparalleled ability to ingest, index, ...

Level Up Your .conf25: Splunk Arcade Comes to Boston

With .conf25 right around the corner in Boston, there’s a lot to look forward to — inspiring keynotes, ...

Manual Instrumentation with Splunk Observability Cloud: How to Instrument Frontend ...

Although it might seem daunting, as we’ve seen in this series, manual instrumentation can be straightforward ...