I have tried using answers to similar questions on here, but I'm still stuck. I want to create a column of 4 labels. However, when I create them, each new label I assign overwrites some of the rows I had already given the first label. For example, I am looking to create a label column like this:
Gene   Feature1  Feature2  Feature3  ...  label
Gene1  1         3         1              most likely
Gene2  0         0         1              probable
Gene3  NA        NA        NA             unknown
Gene4  0         0         0              unlikely
However, my data is imported from a big data analysis, so my real features are not the ones shown here; the 4 labels are what I'm trying to get. I tried to code this with:
When I run the first line to create the "most likely" label, it labels 50 genes (which is what I expected), but running the second line for "probable" re-labels some of the "most likely" genes, leaving only 34 of them. I thought using is.na(df$label) or (df$label != 'most likely') would resolve this, but neither does.
Is there a better way to go about creating a label column like this? I am new to coding, so if anyone can explain why is.na(df$label) or (df$label != 'most likely') do not work as I expected, that would also be really helpful.
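Since the exact code is not shown, the cause can only be guessed at. The following is a minimal sketch of two base R behaviours that commonly produce this symptom; the vector and variable names are hypothetical. First, comparing NA with a string gives NA rather than TRUE, so `label != 'most likely'` does not cleanly select unlabelled rows. Second, `&` binds more tightly than `|`, so a guard appended with `&` at the end of a condition containing `|` only applies to the last alternative unless parentheses are added.

```r
# Hypothetical label column: one gene already labelled, three not yet labelled.
label <- c("most likely", NA, NA, NA)

# Comparing NA with a string yields NA, not TRUE.
not_most_likely <- label != "most likely"
print(not_most_likely)   # FALSE NA NA NA

# is.na() always returns TRUE/FALSE, so it is safe as a row selector.
unlabelled <- is.na(label)
print(unlabelled)        # FALSE TRUE TRUE TRUE

# %in% never returns NA, so this is an NA-safe "not already most likely".
safe_not_most_likely <- !(label %in% "most likely")

# Precedence: & is evaluated before |.
cond_a <- TRUE      # hypothetical first feature condition
cond_b <- FALSE     # hypothetical second feature condition
is_free <- FALSE    # row is already labelled

without_parens <- cond_a | cond_b & is_free    # parsed as cond_a | (cond_b & is_free)
with_parens    <- (cond_a | cond_b) & is_free  # guard applies to both alternatives
```

In this sketch `without_parens` is TRUE, so an already labelled row would be overwritten, while `with_parens` is FALSE. If the real conditions mix `|` and `&`, wrapping the feature conditions in parentheses before adding `& is.na(df$label)` may be worth checking.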
Edit: Example where 'most likely' label is taken up:
| eval newfield=case(condition1,value to assign to newfield if condition1 matches,condition2, value to assign to newfield if condition2 matches,...,default,default value)
| eval label=case(Feature1=="1" AND Feature2=="3" AND Feature3=="1","most likely",1=1,"Other")
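The same first-match-wins idea can be sketched in base R with nested `ifelse()`, which assigns every label in one pass so that later rules cannot overwrite earlier ones. The data frame mirrors the example table. The labelling rules (all features missing, all three positive, at least one positive) are assumptions chosen only to reproduce that table, not the real criteria.

```r
# Hypothetical data mirroring the example table.
df <- data.frame(
  Gene     = c("Gene1", "Gene2", "Gene3", "Gene4"),
  Feature1 = c(1, 0, NA, 0),
  Feature2 = c(3, 0, NA, 0),
  Feature3 = c(1, 1, NA, 0)
)

feats <- df[, c("Feature1", "Feature2", "Feature3")]

# Row-wise summaries; both are free of NA, so the conditions below are too.
all_missing <- rowSums(is.na(feats)) == ncol(feats)
n_positive  <- rowSums(feats > 0, na.rm = TRUE)

# Conditions are tested in order and the first TRUE one decides the label.
df$label <- ifelse(all_missing,     "unknown",
            ifelse(n_positive == 3, "most likely",
            ifelse(n_positive >= 1, "probable",
                                    "unlikely")))

print(df)
```

Because each row receives exactly one label from a single expression, there is no need to guard later assignments with `is.na(df$label)` or `df$label != 'most likely'`.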