I have some data with flip-flop values akin to the following simulation
| makeresults count=20
| eval id = "id" . (random() % 2 + 1), bad = case(random() % 3 == 0, "real " . id, random() % 5 == 0, "", true(), "fake")
What it does is to insert a known "fake" value, and zero-length string in some events. (Comically, the fake value is the string literal "null".) A sample data set could look like
| _time | bad | id |
| 2021-06-09 18:45:09 | fake | id2 |
| 2021-06-09 18:45:09 | fake | id2 |
| 2021-06-09 18:45:09 | fake | id1 |
| 2021-06-09 18:45:09 | fake | id2 |
| 2021-06-09 18:45:09 | fake | id1 |
| 2021-06-09 18:45:09 | real id2 | id2 |
| 2021-06-09 18:45:09 | real id1 | id1 |
| 2021-06-09 18:45:09 | fake | id2 |
| 2021-06-09 18:45:09 | real id1 | id1 |
| 2021-06-09 18:45:09 | fake | id2 |
| 2021-06-09 18:45:09 | real id2 | id2 |
| 2021-06-09 18:45:09 | id2 | |
| 2021-06-09 18:45:09 | fake | id1 |
| 2021-06-09 18:45:09 | fake | id1 |
| 2021-06-09 18:45:09 | id2 | |
| 2021-06-09 18:45:09 | fake | id2 |
| 2021-06-09 18:45:09 | fake | id1 |
| 2021-06-09 18:45:09 | real id1 | id1 |
| 2021-06-09 18:45:09 | real id2 | id2 |
| 2021-06-09 18:45:09 | fake | id2 |
For my purposes, all I care is that any field whose value is not "fake" or "" be extracted. The challenge is, of course, that some records may have bad value for "legitimate" reasons (e.g., data entry or machine errors) so all their values are "fake". I also cannot predict whether all fields are having "fake" in the same record, i.e., "real" value for different fields may be contained in different events.
One method I come up with is to mvfilter() from bad, like so
| dedup id bad
| eventstats values(bad) as bad by id
| eval realbad = mvfilter(NOT match(bad, "fake"))
(dedup is added just to reduce load for eventstats. Not sure if that makes a difference.) Lucky for me, values() seems to have dropped those zero-length values. (For my purposes, it is OK if this field becomes null, in case all values are zero-length or fake.) Otherwise, I'll just add another test in mvfilter().
Whereas this produces desired results, I feel that eventstats+mvfilter() could still be expensive. Any standard/recommended methods?
(Clarification: Whereas sample simulation illustrates one 'bad' field, there are multiple bad fields that flip-flop seemingly independent of each other, so globally filtering out all events containing 'invalid' values is not an option because the use case requires valid value of as many fields as possible.)
Lucky for me, values() seems to have dropped those zero-length values. (For my purposes, it is OK if this field becomes null, in case all values are zero-length or fake.)
Two problems:
Both empty string and "fake" are more difficult for downstream processing; a mixture of two values to represent bad states is worse for my purposes.
Additionally, eventstats is useless for this illustration and not used in my use case. I also realize that match() is totally unnecessary when string identity will do better. It is almost as if brute force is better.
| dedup bad id
| eval bad = if(bad IN ("", "fake"), null(), bad)
| stats values(*) as * by id
As mentioned in another comment, I have multiple "bad" fields.
With this workaround, each of them still needs to eval separately, but at least values() can use wildcard.
Lucky for me, values() seems to have dropped those zero-length values. (For my purposes, it is OK if this field becomes null, in case all values are zero-length or fake.)
Two problems:
Both empty string and "fake" are more difficult for downstream processing; a mixture of two values to represent bad states is worse for my purposes.
Additionally, eventstats is useless for this illustration and not used in my use case. I also realize that match() is totally unnecessary when string identity will do better. It is almost as if brute force is better.
| dedup bad id
| eval bad = if(bad IN ("", "fake"), null(), bad)
| stats values(*) as * by id
As mentioned in another comment, I have multiple "bad" fields.
With this workaround, each of them still needs to eval separately, but at least values() can use wildcard.
I am not sure I understand what you are trying to do - can you not just
| where bad!="fake" AND bad!=""
The reason why that will not work is because we don't know which events contain valid values for which field. (I probably oversimplified in sample simulation. There are quite a number of bad fields kind of flip-flop in unpredictable ways.)