Unless I'm completely mistaken, there is no way to store a field as a boolean value. The Splunk parser has a few built-in types, including string, number, and most relevantly bool, but if I attempt to assign a bool to a field, I get an error.
I'm trying to understand why assigning a bool is forbidden, if there is a programmatic limitation to handling bools or if it's being discouraged because the developers thought that it was some sort of 'code smell' to have Boolean fields. I'd also like to know what the recommended approach is for storing effectively boolean data.
As a use case, let's say that I have a bunch of analysts who need to know whether an IP address is in IPv4 or IPv6 format, but the regex for IPv6 format takes too long to keep rerunning, so I decided to add some preprocessing step (report, si, whatever) that adds a field specifying the IP format.
I think I have a few options here. I could store the information as a string, using an "ip_format" field with values of "ipv4", "ipv6" or "invalid". Or I could create an "is_ipv4" field holding either a "true" or a "false" string.
Or I could go the number route and have an "is_ipv4" field holding either 0 or 1 to represent false and true.
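For illustration, the string and number variants of "is_ipv4" could be sketched with eval roughly as follows. The field name ip is an assumption, not something from my real data, and the dotted-quad regex is only a loose IPv4 check. Wrapping the test in if() matters here, because eval will not assign the bare boolean result of match() to a field.

```spl
| eval is_ipv4_str = if(match(ip, "^\d{1,3}(\.\d{1,3}){3}$"), "true", "false")
| eval is_ipv4_num = if(match(ip, "^\d{1,3}(\.\d{1,3}){3}$"), 1, 0)
```

Later searches would then filter on is_ipv4_str="true" or is_ipv4_num=1, which is exactly where the magic-string and magic-number worries come in.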
All of this works fine, but seems cumbersome. If I go with the string approach, I risk all the dangers of "magic strings", mostly mistyping them or forgetting what they are. As a lesser issue, my later searches are a little longer to write due to the need to write a full matches syntax, and I may suffer some minor expense in doing the string comparison.
If I go with the number approach, I risk less human-readable code, since someone needs to know that 1 and 0 represent true and false, and I risk bad code if someone swaps the two.
I'm wondering if Splunk developers recommend one of these approaches, or if they support a cleaner option I'm missing. For instance, is there some approach for avoiding magic strings, even if it's just syntactic sugar like predefined "true" and "false" strings that functions recognize? If there is no syntactic sugar, is there a reason for that, such as a desire to discourage Boolean fields altogether, or is it just a nice-to-have feature that hasn't been prioritized yet?
In this specific case, you really have an enumeration of values rather than a boolean - there's IPv4, IPv6, and invalid. A plain boolean is_ipv4 won't be able to distinguish between IPv6 and invalid, so I'd recommend category names to enumerate your options.
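A minimal sketch of such an enumeration field, assuming the address lives in a field called ip (a hypothetical name). The IPv4 pattern is a loose dotted-quad check, and the colon test is only a crude stand-in for a real IPv6 regex:

```spl
| eval ip_format = case(
    match(ip, "^\d{1,3}(\.\d{1,3}){3}$"), "ipv4",
    match(ip, ":"), "ipv6",
    true(), "invalid")
```

case() returns the value paired with the first condition that is true, so the final true() branch acts as the catch-all for anything that matched neither pattern.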
Hi dsollen, I'm not exactly a Splunk developer, but I've been working with Splunk for a while, and I don't believe there is any very clean solution for defining boolean-type fields. What I usually see is a literal true/false string, which relies on nobody making a typo when evaluating it later.
Something else I've seen, which might end up working out better for whatever you are trying to do, is to determine true/false based on whether a field is null or not. This sidesteps typos, as the presence of any characters at all equates to true.
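As a rough sketch of the null-or-not idea, assuming a hypothetical ip field and a loose dotted-quad check: the field is set to some value when the test passes and left null otherwise, and later logic only asks whether it exists.

```spl
| eval is_ipv4 = if(match(ip, "^\d{1,3}(\.\d{1,3}){3}$"), "yes", null())
| where isnotnull(is_ipv4)
```

The value "yes" is arbitrary here, since nothing downstream compares against it.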
Please let me know if this helps, or if you have any other questions 😄
I'm not a developer either, but to add to this answer, I believe it is important to consider how Splunk searches before you decide to use 0 and 1.
When you search for is_ipv4=1, Splunk first looks everywhere for a 1. It then checks whether each of those results has that 1 in a field called is_ipv4, and discards the ones that don't. Depending on your data, this can be quite inefficient, which is why I would also recommend going with a null/value field over one with 0/1.
This, times a million. Searching for small numbers is inherently slow with default tokenization and search-time field extraction. If you control the data, there are quick fixes though - for example, to match is_ipv4=1 you could search for TERM(is_ipv4=1) for amazing speed... you just have to be certain it'll always be written like that, never be wrapped in quotes, etc.
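In a full search that might look something like the following, where the index name network and the src_ip field are made up for the example. It only matches events whose raw text contains is_ipv4=1 as a single indexed term:

```spl
index=network TERM(is_ipv4=1)
| stats count by src_ip
```

An event written as is_ipv4="1" would not match, because the quotes split the term at index time.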
Literal trues and falses are good because everyone immediately understands what's going on, and - unless your events contain the strings elsewhere frequently - they'll be okay in terms of speed.
From a performance point of view, going with "null or value" / "field is present or not" is bad mojo. You'll end up with searches like this:
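The search itself is missing from the post as it appears here. Assuming the field is named is_ipv4 as in the question and a made-up index called network, a negative search of this kind would presumably look like:

```spl
index=network NOT is_ipv4=*
```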
That'll scan ALL the events 😞
Positive searches - assuming you don't want people to rely on the spelling of the magic non-null value - also are terrible:
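Again the search is missing from the post as it appears here. Under the same assumptions (field is_ipv4, hypothetical index network), a positive search that does not depend on the spelling of the value would presumably be:

```spl
index=network is_ipv4=*
```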
This also scans ALL the events 😞