We are writing log statements in Java and then reviewing the info and exception alerts.
Our team then runs a Splunk search to count log statements by category.
Many of our log statements can belong to multiple categories. We are using this reference for key-value pairs: https://dev.splunk.com/enterprise/docs/developapps/addsupport/logging/loggingbestpractices/
So in our log statements, we are doing:
LOG.info("CategoryA=true , CategoryG=true");
Of course, we aren't going to write "Category=false" in any logger, since it's inherent in the statement.
Is this a good overall method to count values in Splunk by category, or do you recommend a better practice?
I would take issue with some of the statements presented as "best practice" for logging standards. We often find that developer-friendly formats, such as JSON, cause large ingestion volumes compared to the value of the data they contain. The ratio of field names to usable field values can typically be 50%, and developer logging frameworks will often just dump out JSON objects with empty field values, which is a real cost.
I often see clients hitting their ingestion licence limits and then having to push back on developers who have built dashboards on that data, asking them to shrink it.
Anyway, as to your question: if you want to count how many events have CategoryA true and how many have it false, and false is never written, you can only derive the false count as total count - true count, on the assumption that every event without the key is implicitly false. You therefore need to know your data to be able to write those searches.
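To make that arithmetic concrete, here is a minimal Java sketch (not Splunk itself) that applies it to a few in-memory messages. The class name ImplicitFalseCount and the sample messages are made up for illustration, and the key assumption is the same one as above: any event without CategoryA=true is treated as false.

```java
import java.util.List;

// Hypothetical illustration only: mimics the "false = total - true" arithmetic.
final class ImplicitFalseCount {

    // Counts messages that contain the literal pair key=true.
    static long countTrue(List<String> messages, String key) {
        String needle = key + "=true";
        return messages.stream().filter(m -> m.contains(needle)).count();
    }

    // Assumption: every message without key=true is implicitly false.
    static long countImplicitFalse(List<String> messages, String key) {
        return messages.size() - countTrue(messages, key);
    }

    public static void main(String[] args) {
        List<String> messages = List.of(
                "CategoryA=true , CategoryG=true",
                "CategoryG=true",
                "CategoryA=true");
        System.out.println("CategoryA true=" + countTrue(messages, "CategoryA")
                + " false=" + countImplicitFalse(messages, "CategoryA"));
    }
}
```

If some events belong to no category scheme at all (for example, framework noise in the same index), the derived false count will be inflated, which is why you need to know what is in the data.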
It's fine to have things like cat_a=true or categorya=1. However, if you have 100 million events per day, use =1, not =true, and you save 300MB/day of ingestion cost 😄 Also, mapping "true" to something you can count is more expensive than using simple wildcard logic like
| stats sum(cat_*) as cat_*
if you have predictable naming conventions.
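On the Java side, one way to keep the names predictable is to build the pairs in a single helper rather than typing them by hand in each log call. This is only a sketch: CategoryLog, its methods and the cat_ prefix are hypothetical names chosen to match the search above, and the saving calculation assumes one category flag per event.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: renders categories as space-separated cat_<name>=1 pairs.
final class CategoryLog {

    // Assumed naming convention: lower-case category name behind a "cat_" prefix.
    static String format(List<String> categories) {
        return categories.stream()
                .map(c -> "cat_" + c.toLowerCase(Locale.ROOT) + "=1")
                .collect(Collectors.joining(" "));
    }

    // Bytes saved per day by writing "1" instead of "true" once per event.
    static long bytesSavedPerDay(long eventsPerDay) {
        return eventsPerDay * ("true".length() - "1".length());
    }

    public static void main(String[] args) {
        System.out.println(format(List.of("A", "G")));      // cat_a=1 cat_g=1
        System.out.println(bytesSavedPerDay(100_000_000L)); // 300000000
    }
}
```

You would then log with something like LOG.info(CategoryLog.format(List.of("A", "G"))). The 3 bytes saved per flag across 100 million events is where the 300MB/day figure comes from.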
Please also do not write full Java class names in the logs, e.g. org.apache.catalina.bla.bla.bla, as this has no value and just costs licence ingest. Most logging frameworks can abbreviate package names to a single character, and there is rarely any ambiguity in class names.
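To show what that abbreviation does to a logger name, here is a small hand-rolled Java sketch. LoggerNameAbbreviator is a hypothetical class written only for illustration; in practice you would switch this on in the framework's pattern layout instead (as far as I know, %c{1.} in Log4j 2 and a short %logger{length} in Logback, but check the documentation for your version).

```java
// Hypothetical illustration of what package-name abbreviation does to a logger name.
final class LoggerNameAbbreviator {

    // Shortens every package segment to its first character and keeps the class name.
    static String abbreviate(String fullyQualifiedName) {
        String[] parts = fullyQualifiedName.split("\\.");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length - 1; i++) {
            if (!parts[i].isEmpty()) {
                out.append(parts[i].charAt(0)).append('.');
            }
        }
        return out.append(parts[parts.length - 1]).toString();
    }

    public static void main(String[] args) {
        // prints o.a.c.c.StandardContext
        System.out.println(abbreviate("org.apache.catalina.core.StandardContext"));
    }
}
```

In this example the logger name drops from 40 characters to 23 on every event, and the class name that people actually search on is untouched.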