Expanding a bit on my question from last year, "categorize or classify dissimilar field values at search time?": how does one simplify, streamline, and maintain event classification on verbose and non-uniform logs? For example, what if someone dumped 50 different application logs into one, and each application produced different events that need classification? Think of something like "linux_messages_syslog", which aggregates a number of application and service logs into one. In my case, it's a content management and orchestration platform with a variety of modules, each producing a variety of event types.

I assume the primary mechanism is (1) rex field extraction followed by (2) case/match groups, like this:

| rex field=event_message "<regex_01>"
| rex field=event_message "<regex_03>"
| rex field=event_message "<regex_05>"
| eval event_category = case (
match( event_message, "<regex_01>"), "<category_01>: ".extracted_field_01,
match( event_message, "<regex_02>"), "<category_02>",
match( event_message, "<regex_03>"), "<category_03>: ".extracted_field_03,
match( event_message, "<regex_04>"), "<category_04>",
match( event_message, "<regex_05>"), "<category_05>: ".extracted_field_05,
true(), "<uncategorized_yet>"
)
| stats count dc(extracted_field_01) dc(extracted_field_03) by event_category, log_level
| sort -count

I also assume that fields can't be extracted via match(), so I have to use each regex twice: first for field extraction with rex, and then again in the case/match groups where I classify the events. Is that correct?

Here is a small sample of the actual classification SPL:

index=main sourcetype="custom_application"
| rex field=component "\.(?P<component_shortname>\w+)$"
| rex field=event_message "^<(?P<event_msg>.*)>$"
| rex field=event_message "^<(?P<action>Created|Creating) (?P<object>capacity|workflow execution|step execution) sample(\.+| in (?P<sample_creation_time>\d+) ms\.)>$"
| rex field=event_message "^<GENERATED TEST OUTPUT: (?P<filetype>(?P<asset_type>\w+) asset|AdPod file|closed captioning / subtitle assets) (?P<filename>.*) ingested successfully>$"
| rename COMMENT AS "the above is just a sample - there are about 20-30 more rex statements"
| eval event_category = case (
match( event_message, "^<(?P<action>Created|Creating) (?P<object>capacity|workflow execution|step execution) sample(\.+| in (?P<sample_creation_time>\d+) ms\.)>$"), action." ".object." sample",
match( event_message, "^<GENERATED TEST OUTPUT: (?P<filetype>(?P<asset_type>\w+) asset|AdPod file|closed captioning / subtitle assets) (?P<filename>.*) ingested successfully>$"), "GENERATED TEST OUTPUT: ".filetype." <filename> ingested successfully",
match( event_message, "^<GENERATED TEST OUTPUT: .*>"), event_msg,
true(), "<other>"
)
| eval event_category_long = component_shortname." (".log_level."): ".event_category

(In reality it's already a few pages long, and I am far from done.)

Event samples:

2020-09-16 00:04:29,253 INFO [com.custom_app.plugin.workflow.execution.processor.TestStepExecutionProcessor] (partnerPackageWorkflow-log result [ex 34588028]) - <GENERATED TEST OUTPUT: deliveries complete for partner some_partner>
2020-09-16 00:03:20,462 INFO [com.custom_app.plugin.workflow.execution.processor.TestStepExecutionProcessor] (packageDelivery-log result [ex 34588139]) - <GENERATED TEST OUTPUT: package_name_anonymized delivered successfully>
2020-09-16 00:03:41,183 TRACE [com.custom_app.workflow.services.ReportService] (pool-8-thread-68) - <Created step execution sample in 57 ms.>
2020-09-16 00:03:41,126 TRACE [com.custom_app.workflow.services.ReportService] (pool-8-thread-68) - <Creating step execution sample...>
2020-09-15 23:58:24,896 INFO [com.custom_app.plugin.workflow.execution.processor.ThrottledSubflowStepExecutionProcessor] (partnerPackageWorkflow-deliver package items [ex 34588027]) - <Executing as the **THROTTLED** subflow step processor.>

Context: the application produces a large number of dissimilar events, its log_level (INFO, DEBUG, etc.) is often meaningless, and some INFO events may need to be reclassified as ERROR events and alerted on accordingly.

Key parts of the above code:
(1) extract fields where needed, e.g. via a number of "rex" statements
(2) categorize using case(match(field, regex_01), event_category_01, ...)
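For prototyping the rules outside Splunk, the single-pass idea can be sketched in Python: each rule pairs one regex with a category template, and the same match object supplies both the named-group fields and the category, so no regex is written twice. This is a sketch under stated assumptions, not Splunk's own mechanism: the line layout is inferred from the event samples above, the names `Rule`, `RULES`, `HEADER_RE` and `classify` are hypothetical, and the "failed" to ERROR rule is a made-up example of the INFO-to-ERROR reclassification described under Context.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumed line layout, inferred from the event samples above:
#   <date> <time> <LEVEL> [<component>] (<thread>) - <<message>>
HEADER_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<log_level>[A-Z]+) +"
    r"\[(?P<component>[^\]]+)\] \((?P<thread>.*?)\) - (?P<event_message><.*>)$"
)


@dataclass(frozen=True)
class Rule:
    pattern: re.Pattern           # applied once; its named groups become fields
    category: str                 # str.format template over those named groups
    level: Optional[str] = None   # optional replacement for the logged level


# Tried in list order; the first match wins, like the branches of case().
RULES = [
    # Hypothetical rule: promotes any "... failed ..." message to ERROR.
    Rule(re.compile(r"^<(?P<failure_msg>.*\bfailed\b.*)>$"),
         "FAILURE: {failure_msg}", level="ERROR"),
    Rule(re.compile(r"^<(?P<action>Created|Creating) "
                    r"(?P<object>capacity|workflow execution|step execution) "
                    r"sample(\.+| in (?P<sample_creation_time>\d+) ms\.)>$"),
         "{action} {object} sample"),
    Rule(re.compile(r"^<GENERATED TEST OUTPUT: (?P<filetype>(?P<asset_type>\w+) asset|"
                    r"AdPod file|closed captioning / subtitle assets) "
                    r"(?P<filename>.*) ingested successfully>$"),
         "GENERATED TEST OUTPUT: {filetype} <filename> ingested successfully"),
    Rule(re.compile(r"^<(?P<event_msg>GENERATED TEST OUTPUT: .*)>$"), "{event_msg}"),
]


def classify(line: str) -> Optional[dict]:
    """Parse one log line, then extract fields and classify in one regex pass."""
    header = HEADER_RE.match(line)
    if header is None:
        return None  # line does not follow the assumed layout
    fields = header.groupdict()
    fields["component_shortname"] = fields["component"].rsplit(".", 1)[-1]
    fields["event_category"] = "<other>"  # the true() fallback
    for rule in RULES:
        m = rule.pattern.match(fields["event_message"])
        if m is None:
            continue
        # One match object yields both the extracted fields and the category.
        groups = {k: v for k, v in m.groupdict().items() if v is not None}
        fields.update(groups)
        fields["event_category"] = rule.category.format(**groups)
        if rule.level is not None:
            fields["log_level"] = rule.level
        break
    fields["event_category_long"] = (
        f"{fields['component_shortname']} ({fields['log_level']}): "
        f"{fields['event_category']}"
    )
    return fields


if __name__ == "__main__":
    # Two of the event samples quoted above.
    samples = [
        "2020-09-16 00:03:41,183 TRACE [com.custom_app.workflow.services.ReportService] "
        "(pool-8-thread-68) - <Created step execution sample in 57 ms.>",
        "2020-09-16 00:03:41,126 TRACE [com.custom_app.workflow.services.ReportService] "
        "(pool-8-thread-68) - <Creating step execution sample...>",
    ]
    results = [r for r in map(classify, samples) if r is not None]
    for r in results:
        print(r["event_category_long"])
    # Rough analogue of: | stats count by event_category, log_level
    print(Counter((r["event_category"], r["log_level"]) for r in results))
```

Keeping the rules in one ordered table means a new event type is one entry (regex, template, optional level) rather than a rex line plus a matching case() branch. If the rules stay in SPL, a comparable way to avoid the duplicate regex may be to branch on a field that a rex has already set, e.g. case(isnotnull(object), action." ".object." sample", ...), since rex leaves its fields unset when it does not match; that only holds if each rex uses field names no other extraction in the search sets, so it is worth verifying on real data.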