
Adjust ES Urgency based on Risk

David
Splunk Employee

I have ES, and I love the Risk Framework for understanding holistic risk across my users and systems. I can also sort notable events by risk, which is really useful. But I wish I could set the urgency on my notable events to Critical when the associated users or assets carry a really high amount of risk, particularly when many different users or assets are involved!

1 Solution

David
Splunk Employee

Absolutely, David. Because risk scores and notable events are just data, we can easily do this. Here's a sample search that will escalate to Critical any open notable events with an aggregate risk (across user, src, and dest) in excess of 1000. You may want to play with the "where" statement to apply your own business logic for when this should trigger (for example, maybe you only want it to run against new events, or to skip "Pending" events).

`notable` 
| fields dest src user status status_label owner rule_id rule_name urgency 
| fields - _raw _time 
| eval risk_objects=mvdedup(mvappend(dest, src, user)) 
| mvexpand risk_objects 
| lookup risk_correlation_by_system_lookup risk_object as risk_objects output risk_score as system_risk 
| lookup risk_correlation_by_user_lookup risk_object as risk_objects output risk_score as user_risk 
| eval risk_score=coalesce(user_risk, 0) + coalesce(system_risk, 0) 
| stats sum(risk_score) as aggregate_risk_score dc(risk_objects) as number_of_risk_objects by status status_label owner rule_id rule_name urgency 
| where ((aggregate_risk_score > 1000 AND urgency!="critical" AND status_label!="Closed") OR status_label="whatever logic you want here...") 
| table rule_id status owner rule_name 
| eval comment="Auto-setting Urgency to 'Critical' due to high level of aggregate risk.", urgency="critical", time=now(), user="admin" 
| outputlookup append=t incident_review_lookup
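
To make the business logic in that "where" line concrete, here are two variants matching the examples above (the status_label values are the ES defaults, so swap in whatever your environment uses). To only touch brand-new notables:

| where aggregate_risk_score > 1000 AND urgency!="critical" AND status_label="New"

Or to simply skip anything Pending or already Closed:

| where aggregate_risk_score > 1000 AND urgency!="critical" AND status_label!="Pending" AND status_label!="Closed"

Either way, it's worth running the search without the final outputlookup line first, so you can preview exactly which notables would be escalated before anything gets written back.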

Walking through what this search does:

  1. First we use the notable macro, which enriches each event with the correct statuses, gives us multi-value fields, etc.
  2. Then we create risk_objects as the amalgamation of dest, src, and user. We dedup that (no need to double-count if the src and dest are the same object) and expand each value into its own event so that we can safely do a lookup.
  3. Then we use risk_correlation_by_system_lookup and risk_correlation_by_user_lookup to pull the current risk score for those systems/users (these lookups were added around ES 4.5; there's a quick sanity check just after this list).
  4. Next we sum the aggregate risk (and count the unique entities) back up per correlation rule.
  5. Here's the critical step: now we include/exclude what we do or don't want to auto-update. This is your business logic.
  6. Finally, we output the results into the incident_review_lookup, which stores all of our statuses/updates.
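
If the lookups in step 3 come back empty, a quick diagnostic (not part of the escalation search itself) is to inspect them directly and confirm they're populated with risk_object and risk_score columns:

| inputlookup risk_correlation_by_system_lookup 
| head 5

| inputlookup risk_correlation_by_user_lookup 
| head 5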

Schedule that search to run every so often and you'll be set. Notably, you probably want it to search back over a day or a week each time it runs: if a notable was created a week ago but you now have new data in the risk framework that should prioritize it, you'll want that surfaced!
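
If you'd rather wire up the schedule in configuration than through the UI, a minimal savedsearches.conf sketch might look like the following. The stanza name, cron schedule, and lookback window are placeholders rather than anything ES-specific, so adjust them to taste:

[Auto-Escalate High Risk Notables]
enableSched = 1
cron_schedule = */30 * * * *
dispatch.earliest_time = -7d@d
dispatch.latest_time = now
# the search key carries the full SPL from above, collapsed onto one line
search = `notable` | fields dest src user status status_label owner rule_id rule_name urgency | ...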

Here's what it looks like from some ES Demo data:
[screenshot: results from ES demo data]

jbrodsky_splunk
Splunk Employee

Thank you, David. I had a customer who asked this very same question earlier this morning! Also, lovely use of "amalgamation" above.
