Solved: Adjust ES Urgency based on Risk

David · 04-05-2017

I have ES, and I love the Risk Framework for understanding holistic risk for my users and systems. And I can sort the notable events by risk, which is also really useful! But I wish I could set the urgency of my notable events to Critical when the users or assets involved have a really high amount of risk, particularly when many different users or assets are involved!

David · 04-05-2017

Absolutely David. Because risk scores and notable events are just data, we can easily do this. Here's a sample search that will escalate to critical any open events that have an aggregate risk (between user, src, and dest) in excess of 1000. You may want to play with the "where" statement to apply your own business logic for when you want this to trigger (for example, maybe you only want it to run against new events, or not on "Pending" events).

`notable` 
| fields _time dest src user status status_label owner rule_id rule_name urgency
| fields - _raw _time
| eval risk_objects=mvdedup(mvappend(dest, src, user))
| mvexpand risk_objects
| lookup risk_correlation_by_system_lookup risk_object as risk_objects output risk_score as system_risk
| lookup risk_correlation_by_user_lookup risk_object as risk_objects output risk_score as user_risk
| eval risk_score=coalesce(user_risk,0) + coalesce(system_risk, 0)
| stats sum(risk_score) as aggregate_risk_score dc(risk_objects) as number_of_risk_objects by status status_label owner rule_id rule_name urgency
| where (aggregate_risk_score > 1000 AND urgency!="critical" AND status_label!="Closed") OR status_label="whatever logic you want here..."
| table rule_id status owner rule_name 
| eval comment="Auto-setting Urgency to 'Critical' due to high level of aggregate risk.", urgency="critical", time=now(), user="admin" 
| outputlookup append=t incident_review_lookup

Walking through what this search does:

First, we use the notable macro, which enriches each event with the correct statuses, gives us multi-value fields, etc.
Then we create risk_objects as the amalgamation of dest, src, and user. We dedup it (there's no need to double-count if the src and dest are the same object) and expand each value into its own event so that we can safely do a lookup.
Then we use risk_correlation_by_system_lookup and risk_correlation_by_user_lookup to pull the current risk score for each system and user (these were added in ES 4.5-ish). The coalesce() calls make any object missing from a lookup count as 0 instead of nulling out the sum.
Next, we sum the risk back up (along with the number of unique entities) per notable event, grouped by rule_id.
Here's the critical step: we include or exclude whatever we do or don't want to auto-update. This is your business logic.
Finally, we output the results into incident_review_lookup, which stores all of our statuses and updates.
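If you want to sanity-check the threshold logic outside Splunk, here is a rough Python model of the dedup, lookup, sum, and filter steps. It is not Splunk code: USER_RISK and SYSTEM_RISK are hypothetical dictionaries standing in for the two risk lookups, and the notables and scores are made up. It also shows why the dedup matters: notable C has web01 as both src and dest, and counting it twice (500 + 500 + 40 = 1040) would wrongly push it over 1000.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two ES risk lookups (risk_object -> risk_score).
USER_RISK = {"alice": 600, "bob": 40}
SYSTEM_RISK = {"web01": 500, "10.0.0.5": 80}

THRESHOLD = 1000  # same cutoff as the search's where clause
# Grouping fields; status is carried along so each output row can include it.
GROUP_BY = ("status", "status_label", "owner", "rule_id", "rule_name", "urgency")


def risk_objects(notable):
    """Like mvdedup(mvappend(dest, src, user)): merge fields, drop repeats and nulls."""
    merged = []
    for field in ("dest", "src", "user"):
        value = notable.get(field)
        values = value if isinstance(value, list) else ([value] if value else [])
        for v in values:
            if v not in merged:
                merged.append(v)
    return merged


def aggregate(notables):
    """Like mvexpand + both lookups + coalesce(..., 0) + stats sum()/dc()."""
    totals = defaultdict(int)
    distinct = defaultdict(set)
    for n in notables:
        key = tuple(n[f] for f in GROUP_BY)
        for obj in risk_objects(n):  # one row per risk object, as after mvexpand
            totals[key] += USER_RISK.get(obj, 0) + SYSTEM_RISK.get(obj, 0)
            distinct[key].add(obj)
    return {key: (totals[key], len(distinct[key])) for key in totals}


def should_escalate(group, total):
    """The business logic from the where clause; edit this to suit your SOC."""
    return (total > THRESHOLD
            and group["urgency"] != "critical"
            and group["status_label"] != "Closed")


def escalations(notables, now):
    """Rows shaped like the ones the search appends to incident_review_lookup."""
    rows = []
    for key, (total, _count) in aggregate(notables).items():
        group = dict(zip(GROUP_BY, key))
        if should_escalate(group, total):
            rows.append({
                "rule_id": group["rule_id"], "status": group["status"],
                "owner": group["owner"], "rule_name": group["rule_name"],
                "comment": "Auto-setting Urgency to 'Critical' due to high level of aggregate risk.",
                "urgency": "critical", "time": now, "user": "admin",
            })
    return rows


# Made-up sample notables.
notables = [
    {"rule_id": "A", "status": 1, "status_label": "New", "owner": "unassigned",
     "rule_name": "Brute Force", "urgency": "medium",
     "dest": "web01", "src": "10.0.0.5", "user": "alice"},   # 500 + 80 + 600 = 1180
    {"rule_id": "B", "status": 5, "status_label": "Closed", "owner": "unassigned",
     "rule_name": "Brute Force", "urgency": "medium",
     "dest": "web01", "src": "10.0.0.5", "user": "alice"},   # high risk, but Closed
    {"rule_id": "C", "status": 1, "status_label": "New", "owner": "unassigned",
     "rule_name": "Port Scan", "urgency": "high",
     "dest": "web01", "src": "web01", "user": "bob"},        # web01 counted once: 540
]
print([row["rule_id"] for row in escalations(notables, now=0)])  # ['A']
```

Swapping in your own rules (say, only "New" notables) means changing should_escalate, just as you'd change the where clause in the real search.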

Schedule that search to run every so often and you'll be set. Notably, you'll probably want each run to search back over the past day or week: if a notable was created a week ago but new data in the risk framework should now prioritize it, you'll want that surfaced!
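The lookback advice can be made concrete with a little arithmetic. Assuming the search runs often enough to treat as continuous, a notable stays inside a "now minus lookback" window until created + lookback, so it can only be escalated if its risk crosses the threshold before then. A small Python model (the function names are hypothetical):

```python
from datetime import datetime, timedelta


def last_run_that_sees(created, lookback):
    """Latest run time whose [now - lookback, now] window still contains the notable."""
    return created + lookback


def can_be_escalated(created, risk_crosses_threshold_at, lookback):
    """True if some run after the risk rises still has the notable inside its window.

    Assumes runs are frequent enough to treat as continuous.
    """
    return risk_crosses_threshold_at <= last_run_that_sees(created, lookback)


created = datetime(2017, 4, 1)                    # notable raised April 1
risk_rises = created + timedelta(days=3)          # risk data catches up April 4
print(can_be_escalated(created, risk_rises, timedelta(days=1)))  # False
print(can_be_escalated(created, risk_rises, timedelta(days=7)))  # True
```

With a one-day window, the notable has already aged out by the time its risk climbs; a seven-day window still catches it.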

Here's what it looks like from some ES demo data:

[Screenshot: results of the search on ES demo data]

jbrodsky_splunk · 04-05-2017

Thank you David. I had a customer that asked this very same question earlier this morning! Also, lovely use of "amalgamation" above.