Managing multiple data sources with inconsistent field names—like SourceAddress in firewalls versus IpAddress in Windows—creates massive SPL complexity and slows down incident response. The Common Information Model (CIM) solves this by normalizing your data, allowing you to build one dashboard that works across every data source simultaneously.
Key Takeaways
Search Simplification: Use one field name (e.g., src_ip) to query multiple disparate indexes at once.
Automatic Scalability: Newly onboarded CIM-compliant data is automatically picked up by existing alerts and dashboards.
Reduced Maintenance: Leveraging CIM-compliant TAs from Splunkbase ensures your schema stays up to date even when vendors change their logging formats.
The Common Information Model (CIM) in Splunk is a shared convention for naming fields across all your data sources, so that you can build content quickly and easily across all of that data at the same time.
If your firewall sends data with SourceAddress and your Windows hosts send IpAddress, your SPL query would be:
(index=firewall SourceAddress=127.0.0.1) OR (index=windows IpAddress=127.0.0.1)
and you would have to already know that the firewall is SourceAddress and Windows is IpAddress.
With CIM, you don’t have to know anything ahead of time and your queries are simpler:
(index=firewall OR index=windows) src_ip=127.0.0.1
But wait, how do you know IpAddress is the source and not the destination? You don’t actually have to know; search both directions:
(index=firewall OR index=windows) (src_ip=127.0.0.1 OR dest_ip=127.0.0.1)
Without CIM normalization, a search for suspicious login attempts would need to accommodate Active Directory's field names, Azure's naming conventions, and Okta's structure—all in a single query… and then the vendor changes their schema…
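As a rough sketch of what the normalized version of that login search could look like, the query below counts failed logins from the CIM Authentication data model. It assumes the Common Information Model app is installed and that your Active Directory, Azure, and Okta data are already mapped to that data model; the threshold of 10 is an arbitrary placeholder, not a recommendation.

```spl
| tstats count from datamodel=Authentication
    where Authentication.action="failure"
    by Authentication.src, Authentication.user, Authentication.app
| rename Authentication.* AS *
| where count > 10
| sort - count
```

The search names no vendor-specific field, so a vendor schema change is absorbed by the TA's mapping rather than by every search that depends on it.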
CIM provides more than 20 domain-specific data models (Alerts, Email, Network Traffic, etc.). While there are ~500 fields available, most practitioners only need the "Vital 20."
You need the common CIM fields from across all the data models. Line them up in order of most to least important and in order of directionality.
Focus on these core fields for your dashboards and alerts:
_time, severity, action, signature, src_user, dest_user, src_ip, src_nt_host,
src_nt_domain, src_*, src, dest, dest_ip, dest_nt_host, dest_nt_domain, dest_*,
description
Note: Anything else is probably an edge case or a deep dive that requires access to raw data, not something you need to surface quickly in a dashboard, report, or alert.
Some fields require prescribed values to maintain data integrity. The best example is severity, which should strictly follow these values:
critical, high, medium, low, informational, unknown
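One way to spot sources that break this rule is to list every severity value that falls outside the prescribed set. This is a minimal sketch that reuses the firewall and windows indexes from the earlier examples; substitute your own indexes.

```spl
(index=firewall OR index=windows)
| fillnull value="(missing)" severity
| stats count by index, sourcetype, severity
| search NOT severity IN ("critical", "high", "medium", "low", "informational", "unknown")
```

Any row returned points to a sourcetype whose severity mapping needs attention. Note that the search command compares values case-insensitively, so a value such as "High" would not be flagged by this check.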
Most fields do not have specific values defined in the documentation. There are a few fields you may want to internally define with acceptable values for your organization.
You may want to require the field src_user to always be your long format user ID such as first.last@mydomain.com. Alternatively, you may want to define that src_user is instead the short form flast user ID.
The fields src_nt_host and dest_nt_host are another example where you may want to decide these should always be short NetBIOS hostnames and not fully.qualified.hostnames.mydomain.com.
In addition to the kind of value acceptable for a field, you may also want to define the letter case of the value, such as lower() for users and upper() for hosts.
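These conventions can be prototyped at search time before you commit them to a TA. The sketch below assumes you chose lowercase user IDs and uppercase short hostnames; it reuses the example indexes from earlier and keeps only the first label of a fully qualified hostname.

```spl
(index=firewall OR index=windows)
| eval src_user=lower(src_user)
| eval src_nt_host=upper(mvindex(split(src_nt_host, "."), 0))
| eval dest_nt_host=upper(mvindex(split(dest_nt_host, "."), 0))
| table _time, src_user, src_nt_host, dest_nt_host
```

Once the output matches your convention, the same eval expressions can be moved into the field mappings for that sourcetype so every search benefits.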
There is a bit of art to this. For example...do you choose to align with category in addition to the above? How is the field type in the Alerts data model different than the field category in the DLP data model?
What about verdict or outcome? There isn’t really anything like this in the CIM but it’s important for security alerts to be able to classify detections as either “malicious” or “benign”.
You will have to make some decisions for your organization. As long as you stay consistent in your use of a few custom fields, that’s ok.
For example, to be able to trace a single event from the source system to Splunk and then to a ticketing system, you may want to use a combination of the CIM id field and your own custom internal_id field. Injecting this custom field at index time creates a link backward to the raw source data and forward to the ticketing system when added to the payload.
However, do not over-engineer. More than 20ish fields raises the question...is this really necessary in most scenarios?
You can extend CIM with custom fields, but don't modify existing CIM mappings, especially if they are part of a supported CIM-compliant TA or App. This breaks compatibility with the CIM ecosystem. If you insist...ALIAS is your friend.
There are two kinds of Apps and TAs to help with CIM: meta-Apps and TAs that assist in working with CIM and data models, and Apps and TAs that assist in on-boarding data to Splunk and applying CIM mappings to source system field naming schemas.
Wherever possible, when on-boarding data, try to use a (supported) CIM-compliant app from Splunkbase.
Use the Splunkbase search filters to find supported CIM-compliant TAs that provide data on-boarding and CIM mappings.
To work directly with the CIM schema or data models in Splunk (for example, when mapping a custom data set or fixing a broken CIM mapping in an unsupported TA or App), install the “CIM Validator” and “Common Information Model” Apps.
The dictionary lookup for fields is super helpful:
Deploys data models using CIM for configuration and acceleration
Building CIM means knowing your data. You will need the admin guide for your data, or if it’s a custom data set, engage with the creator of the data. You will also want to have the CIM data model references handy, or the CIM validator app deployed in your Splunk environment.
Once you have your data reference material (if available) and the CIM data model guides, there are just a few steps to building your own CIM:
If your query says | where isnotnull(src_ip), also search for | where isnull(src_ip) and make sure none of the results should have had a src_ip field mapped.
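Both sides of that check can be viewed in one search. The sketch below uses index=firewall from the earlier examples as a stand-in for whatever data set you are mapping; has_src_ip is a hypothetical helper field created only for this report.

```spl
index=firewall
| eval has_src_ip=if(isnotnull(src_ip), "mapped", "missing")
| stats count by sourcetype, has_src_ip
```

A large "missing" count for a sourcetype that should always carry a source address suggests the mapping is incomplete, and the raw events behind that count are the place to look next.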
Splunk .conf online and Splunk Lantern have a few resources to walk you through creating CIM step-by-step using a real-world ransomware example:
Not using CIM means more complex searches that require greater SPL skill from your users, leading to frustration, underutilization, and attrition.