How to map existing sourcetypes to CIM data models?

Runals
Motivator

I have an environment with a large number of sourcetypes and would like to map those to the appropriate CIM data model. While I generally know about the Splunk commands pivot and datamodel, their use seems to depend on the fields already having the 'correct' names or the sourcetypes already having been tagged. Has anyone found a decent way to attempt to have Splunk detect that sourcetype X might map to data model Y?

1 Solution

Runals
Motivator

This solution uses the Data Curator app. If you are downloading the app for the first time to try this solution, make sure you run the ‘build sourcetype_fields csv’ saved search after installation.

A more complete writeup of this methodology can be found here.

I ended up manually creating a list of the fields associated with most of the data models and objects and put that into Splunk as a lookup with wildcards. The following query compares the sourcetype_fields lookup from the Data Curator app against that data model field list.

| inputlookup sourcetype_fields.csv
| eval field = lower(field)
| lookup dm_fields field as field
| search model!=none
| stats dc(field) as fields values(field) as field_list by sourcetype model object
| where fields > 1
| sort -fields
| stats max(fields) as maxFieldMatch list(model) as Model list(object) as Object list(fields) as fieldMatch by sourcetype
| sort -maxFieldMatch
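To see why a given sourcetype scored the way it did, it can help to list the individual field matches before they are rolled up. The following is a minimal sketch that reuses the same lookups as the query above; `your_sourcetype` is a placeholder, and the `sourcetype` and `field` columns are assumed to exist in sourcetype_fields.csv because the original query relies on them.

```spl
| inputlookup sourcetype_fields.csv
| search sourcetype="your_sourcetype"
| eval field = lower(field)
| lookup dm_fields field as field
| search model!=none
| table sourcetype field model object
| sort model object field
```

Each row shows one field from the sourcetype and the model and object it was credited to, which makes false positives from broad patterns such as "*user*" or "*status*" easier to spot.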

The results of the search in my environment are what you might call directionally correct more so than exact =). There are quite a number of good, actionable results, but much of that depends on what field names you have already defined. One thing I did years ago, pre-CIM, was settle on src_ip and dest_ip for IP address related fields, so we already had that baked into many of our sourcetypes. If you've used source_ip or destination_ip you might want to look into adjusting the lookup. I also saw a fair number of matches for both Web and Network Traffic for the same sourcetype, as there is a fair bit of field naming overlap. Not to be outdone, the Windows Security logs had 5 matches lol: Web, Email (filtering & email objects), Network Traffic, and Authentication.
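The naming overlap between models can be checked directly from the lookup file. This is a rough sketch, assuming data_models.csv is readable with inputlookup from the app context you are searching in; it lists every field pattern that appears under more than one model.

```spl
| inputlookup data_models.csv
| stats dc(model) as models values(model) as model_list by field
| where models > 1
| sort -models
```

Patterns near the top of the output (dest, src, "*user*", "*duration*" and similar) say little about which model a sourcetype belongs to, so a match that rests only on those is worth a second look.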

If you have any feedback on the list below or the methodology, I'm all ears!

Transforms

[dm_fields]
filename = data_models.csv
match_type = WILDCARD(field)
max_matches = 1
min_matches = 1
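One possible variation on the stanza above, offered as an assumption rather than as the configuration actually in use: the query filters on model!=none, and in transforms.conf the value written when fewer than min_matches rows match comes from default_match, which is an empty string unless set. Setting it explicitly makes the placeholder visible. With max_matches = 1, a non-temporal lookup returns only the first matching row in file order, so a field is credited to a single model; the comment marks where that could be raised.

```
[dm_fields]
filename = data_models.csv
match_type = WILDCARD(field)
# Assumption: raise above 1 if one field should count toward several models
max_matches = 1
min_matches = 1
# Assumption: literal placeholder returned for model and object when nothing matches
default_match = none
```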

Lookup

field,model,object
"*bytes*",Web,Web
"*bytes_in*",Web,Web
"*bytes_out*",Web,Web
"*cached*",Web,Web
"*cookie*",Web,Web
dest,Web,Web
"*duration*",Web,Web
"*http_content_type*",Web,Web
"*http_method*",Web,Web
"*http_referrer*",Web,Web
"*http_user_agent*",Web,Web
"*http_user_agent_length*",Web,Web
"*referer*",Web,Web
"*response_time*",Web,Web
"*site*",Web,Web
src,Web,Web
"*src_ip*",Web,Web
"*status*",Web,Web
"*uri_path*",Web,Web
"*uri_query*",Web,Web
"*url*",Web,Web
"*url_length*",Web,Web
"*user*",Web,Web
"*bytes*","Network_Traffic","All_Traffic"
"*bytes_in*","Network_Traffic","All_Traffic"
"*bytes_out*","Network_Traffic","All_Traffic"
"*channel*","Network_Traffic","All_Traffic"
dest,"Network_Traffic","All_Traffic"
"*dest_interface*","Network_Traffic","All_Traffic"
"*dest_mac*","Network_Traffic","All_Traffic"
"*dest_port*","Network_Traffic","All_Traffic"
"*dest_translated_ip*","Network_Traffic","All_Traffic"
"*dest_translated_port*","Network_Traffic","All_Traffic"
"*direction*","Network_Traffic","All_Traffic"
"*duration*","Network_Traffic","All_Traffic"
dvc,"Network_Traffic","All_Traffic"
"*flow_id*","Network_Traffic","All_Traffic"
"*icmp_code*","Network_Traffic","All_Traffic"
"*icmp_type*","Network_Traffic","All_Traffic"
"*mac*","Network_Traffic","All_Traffic"
"*packets*","Network_Traffic","All_Traffic"
"*packets_in*","Network_Traffic","All_Traffic"
"*packets_out*","Network_Traffic","All_Traffic"
"*protocol*","Network_Traffic","All_Traffic"
"*protocol_version*","Network_Traffic","All_Traffic"
"*response_time*","Network_Traffic","All_Traffic"
"*rule*","Network_Traffic","All_Traffic"
"*session_id*","Network_Traffic","All_Traffic"
src,"Network_Traffic","All_Traffic"
"*src_interface*","Network_Traffic","All_Traffic"
"*src_ip*","Network_Traffic","All_Traffic"
"*src_mac*","Network_Traffic","All_Traffic"
"*src_port*","Network_Traffic","All_Traffic"
"*src_translated_ip*","Network_Traffic","All_Traffic"
"*src_translated_port*","Network_Traffic","All_Traffic"
"*ssid*","Network_Traffic","All_Traffic"
"*tcp_flag*","Network_Traffic","All_Traffic"
"*transport*","Network_Traffic","All_Traffic"
"*tos*","Network_Traffic","All_Traffic"
"*ttl*","Network_Traffic","All_Traffic"
"*user*","Network_Traffic","All_Traffic"
"*vlan*","Network_Traffic","All_Traffic"
"*wifi*","Network_Traffic","All_Traffic"
dest,Authentication,Authentication
"*dest_nt_domain*",Authentication,Authentication
"*duration*",Authentication,Authentication
"*response_time*",Authentication,Authentication
src,Authentication,Authentication
"*src_nt_domain*",Authentication,Authentication
"*src_user*",Authentication,Authentication
"*user*",Authentication,Authentication
dest,Certificates,"All_Certificates"
"*dest_port*",Certificates,"All_Certificates"
"*duration*",Certificates,"All_Certificates"
"*response_time*",Certificates,"All_Certificates"
src,Certificates,"All_Certificates"
"*transport*",Certificates,"All_Certificates"
"*ssl_end_time*",Certificates,SSL
"*ssl_engine*",Certificates,SSL
"*ssl_hash*",Certificates,SSL
"*ssl_is_valid*",Certificates,SSL
"*ssl_issuer*",Certificates,SSL
"*ssl_issuer_common_name*",Certificates,SSL
"*ssl_issuer_email*",Certificates,SSL
"*ssl_issuer_locality*",Certificates,SSL
"*ssl_issuer_organization*",Certificates,SSL
"*ssl_issuer_state*",Certificates,SSL
"*ssl_issuer_street*",Certificates,SSL
"*ssl_issuer_unit*",Certificates,SSL
"*ssl_name*",Certificates,SSL
"*ssl_policies*",Certificates,SSL
"*ssl_publickey*",Certificates,SSL
"*ssl_publickey_algorithm*",Certificates,SSL
"*ssl_serial*",Certificates,SSL
"*ssl_session_id*",Certificates,SSL
"*ssl_signature_algorithm*",Certificates,SSL
"*ssl_start_time*",Certificates,SSL
"*ssl_subject*",Certificates,SSL
"*ssl_subject_common_name*",Certificates,SSL
"*ssl_subject_email*",Certificates,SSL
"*ssl_subject_locality*",Certificates,SSL
"*ssl_subject_state*",Certificates,SSL
"*ssl_subject_street*",Certificates,SSL
"*ssl_subject_unit*",Certificates,SSL
"*ssl_validity_window*",Certificates,SSL
"*ssl_version*",Certificates,SSL
"*delay*",Email,Email
dest,Email,Email
"*duration*",Email,Email
"*file_hash*",Email,Email
"*file_name*",Email,Email
"*file_size*",Email,Email
"*internal_message_id*",Email,Email
"*message_id*",Email,Email
"*message_info*",Email,Email
"*orig_dest*",Email,Email
"*orig_recipient*",Email,Email
"*orig_src*",Email,Email
"*process*",Email,Email
"*process_id*",Email,Email
"*protocol*",Email,Email
"*recipient*",Email,Email
"*recipient_count*",Email,Email
"*recipient_status*",Email,Email
"*response_time*",Email,Email
"*retries*",Email,Email
"*return_addr*",Email,Email
"*size*",Email,Email
src,Email,Email
"*src_user*",Email,Email
"*status_code*",Email,Email
"*subject*",Email,Email
"*url*",Email,Email
"*user*",Email,Email
"*xdelay*",Email,Email
"*xref*",Email,Email
"*filter_action*",Email,Filtering
"*filter_score*",Email,Filtering
"*signature*",Email,Filtering
"*signature_extra*",Email,Filtering
"*signature_id*",Email,Filtering
dest,"Intrusion_Detection","IDS_Attacks"
dvc,"Intrusion_Detection","IDS_Attacks"
"*ids_type*","Intrusion_Detection","IDS_Attacks"
"*severity*","Intrusion_Detection","IDS_Attacks"
"*signature*","Intrusion_Detection","IDS_Attacks"
src,"Intrusion_Detection","IDS_Attacks"
"*user*","Intrusion_Detection","IDS_Attacks"
"*date*",Malware,"Malware_Attacks"
dest,Malware,"Malware_Attacks"
"*dest_nt_domain*",Malware,"Malware_Attacks"
"*dest_requires_av*",Malware,"Malware_Attacks"
"*file_hash*",Malware,"Malware_Attacks"
"*file_name*",Malware,"Malware_Attacks"
"*file_path*",Malware,"Malware_Attacks"
"*signature*",Malware,"Malware_Attacks"
src,Malware,"Malware_Attacks"
"*user*",Malware,"Malware_Attacks"
"*vendor_product*",Malware,"Malware_Attacks"
dest,Malware,"Malware_Operations"
"*dest_nt_domain*",Malware,"Malware_Operations"
"*dest_requires_av*",Malware,"Malware_Operations"
"*product_version*",Malware,"Malware_Operations"
"*signature_version*",Malware,"Malware_Operations"
"*vendor_product*",Malware,"Malware_Operations"
dest,Performance,"All_Performance"
"*dest_should_timesync*",Performance,"All_Performance"
"*hypervisor_id*",Performance,"All_Performance"
"*resource_type*",Performance,"All_Performance"
"*cpu_load_mhz*",Performance,CPU
"*cpu_load_percent*",Performance,CPU
"*cpu_time*",Performance,CPU
"*cpu_user_percent*",Performance,CPU
"*fan_speed*",Performance,Facilities
"*power*",Performance,Facilities
"*temperature*",Performance,Facilities
"*mem*",Performance,Memory
"*mem_committed*",Performance,Memory
"*mem_free*",Performance,Memory
"*mem_used*",Performance,Memory
"*swap*",Performance,Memory
"*swap_free*",Performance,Memory
"*swap_used*",Performance,Memory
"*array*",Performance,Storage
"*blocksize*",Performance,Storage
"*cluster*",Performance,Storage
"*fd_max*",Performance,Storage
"*fd_used*",Performance,Storage
"*latency*",Performance,Storage
"*mount*",Performance,Storage
"*parent*",Performance,Storage
"*read_blocks*",Performance,Storage
"*read_latency*",Performance,Storage
"*read_ops*",Performance,Storage
"*storage*",Performance,Storage
"*storage_free*",Performance,Storage
"*storage_free_percent*",Performance,Storage
"*storage_used*",Performance,Storage
"*storage_used_percent*",Performance,Storage
"*write_blocks*",Performance,Storage
"*write_latency*",Performance,Storage
"*write_ops*",Performance,Storage
"*thruput*",Performance,Network
"*thruput_max*",Performance,Network
"*signature*",Performance,OS
"*uptime*",Performance,Uptime

pparkerntx99
Explorer

Extremely helpful. Why can't all answers be this helpful and direct?

Runals
Motivator

lol I posted both the question and answer which helps. Glad you've found it helpful. Sadly I look at this a year later and realize how little I've been able to do to take more action upon this work /sigh.
