Splunk Search

Missing Fields in data set

Abass42
Communicator

I have some Netskope data. Searching it goes something like this:

index=testing sourcetype="netskope:application" dlp_rule="AB C*"
| lookup NetSkope_test.csv dlp_rule OUTPUT C_Label as "Label Name"
| eval Date=strftime(_time, "%Y-%m-%d"), Time=strftime(_time, "%H:%M:%S")
| rename user as User dstip as "Destination IP" dlp_file as File url as URL
| table Date Time User URL File "Destination IP" "Label Name"

 

I am tracking social security numbers and how many times one leaves the firm. I even mapped the specific dlp_rule values found to values like C1, C2, C3...
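For context, the lookup file maps each rule string to a short label, shaped roughly like this (the rule names here are illustrative placeholders, not the real values):

```csv
dlp_rule,C_Label
"AB C Rule One",C1
"AB C Rule Two",C2
"AB C Rule Three",C3
```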

When I added this query, I also had to update the other panels to track the total number of SSNs leaving the firm through various methods. All of them used the above filter:

index=testing sourcetype="netskope:application" dlp_rule="AB C*"

And I am pretty sure I had results. For the dlp_rule field, I had strings like AB C*, and I had 5 distinct values I was mapping against.

Looking at the dataset now, a few months later, I don't see any values matching the above criteria, AB C*. I have 4 values, and the null dlp_rule value appears over 38 million times.

Abass42_0-1744820613480.png

I think the null value is supposed to be the AB C* values. I don't have any screenshots proving this, though.

My question is, after discussing this with the client: what could have happened? When searching over all time, the above screenshot is what I get. If I understand how Splunk works even vaguely, I don't believe Splunk has the power to go in and edit old ingested logs, in this case removing a specific value from all old logs of a specific data source. That doesn't make any logical sense. Both the client and I remember seeing the values specified above. They are going to contact Netskope to see what happened, but as far as I know, I have not changed anything related to this data source.

Can old data change in Splunk? Can a new props.conf or transforms.conf apply to old data?

 

Thank you for any guidance. 


livehybrid
SplunkTrust

Hi @Abass42 

You're right in that editing historic data in Splunk isn't really possible. (You can delete data if you have the can_delete capability, though.)

What I'm wondering is whether one of two things has happened:

1) The data has changed

2) Your field extractions have changed

They ultimately boil down to the same question: how does the dlp_rule field get defined? Is it an actual value in the _raw data (such as [time] - component=something dlp_rule=ABC user=Bob host=BobsLaptop), OR is dlp_rule determined/evaled/extracted from other data in the event, such as a status code or a regular expression?

If it's extracted, then the question becomes: has the data format changed slightly? This could be something as simple as an additional space or field in the raw data that has stopped the field extraction from working. Or has the field extraction itself been changed at all?
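One quick way to distinguish the two cases is to search for the literal string in _raw, bypassing field extraction entirely (index and sourcetype are taken from your search above; the quoted string is an assumption based on your AB C* pattern):

```spl
index=testing sourcetype="netskope:application" "AB C"
| eval dlp_rule_extracted=if(isnull(dlp_rule), "no", "yes")
| stats count by dlp_rule_extracted
```

If this returns events but dlp_rule_extracted is mostly "no", the value is still in the raw data and the extraction has broken; if it returns nothing at all, the raw data itself has changed.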

If you're able to provide a sample event then it might help - redacted of course.

Another thing you could do, if you are unsure what fields are extracted etc., is run btool on your search head (if you are running on-prem), such as:

/opt/splunk/bin/splunk cmd btool props list netskope:application
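Running it with --debug also prints which .conf file each setting comes from, which helps spot an app or local override that changed the extraction:

```shell
# --debug prefixes each line with the .conf file it was resolved from
/opt/splunk/bin/splunk cmd btool props list netskope:application --debug
```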

Are you able to look at a raw historical event where you got a match you expected and compare it to a recent event to see if there are any differences?


Abass42
Communicator

Hey, thanks for your answer. After I posted this, I went to investigate the source of the data and any props or transforms set up for it.

I ran the following from our forwarder, the server that has the Netskope TA app installed on it:

./splunk btool props list --debug | grep "netskope:application"

 

Abass42_0-1744837049237.png

Abass42_3-1744838417927.png

I don't have any transforms for that sourcetype.

Here is the props output from the default Netskope app:

[source::...netskope_file_hash_modalert.log*]
SHOULD_LINEMERGE = true
sourcetype = tanetskopeappforsplunk:log
TZ = UTC

[source::...netskope_url_modalert.log*]
SHOULD_LINEMERGE = true
sourcetype = tanetskopeappforsplunk:log
TZ = UTC

[source::...ta-netskopeappforsplunk*.log*]
SHOULD_LINEMERGE = true
sourcetype = tanetskopeappforsplunk:log
TZ = UTC

[source::...ta_netskopeappforsplunk*.log*]
SHOULD_LINEMERGE = true
sourcetype = tanetskopeappforsplunk:log
TZ = UTC

[netskope:event:v2]
SHOULD_LINEMERGE = 0
category = Splunk App Add-on Builder
pulldown_type = 1

[netskope:alert:v2]
SHOULD_LINEMERGE = 0
category = Splunk App Add-on Builder
pulldown_type = 1

[netskope:web_transaction]
INDEXED_EXTRACTIONS = W3C
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = Etc/GMT
SHOULD_LINEMERGE = 0
TRUNCATE = 999999
EXTRACT-from_source = .*[\\\/](?<input_name>.*)_(?<bucket_name>\d{8})_(?<bucket_file_name>.*) in source
EVAL-vendor_product = "Netskope"
FIELDALIAS-app = x_cs_app AS app
FIELDALIAS-timestamp = _time as timestamp
FIELDALIAS-bytes_in = cs_bytes AS bytes_in
FIELDALIAS-bytes_out = sc_bytes AS bytes_out
FIELDALIAS-category = x_category AS category
FIELDALIAS-dest = s_ip AS dest
EVAL-http_content_type = coalesce(cs_content_type, sc_content_type)
FIELDALIAS-http_method = cs_method AS http_method
FIELDALIAS-http_referrer = cs_referer AS http_referrer
FIELDALIAS-http_user_agent = cs_user_agent AS http_user_agent
FIELDALIAS-response_time = time_taken AS response_time
FIELDALIAS-src=c_ip AS src
FIELDALIAS-status = sc_status AS status
FIELDALIAS-uri_path = cs_uri AS uri_path
FIELDALIAS-uri_query = cs_uri_query AS uri_query
FIELDALIAS-user = cs_username AS user
EVAL-fullurl = cs_uri_scheme . "://" . cs_dns . cs_uri . if(isnull(cs_uri_query), "", "?") . coalesce(cs_uri_query,"")
EVAL-x_c_browser=if(isnull(x_c_browser),"N/A",x_c_browser)
EVAL-x_c_device=if(isnull(x_c_device),"N/A",x_c_device)
FIELDALIAS-dest_port = cs_uri_port AS dest_port
EVAL-url = cs_uri_scheme . "://" . cs_dns . cs_uri . if(isnull(cs_uri_query), "", "?") . coalesce(cs_uri_query,"")
FIELDALIAS-duration = time_taken AS duration
FIELDALIAS-http_referrer_domain = cs_referer AS http_referrer_domain
EVAL-site = replace(cs_host, "^([^\.]+).*", "\1")

[source::netskope_events_v2_connection]
KV_MODE = json
sourcetype = netskope:connection
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[source::...*events_iterator_page*.csv]
INDEXED_EXTRACTIONS = CSV
sourcetype = netskope:connection
TIMESTAMP_FIELDS=timestamp
TIME_FORMAT = %s
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[netskope:connection]
FIELDALIAS-src_ip = srcip AS src_ip
FIELDALIAS-src=srcip AS src
FIELDALIAS-dest_ip = dstip AS dest_ip
FIELDALIAS-dest = dstip AS dest
EVAL-dvc = coalesce(hostname, device)
EVAL-app_session_key = app_session_id . "::" . host
EVAL-vendor_product = "Netskope"
FIELDALIAS-page_duration = page_duration AS duration
FIELDALIAS-bytes = numbytes AS bytes
FIELDALIAS-in_bytes = client_bytes AS bytes_in
FIELDALIAS-category = appcategory AS category
FIELDALIAS-out_bytes = server_bytes AS bytes_out
FIELDALIAS-http_referrer = useragent AS http_user_agent
EVAL-http_user_agent_length = len(useragent)
FIELDALIAS-page = page AS url
FIELDALIAS-src_location = src_location AS src_zone
FIELDALIAS-dest_location = dst_location AS dest_zone
EVAL-url_length = len(page)
# from netskope:web
EVAL-action = if(isnotnull(action),action,"isolate")
FIELDALIAS-oc = object_type AS object_category
FIELDALIAS-fu = from_user AS src_user

[netskope:audit]
SHOULD_LINEMERGE = false
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
TRUNCATE = 999999
KV_MODE = json
EVAL-vendor_product = "Netskope"
# acl_modified, cleared, created, deleted, modified, read, stopped, updated
EVAL-action = case(match(audit_log_event,"create|Create"),"created", match(audit_log_event,"granted"), "acl_modified", match(audit_log_event, "ack|Ack"), "cleared", match(audit_log_event, "delete|Delete"), "deleted", match(audit_log_event,"edit|Edit|Add"),"modified",match(audit_log_event,"Push|push|Reorder|update|Update"),"updated",match(audit_log_event,"Disable|disable"), "stopped",1=1,"unknown")
EVAL-status = case(match(audit_log_event,"success|Success"),"success",match(audit_log_event,"fail|Fail"),"failure",1=1,"unknown")
FIELDALIAS-severity_id = severity_level AS severity_id
FIELDALIAS-data_type = supporting_data.data_type AS object
FIELDALIAS-date_type_attr = supporting_data.data_values{} AS object_attrs
FIELDALIAS-object_cat = category AS object_category
FIELDALIAS-result = audit_log_event AS result

[source::netskope_events_v2_application]
KV_MODE = json
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
sourcetype = netskope:application
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[source::...*events_iterator_application*.csv]
INDEXED_EXTRACTIONS = CSV
sourcetype = netskope:application
TIMESTAMP_FIELDS=timestamp
TIME_FORMAT = %s
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[netskope:application]
FIELDALIAS-src_ip = srcip AS src_ip
FIELDALIAS-src=srcip AS src
FIELDALIAS-dest_ip = dstip AS dest_ip
FIELDALIAS-dest = dstip AS dest
EVAL-dvc = coalesce(hostname, device)
FIELDALIAS-src_location = src_location AS src_zone
FIELDALIAS-dest_location = dst_location AS dest_zone
FIELDALIAS-signature = policy AS signature
EVAL-file_hash = coalesce(local_sha256, local_md5)
FIELDALIAS-file_name = filename AS file_name
EVAL-app_session_key = app_session_id . "::" . host
EVAL-vendor_product = "Netskope"
FIELDALIAS-oc = object_type AS object_category
FIELDALIAS-fu = from_user AS src_user

[source::netskope_events_v2_network]
KV_MODE = json
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
sourcetype = netskope:network
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[source::...*events_iterator_network*.csv]
INDEXED_EXTRACTIONS = CSV
sourcetype = netskope:network
TIMESTAMP_FIELDS=timestamp
TIME_FORMAT = %s
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[netskope:network]
FIELDALIAS-src_ip = srcip AS src_ip
FIELDALIAS-src=srcip AS src
FIELDALIAS-dest_ip = dstip AS dest_ip
FIELDALIAS-dest = dstip AS dest
EVAL-dvc = coalesce(hostname, device)
EVAL-vendor_product = "Netskope"
FIELDALIAS-bytes = numbytes AS bytes
FIELDALIAS-in_bytes = client_bytes AS bytes_in
FIELDALIAS-out_bytes = server_bytes AS bytes_out
FIELDALIAS-packets_in = client_packets AS packets_in
FIELDALIAS-packets_out = server_packets AS packets_out
FIELDALIAS-src_port = srcport AS src_port
FIELDALIAS-dest_port = dstport AS dest_port
FIELDALIAS-session_id = network_session_id AS session_id
FIELDALIAS-duration = session_duration AS duration

[netskope:incident]
SHOULD_LINEMERGE = false
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
TRUNCATE = 999999
KV_MODE = json
FIELDALIAS-signature_id = internal_id AS signature_id
FIELDALIAS-action = dlp_match_info{}.dlp_action AS action
FIELDALIAS-object_path = url AS object_path
FIELDALIAS-object_category = true_obj_category AS object_category
FIELDALIAS-signature = title AS signature
FIELDALIAS-src=src_location AS src
FIELDALIAS-src_user = from_user AS src_user
FIELDALIAS-dest = dst_location AS dest
# FIELDALIAS-user = to_user AS user
EVAL-user = coalesce(user, to_user)
EVAL-vendor_product = "Netskope"

[source::netskope_alerts_v2]
KV_MODE = json
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
sourcetype = netskope:alert
SHOULD_LINEMERGE = false
TRUNCATE = 999999

[source::...*alerts_iterator*.csv]
INDEXED_EXTRACTIONS = CSV
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS=timestamp
TIME_FORMAT = %s
sourcetype = netskope:alert
TRUNCATE = 999999

[netskope:alert]
EVAL-dvc = coalesce(hostname, device)
EVAL-vendor_product = "Netskope"
EVAL-severity_id = coalesce(severity_id, severity_level_id)
EVAL-severity = coalesce(severity_level, dlp_rule_severity, dlp_severity, mal_sev, malware_severity, severity, severity_level)
EVAL-object_path = if(file_path="NA", object, coalesce(file_path, object))
FIELDALIAS-id = internal_id AS id
FIELDALIAS-srcip = srcip AS src
FIELDALIAS-dstip = dstip AS dest
EVAL-file_hash = coalesce(local_sha256, local_md5)
FIELDALIAS-signature = alert_name AS signature
FIELDALIAS-oc = object_type AS object_category
FIELDALIAS-fu = from_user AS src_user
FIELDALIAS-src_location = src_location AS src_zone
FIELDALIAS-dest_location = dst_location AS dest_zone
FIELDALIAS-file_name = filename AS file_name

[netskope:infrastructure]
SHOULD_LINEMERGE = false
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
TRUNCATE = 999999
KV_MODE = json
FIELDALIAS-device = device_name AS device
EVAL-app = "Netskope"
EVAL-vendor_product = "Netskope"

[netskope:endpoint]
SHOULD_LINEMERGE = false
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s
TRUNCATE = 999999
KV_MODE = json
EVAL-vendor_product = "Netskope"

[netskope:clients]
KV_MODE = json
FIELDALIAS-make = attributes.host_info.device_make AS make
FIELDALIAS-model = attributes.host_info.device_model AS model
FIELDALIAS-os = attributes.host_info.os AS os
FIELDALIAS-ver = attributes.host_info.os_version AS version
FIELDALIAS-name = attributes.host_info.hostname AS dest
FIELDALIAS-user = attributes.users{}.username AS user
EVAL-vendor_product = "Netskope"
SHOULD_LINEMERGE = false
TIME_PREFIX = "timestamp":
MAX_TIMESTAMP_LOOKAHEAD = 35
TIME_FORMAT = %s
TRUNCATE = 999999

[netskope:api]
KV_MODE = json
EVAL-vendor_product = "Netskope"

[netskope:alertaction:file_hash]
FIELDALIAS-action_status = status AS action_status
FIELDALIAS-action_name = orig_action_name AS action_name

[netskope:alertaction:url]
FIELDALIAS-action_status = status AS action_status
FIELDALIAS-action_name = orig_action_name AS action_name

# For proper ingestion of Alert action events used in Splunk ES App
[source::...stash_common_action_model]
sourcetype=stash_common_action_model

[stash_common_action_model]
TRUNCATE                = 0
# only look for ***SPLUNK*** on the first line
HEADER_MODE             = firstline
# we can summary index past data, but rarely future data
MAX_DAYS_HENCE          = 2
MAX_DAYS_AGO            = 10000
# 5 years difference between two events
MAX_DIFF_SECS_AGO       = 155520000
MAX_DIFF_SECS_HENCE     = 155520000
TIME_PREFIX             = (?m)^\*{3}Common\sAction\sModel\*{3}.*$
MAX_TIMESTAMP_LOOKAHEAD = 25
LEARN_MODEL             = false
# break .stash_new custom format into events
SHOULD_LINEMERGE        = false
BREAK_ONLY_BEFORE_DATE  = false
LINE_BREAKER            = (\r?\n==##~~##~~  1E8N3D4E6V5E7N2T9 ~~##~~##==\r?\n)

TRANSFORMS-0parse_cam_header    = orig_action_name_for_stash_cam,orig_sid_for_stash_cam,orig_rid_for_stash_cam,sourcetype_for_stash_cam
TRANSFORMS-1sinkhole_cam_header = sinkhole_cam_header

 

 

Looking at and running your suggested command (good command, btw), I get the following output:

Abass42_1-1744837420751.png

I don't see any evidence of us modifying or creating a dlp_rule value. I had specifically mapped dlp_rule to the values below:

Abass42_2-1744837579441.png

These are the values I was seeing. I was using this mapping and these values in every other query as well, so I must have seen them.

This is the default Netskope app. I also looked for any possible sourcetypes or transforms via the GUI, and I didn't see any. I am working on this data with a coworker who has insight into the Netskope portal, and he said that the dlp_rule field is blank there as well. Even if the incoming data had changed, the old data shouldn't have changed. I haven't updated the Netskope app.

 

There are too many fields to paste the full logs in here, but here are the fields we are looking at:

   dlp_fail_reason:
   dlp_file:
   dlp_incident_id: 0
   dlp_is_unique_count:
   dlp_mail_parent_id:
   dlp_parent_id: 0
   dlp_profile:
   dlp_rule:
   dlp_rule_count: 0
   dlp_rule_severity:
   dlp_scan_failed:
   dlp_unique_count: 0
   dst_country: US
   dst_geoip_src: 0
   dst_latitude: 7.40594
   dst_location: Mow
   dst_longitude: -1.1551
   dst_region: C
   dst_timezone: America/
   dst_zipcode: N/A
   dsthost:
   dstip: 1.5.5.5
   dstport: 455

 

With this specific dashboard and use case, I am searching over all time, and the field is generally blank. We only get 3 dlp_rule values; the rest, 99%, are blank.

I'm not sure how to track down whether the data set changed, since I'm already searching over all time.
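One thing I could try is timecharting the field over all time to see exactly when the AB C* values dropped off and the nulls ramped up (the weekly span is arbitrary):

```spl
index=testing sourcetype="netskope:application" earliest=0
| fillnull value="NULL" dlp_rule
| timechart span=1w count by dlp_rule limit=10
```

If the AB C* series goes to zero at the same point the NULL series spikes, that timestamp is when the feed or the extraction changed.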

 

Thanks for any guidance 
