I'm at my wits' end here. Everything seems to indicate that what I'm doing should work, yet it doesn't.
I have Azure firewall logs feeding in through a storage account using the Microsoft Cloud Services app. These come in as standard JSON, which is being extracted fine by Splunk. There is a nested field in the JSON, "properties.msg", that has the actual firewall log message including source/destination information, IPs/ports, whether it was allowed/denied, and what firewall rule was referenced.
For reference, this thread discusses a very similar problem: https://community.splunk.com/t5/Splunk-Search/Azure-Firewall-Log-Field-Extraction-Help/m-p/411148
The added wrinkle I have is that I am trying to get the fields extracted to work with CIM data models, not just get the extractions as results from a search. This honestly seemed easy enough, but for some reason none of my field extractions are working.
Here are some facts/things I have tried
I honestly don't get how I can see the regex working in the Field Extractor, hit 'Save', see it saved in the configurations, but not extract fields.
EDIT:
Sample _raw log (more in updated link posted above)
{ "category": "AzureFirewallApplicationRule", "time": "2021-05-04T15:41:59.8967610Z", "resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", "operationName": "AzureFirewallApplicationRuleLog", "properties": {"msg":"HTTPS request from 192.168.0.1:8888 to subdomain.x99.blob.storage.azure.net:443. Action: Allow. Rule Collection: AllowOutbound. Rule: AllowOutbound-AA-AA-A"}}
{ "category": "AzureFirewallApplicationRule", "time": "2021-05-04T15:41:58.6369780Z", "resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", "operationName": "AzureFirewallApplicationRuleLog", "properties": {"msg":"HTTPS request from 192.168.0.1:8888 to subdomain.x99.blob.storage.azure.net:443. Action: Allow. Rule Collection: AllowOutbound. Rule: AllowOutbound-AA-AA-A"}}
{ "category": "AzureFirewallNetworkRule", "time": "2021-05-07T15:05:59.8277330Z", "resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"TCP request from 192.168.0.1:8888 to 8.8.8.8:8888. Action: Deny. "}}
Regex
\"(?<protocol>\w+)\s[rR]equest\D+(?<src>[^\:]+)\:(?<src_port>\d+) to (?<dest>[^\:]+)\:((?<dest_port>\d+))?\.\sAction\: (?<action>\w+)\.(?: Rule Collection\: (?<cat>\w+)\. Rule\: (?<rule>[^\"]+))?
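One way to sanity-check the regex outside Splunk is to run it against the sample _raw events above. The following is only a minimal sketch, not Splunk's own extraction logic: Python's re module spells named groups (?P<name>...) instead of the (?<name>...) form used in Splunk, so the groups are respelled, and the names RAW_PATTERN, APP_EVENT, NET_EVENT and extract_from_raw are hypothetical helpers introduced for illustration.

```python
import re

# Same pattern as above, with named groups respelled for Python's re module.
RAW_PATTERN = re.compile(
    r'\"(?P<protocol>\w+)\s[rR]equest\D+(?P<src>[^\:]+)\:(?P<src_port>\d+) to '
    r'(?P<dest>[^\:]+)\:((?P<dest_port>\d+))?\.\sAction\: (?P<action>\w+)\.'
    r'(?: Rule Collection\: (?P<cat>\w+)\. Rule\: (?P<rule>[^\"]+))?'
)

# Two of the sample events from the post, kept as raw text.
APP_EVENT = (
    '{ "category": "AzureFirewallApplicationRule", "time": "2021-05-04T15:41:59.8967610Z", '
    '"resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", '
    '"operationName": "AzureFirewallApplicationRuleLog", '
    '"properties": {"msg":"HTTPS request from 192.168.0.1:8888 to subdomain.x99.blob.storage.azure.net:443. '
    'Action: Allow. Rule Collection: AllowOutbound. Rule: AllowOutbound-AA-AA-A"}}'
)
NET_EVENT = (
    '{ "category": "AzureFirewallNetworkRule", "time": "2021-05-07T15:05:59.8277330Z", '
    '"resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", '
    '"operationName": "AzureFirewallNetworkRuleLog", '
    '"properties": {"msg":"TCP request from 192.168.0.1:8888 to 8.8.8.8:8888. Action: Deny. "}}'
)


def extract_from_raw(raw_event):
    """Return a dict of named groups, or None if the pattern does not match."""
    match = RAW_PATTERN.search(raw_event)
    return match.groupdict() if match else None


app_fields = extract_from_raw(APP_EVENT)
net_fields = extract_from_raw(NET_EVENT)

if __name__ == "__main__":
    print(app_fields)
    print(net_fields)
```

If both events produce a dict here, the pattern itself is probably not the problem, which would point toward how or where the extraction is applied in Splunk rather than the regex.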
Hi,
Is it possible for you to provide some sample data (please redact any sensitive data) and also the regex you are using?
I've added some samples and my regex to the original post, and updated the link to point to https://community.splunk.com/t5/Splunk-Search/Azure-Firewall-Log-Field-Extraction-Help/m-p/411148 which also has some additional examples if needed.
For me it is working with a field extraction and a field transformation. The main thing to keep in mind is that your sourcetype must have KV_MODE = json; otherwise the configuration below will not work.
I used your regex but removed the leading \"
Regex:
(?<protocol>\w+)\s[rR]equest\D+(?<src>[^\:]+)\:(?<src_port>\d+) to (?<dest>[^\:]+)\:((?<dest_port>\d+))?\.\sAction\: (?<action>\w+)\.(?: Rule Collection\: (?<cat>\w+)\. Rule\: (?<rule>[^\"]+))?
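To illustrate why the leading \" can be dropped when the regex runs against the already-extracted properties.msg value rather than _raw, here is a rough Python sketch. It assumes the JSON has been parsed first (roughly what automatic JSON extraction provides) and is not a description of how Splunk applies the transform. Named groups use Python's (?P<name>...) spelling, and MSG_PATTERN, APP_EVENT, NET_EVENT and extract_from_msg are hypothetical names.

```python
import json
import re

# The pattern without the leading quote, respelled for Python's re module.
MSG_PATTERN = re.compile(
    r'(?P<protocol>\w+)\s[rR]equest\D+(?P<src>[^\:]+)\:(?P<src_port>\d+) to '
    r'(?P<dest>[^\:]+)\:((?P<dest_port>\d+))?\.\sAction\: (?P<action>\w+)\.'
    r'(?: Rule Collection\: (?P<cat>\w+)\. Rule\: (?P<rule>[^\"]+))?'
)

# Sample events from the original post.
APP_EVENT = (
    '{ "category": "AzureFirewallApplicationRule", "time": "2021-05-04T15:41:58.6369780Z", '
    '"resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", '
    '"operationName": "AzureFirewallApplicationRuleLog", '
    '"properties": {"msg":"HTTPS request from 192.168.0.1:8888 to subdomain.x99.blob.storage.azure.net:443. '
    'Action: Allow. Rule Collection: AllowOutbound. Rule: AllowOutbound-AA-AA-A"}}'
)
NET_EVENT = (
    '{ "category": "AzureFirewallNetworkRule", "time": "2021-05-07T15:05:59.8277330Z", '
    '"resourceId": "/SUBSCRIPTIONS/REDACTED/RESOURCEGROUPS/REDACTED/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SOMEFW", '
    '"operationName": "AzureFirewallNetworkRuleLog", '
    '"properties": {"msg":"TCP request from 192.168.0.1:8888 to 8.8.8.8:8888. Action: Deny. "}}'
)


def extract_from_msg(raw_event):
    """Parse the event as JSON, then match only the nested properties.msg value."""
    msg = json.loads(raw_event)["properties"]["msg"]
    match = MSG_PATTERN.search(msg)
    return match.groupdict() if match else None


app_fields = extract_from_msg(APP_EVENT)
net_fields = extract_from_msg(NET_EVENT)

if __name__ == "__main__":
    print(app_fields)
    print(net_fields)
```

Because the parsed msg value no longer contains the surrounding JSON quotes, the rule group simply runs to the end of the string.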
Thanks for your response.
I don't think I can confirm the KV_MODE of the sourcetype easily in Splunk Cloud, but I'll look. Definitely seems like it's doing automatic KV extraction, but that could be misleading.
However, according to https://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/Searchtimeoperationssequence
KV_MODE would apply after both the inline and transform-based extractions, wouldn't it? I had tried specifying properties.msg as the SOURCE_KEY before, and when it didn't work for me I assumed that ordering was the reason, so I tried using _raw instead (which is why the "s were there, to help the regex), to no avail.
I used the API to set KV_MODE to json and entered the details exactly as specified. This appears to work in other environments, but it is still not working for me.
I'm seeing the same issue with another, non-JSON source, where a single field extraction shows as extracted in the Field Extractor preview but is not extracted after saving, regardless of sharing status.
I am confused now. Are the JSON events that were not extracting fields at search time working now?
I am confused as well, but apparently this is resolved.
The issue I originally described with JSON events was also occurring with some other, non-JSON events.
I opened a ticket with support. They, like you, created one of the extractions without issue on the AdHoc Splunk Cloud SH. I was able to create another on the AdHoc SH, but to line up with the data models I had been trying to use the ES Search Head.
After that, the previous extractions I had created on the ES Search Head seemed to start working. Not sure if something changed, or if I had just been impatient previously and had not given the extractions enough time to apply.
So things are resolved now, and I don't understand why. But things are working as intended.