
Azure Firewall Log - Field Extraction Help

Explorer

Has anyone figured out how to extract the useful fields from Azure Firewall Logs? We are logging our Azure Firewall logs to a storage account and Splunk is pulling those with the Splunk Microsoft Cloud Services app. The app doesn't appear to include the field extractions for these types of logs. I am not well versed in regex, so I tried using the Splunk field extractor in the GUI, but I ran into issues. I mainly need the protocol, srcip, srcport, destip, destport, action. I was able to get this working for TCP and UDP traffic, but the problems started with ICMP traffic. My field extractions wouldn't match the ICMP traffic. I think this was due to no port field in those. Then I noticed a few other logs that would not be the same as the TCP/UDP logs either. Here are some sample scrubbed logs in case someone is willing to help here. Or if anyone has already solved this with some app that I am not aware of, I would appreciate some help! Thanks in advance.

Here are all the sample log formats I am seeing from the Azure Firewalls:

{ "category": "AzureFirewallNetworkRule", "time": "2019-07-30T17:43:59.9812590Z", "resourceId": "/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/NET-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/NET-XXX1-XXX1-FW01", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"TCP request from XXX.X.178.18:42132 to XXX.X.242.12:51113. Action: Allow"}}

{ "category": "AzureFirewallNetworkRule", "time": "2019-07-30T17:44:59.4538600Z", "resourceId": "/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/NET-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/NET-XXX1-XXX1-FW02", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"ICMP request from XXX.X.10.1 to XXX.X.69.5. Action: Allow"}}

{ "category": "AzureFirewallNetworkRule", "time": "2019-07-30T16:13:54.9901410Z", "resourceId": "/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01", "operationName": "AzureFirewallNatRuleLog", "properties": {"msg":"TCP request from XXX.X.131.34:3318 to XXX.X.224.170:3299 was DNAT'ed to XXX.X.80.5:3299"}}

{ "category": "AzureFirewallNetworkRule", "time": "2019-07-30T16:39:45.5354460Z", "resourceId": "/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01", "operationName": "AzureFirewallThreatIntelLog", "properties": {"msg":"TCP request from XXX.X.231.199:33348 to XXX.X.224.170:443. Action: Alert. ThreatIntel: Port Scan"}}

{ "category": "AzureFirewallApplicationRule", "time": "2019-07-30T17:19:00.1880780Z", "resourceId": "/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01", "operationName": "AzureFirewallApplicationRuleLog", "properties": {"msg":"HTTPS request from XXX.X.110.8:65486 to abcd.efghijk.com:443. Action: Deny. No rule matched. Proceeding with default action"}}

{ "category": "AzureFirewallApplicationRule", "time": "2019-07-30T12:00:04.0868100Z", "resourceId": "/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01", "operationName": "AzureFirewallApplicationRuleLog", "properties": {"msg":"HTTPS request from XXX.X.66.64:30476. Action: Deny. Reason: SNI TLS extension was missing."}}


Builder

Hi @jordanmedved ,

Here is a regex you can use:


| rex "\"properties\": \{\"msg\":\"(?<protocol>\S+) request from (?<srcip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?<srcport>\d+))? to (?<destip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?<destport>\d+))?\. Action: (?<action>[^\"]+)\""

You could also try specifying the field:
| rex field=properties.msg "(?<protocol>\S+) request from (?<srcip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?<srcport>\d+))? to (?<destip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?<destport>\d+))?\. Action: (?<action>[^\"]+)"
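
A quick way to check a pattern like this before pointing it at live data is to run it against a single hand-built event. The sketch below assumes made-up 10.0.x.x addresses in place of the scrubbed ones, uses the field names from the question (protocol, srcip, srcport, destip, destport, action), and escapes the literal dots. Treat it as an illustration rather than a tested production extraction.

```spl
| makeresults
| eval msg = "ICMP request from 10.0.10.1 to 10.0.69.5. Action: Allow"
| rex field=msg "(?<protocol>\S+) request from (?<srcip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?<srcport>\d+))? to (?<destip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?<destport>\d+))?\. Action: (?<action>[^\"]+)"
| table msg protocol srcip srcport destip destport action
```

Because both port groups are optional, the ICMP message should still match, with srcport and destport left empty. Note that this pattern expects an IP address after "to", so it would not match the application rule events that log a hostname or have no destination at all.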

Let me know if that works for you.

Explorer

I will review this and let you know how it goes. Thanks! Splunk has the best community and really makes this product top notch!


Motivator

Oh man. @rbechtold, that is some fancy stuff right there, but as you say there is a much easier way.

I input a few of your events into this website to confirm that this is standardized JSON format. This Microsoft doc also confirms that Azure FW logs are stored in JSON format. Since all of that work is done for you, all you need to do is either use the built-in _json sourcetype, or clone that sourcetype to meet your own needs (for _time extraction, for example). Better yet, there's probably an existing Splunk app with all of this set up already (https://splunkbase.splunk.com/app/3534/ or https://splunkbase.splunk.com/app/3110/?). I don't have experience with Azure monitoring so you'd have to check those out yourself.

Once you get that set up, all fields will be automatically extracted in a hierarchy. As an example, in your first event, the msg field would actually be properties.msg since it's a child node of properties. If you want to check this out, save your example events to a file (test.json), then go to Settings > Add Data > Upload and configure the sourcetype in the UI, using either the built-in _json or a custom clone of it.
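
If you just want a quick look at that hierarchy without configuring a sourcetype, spath can parse the JSON at search time. This is a minimal sketch on one hand-built event with made-up 10.0.x.x addresses, not a replacement for a proper sourcetype:

```spl
| makeresults
| eval _raw = "{ \"category\": \"AzureFirewallNetworkRule\", \"operationName\": \"AzureFirewallNetworkRuleLog\", \"properties\": {\"msg\":\"ICMP request from 10.0.10.1 to 10.0.69.5. Action: Allow\"}}"
| spath
| table category operationName properties.msg
```

With no arguments, spath reads _raw and names nested keys with dotted paths, which is where properties.msg comes from.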

Good luck, and happy Splunking. If you get stuck, feel free to comment with additional specifics.

Cheers,
Jacob

Explorer

This is definitely JSON, and some very basic field extractions are already working. I get category, operationName, properties, msg, resourceId and time. Everything I am really concerned with is contained within msg and not extracted. Is there a way to extract protocol, srcip, srcport, destip, destport, action, reason and threatIntel from the properties.msg field?


Motivator

My apologies Jordan. You were clear in your question - I just did not read thoroughly enough. As you said, you should be able to piece together @rbechtold's answer to solve your problem. Here's a working version (mostly stolen from @rbechtold). Ideally, you would not need anything above the blank line, as properties.msg would already be extracted automatically.

| makeresults count=1 
| eval _raw = "{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T17:43:59.9812590Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/NET-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/NET-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallNetworkRuleLog\", \"properties\": {\"msg\":\"TCP request from 100.1.178.18:42132 to 100.1.242.12:51113. Action: Allow\"}}~{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T17:44:59.4538600Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/NET-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/NET-XXX1-XXX1-FW02\", \"operationName\": \"AzureFirewallNetworkRuleLog\", \"properties\": {\"msg\":\"ICMP request from 100.1.10.1 to 100.1.69.5. Action: Allow\"}}~{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T16:13:54.9901410Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallNatRuleLog\", \"properties\": {\"msg\":\"TCP request from 100.1.131.34:3318 to 100.1.224.170:3299 was DNAT'ed to 100.1.80.5:3299\"}}~{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T16:39:45.5354460Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallThreatIntelLog\", \"properties\": {\"msg\":\"TCP request from 100.1.231.199:33348 to 100.1.224.170:443. Action: Alert. 
ThreatIntel: Port Scan\"}}~{ \"category\": \"AzureFirewallApplicationRule\", \"time\": \"2019-07-30T17:19:00.1880780Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallApplicationRuleLog\", \"properties\": {\"msg\":\"HTTPS request from 100.1.110.8:65486 to abcd.efghijk.com:443. Action: Deny. No rule matched. Proceeding with default action\"}}~{ \"category\": \"AzureFirewallApplicationRule\", \"time\": \"2019-07-30T12:00:04.0868100Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallApplicationRuleLog\", \"properties\": {\"msg\":\"HTTPS request from 100.1.66.64:30476. Action: Deny. Reason: SNI TLS extension was missing.\"}}"
 | eval raw = split(_raw, "~") 
| mvexpand raw 
| rename raw AS _raw
| rex field=_raw "\{\"msg\":\"(?<properties_msg>[^}]*)\"\}"
| fields - _time _raw
| rename properties_msg as properties.msg

| rex field=properties.msg "^(?<protocol>[^\s]+)"
| rex field=properties.msg " request from (?<src_ip>[^\:\s]+):?(?<src_port>\d*)"
| rex field=properties.msg "\d+ to (?<dest_ip>[^:\s]+?)(?::(?<dest_port>\d+))?\.?(?:\s|$)"
| rex field=properties.msg "Action: (?<action>\w+)"
| rex field=properties.msg "ThreatIntel: (?<threatIntel>[\w\s]+)$"
| rex field=properties.msg "Reason: (?<reason>[^\.]+)\."
Cheers,
Jacob

Explorer

Thank you sir! I need to sit down and review all these solutions, but I will let you know how it goes. Thanks for making the Splunk community great!


Communicator

Hey Jordan,

There's almost certainly a better way to do this, but I had fun creating this search. It works (at least) for the sample data. You'll have to tell me if you run into problems on the live data.

I hope this is useful in some way to you! (copy and paste this search into Splunk)

| makeresults count=1 
| eval _raw = "{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T17:43:59.9812590Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/NET-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/NET-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallNetworkRuleLog\", \"properties\": {\"msg\":\"TCP request from XXX.X.178.18:42132 to XXX.X.242.12:51113. Action: Allow\"}}~{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T17:44:59.4538600Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/NET-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/NET-XXX1-XXX1-FW02\", \"operationName\": \"AzureFirewallNetworkRuleLog\", \"properties\": {\"msg\":\"ICMP request from XXX.X.10.1 to XXX.X.69.5. Action: Allow\"}}~{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T16:13:54.9901410Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallNatRuleLog\", \"properties\": {\"msg\":\"TCP request from XXX.X.131.34:3318 to XXX.X.224.170:3299 was DNAT'ed to XXX.X.80.5:3299\"}}~{ \"category\": \"AzureFirewallNetworkRule\", \"time\": \"2019-07-30T16:39:45.5354460Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallThreatIntelLog\", \"properties\": {\"msg\":\"TCP request from XXX.X.231.199:33348 to XXX.X.224.170:443. Action: Alert. 
ThreatIntel: Port Scan\"}}~{ \"category\": \"AzureFirewallApplicationRule\", \"time\": \"2019-07-30T17:19:00.1880780Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallApplicationRuleLog\", \"properties\": {\"msg\":\"HTTPS request from XXX.X.110.8:65486 to abcd.efghijk.com:443. Action: Deny. No rule matched. Proceeding with default action\"}}~{ \"category\": \"AzureFirewallApplicationRule\", \"time\": \"2019-07-30T12:00:04.0868100Z\", \"resourceId\": \"/SUBSCRIPTIONS/XXXX1111-XX11-XX11-XX11-XXXXXX111111/RESOURCEGROUPS/DMZ-XXX1-XXX1-RG/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/DMZ-XXX1-XXX1-FW01\", \"operationName\": \"AzureFirewallApplicationRuleLog\", \"properties\": {\"msg\":\"HTTPS request from XXX.X.66.64:30476. Action: Deny. Reason: SNI TLS extension was missing.\"}}"
| eval raw = split(_raw, "~") 
| mvexpand raw 
| rename raw AS _raw 
`comment("Nothing above here matters, just recreating your dataset. You should be able to copy and paste everything below onto your real search.")`


| rex field=_raw max_match=0 "\"(?<field>[^\"]+)\"\:\s(?<value>.*?)(?:\,\s|\}$)" 
| eval field = mvjoin(field,","), value = mvjoin(value,"~,") 
| eval field = split(field, ","), value = split(value, ",") 
| rename _raw as tempraw 
| eval _raw = mvzip(field, value) 
| rex field=_raw mode=sed "s/=/|||/g" 
| extract kvdelim="," pairdelim="~" mv_add=t 
| foreach * 
    [ rename <<FIELD>> AS <<FIELD>>_temp 
    | rex field=<<FIELD>>_temp mode=sed "s/\|\|\|/=/g" 
    | rename <<FIELD>>_temp AS <<FIELD>>] 
| fields - field value test 
| rex mode=sed field=properties "s/{//g"
| rex mode=sed field=properties "s/}//g"
| rex field=properties max_match=0 "(?<field>[^\:]+)\:(?<value>.*?)(?=\.?\s[A-Za-z]+\:|$)"
| rex field=field mode=sed "s/\.//g"
| rename properties AS _raw 
| eval field = mvjoin(field,","), value = mvjoin(value,"~,") 
| eval field = split(field, ","), value = split(value, ",") 
| eval _raw = mvzip(field, value) 
| rex field=_raw mode=sed "s/=/|||/g" 
| extract kvdelim="," pairdelim="~" mv_add=t 
| foreach * 
    [ rename <<FIELD>> AS <<FIELD>>_temp 
    | rex field=<<FIELD>>_temp mode=sed "s/\|\|\|/=/g" 
    | rename <<FIELD>>_temp AS <<FIELD>>]
| fields - field value
| rename tempraw AS _raw
| rex field=msg "(?<protocol>\S+) request from (?<src_ip>[^\:\s]+)\:?(?<src_port>\d+)? to (?<dest_ip>[^\:\s]+)\:?(?<dest_port>\d+)?"
| rex field=msg "(?<protocol>\S+) request from (?<src_ip>[^\:\s]+)\:?(?<src_port>\d+)?"
| table _time _raw category resourceId operationName src_ip src_port dest_ip dest_port protocol Action Reason ThreatIntel msg *

Explorer

Wow this is some Splunk ninja fu! This works as advertised and does split my results into a table of all fields. I guess where I am a little confused is how to operationalize this. We have lots of different source types reporting firewall events so if I were to be searching src_ip=x.x.x.x, I want all my firewalls to report their events. How can you have future searches like that work with this?


Explorer

Thinking about this more, I can probably use your rex statements to extract the fields I need. I will work on this and let you know how I fare. Thanks for all this work!
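
As a rough sketch of what that could look like against indexed data, the rex statements can sit after the base search and before a filter on the extracted field. The index and sourcetype names here (example_index, example_azure_firewall) and the 10.0.10.1 address are placeholders, not real values from this environment, and the sketch assumes properties.msg is already extracted from the JSON:

```spl
index=example_index sourcetype=example_azure_firewall
| rex field=properties.msg "^(?<protocol>\S+) request from (?<src_ip>[^:\s]+)(?::(?<src_port>\d+))?"
| rex field=properties.msg "Action: (?<action>\w+)"
| search src_ip="10.0.10.1"
| table _time protocol src_ip src_port action
```

Filtering this way happens after the events are retrieved, so it only covers this one search. For src_ip=x.x.x.x to work across every firewall sourcetype, the extraction would need to be saved as a search-time field extraction on the sourcetype.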

Communicator

No problem! Apologies, I just woke up and I'm still getting the gears to start turning.

The solution I provided is very tailored to the data set you provided for me. For that reason, there are a few things I feel like I should warn you about before giving you a solid answer:

1. The first part of this search assumes that all key-value pairs will be in one of these two formats for the base field extraction:

  • "field1": "value1", "field2": "value2", etc...
  • "field1": {"nestedfield1":"nestedvalue1"}

Key-value pairs that are not in either of these formats won't get extracted, and could potentially break the entire search.

2. I'm specifically avoiding mvexpand for handling my multivalue fields because it's such a taxing command and can get messy. I accomplish this with the mvjoin/split/mvzip/extract portion of the search.

The part that you need to be careful about is your delimiters in the extract command. If your delimiter is a comma and you have commas in your base fields, the extraction will still work -- just not how you would expect. To get around this, we can replace the delimiters we're expecting to see with "rex mode=sed" and replace those values later on with a foreach command.

3. I repeat the process with a slightly different regex to pull the key value pairs out of the properties field. Essentially I'm looking for data in this format:

  • "field:value field:value field:value etc.."

Again, the key value pairs in this field need to always follow this format in order for the extraction to work.

4. Finally, the protocol, srcip, srcport, destip, and destport fields are not explicitly defined anywhere in the data, so they need to be extracted from the msg field using some logic.

That said, as long as the nested msg field in the properties field is in one of the formats you provided, the extraction will always pull this information. It will also pull those fields if they're in any of the formats previously mentioned since the extractions are dynamic.
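
The mvzip/extract idea from point 2 can be sketched in isolation. This is a simplified stand-in for the search above, not the same pipeline: it assumes two hand-typed keys and values that contain no commas, "=" or "~", so it skips the placeholder swap that the full search performs with rex mode=sed.

```spl
| makeresults
| eval field = split("category,operationName", ","), value = split("AzureFirewallNetworkRule,AzureFirewallNetworkRuleLog", ",")
| eval _raw = mvjoin(mvzip(field, value, "="), "~")
| extract kvdelim="=" pairdelim="~"
| table category operationName
```

mvzip pairs each key with its value, mvjoin flattens the pairs into a single _raw string, and extract then splits that string back into fields, all without expanding the event into multiple rows.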

I don't know too much on the Splunk backend, but I'm fairly certain I'm not using any expensive commands in this search (like mvexpand, join, transaction, map, etc), so you shouldn't have any problem applying it to big datasets. However, don't take my word on that and try it out for yourself.

In theory, the same process that's used for the "properties" field could be applied to any nested field in the data, although your search might be a bit large and proportionally slower if you do it a few times.

The search is fairly dynamic, but I wouldn't rely too heavily on it. If problems arise, let me know and I'm more than willing to help.

That all being said, I still believe @jacobevans's answer will point you more in the right direction.

Good luck!


Explorer

I will review all this as soon as I get some time and let you know how it goes.

I appreciate all the help here. Splunk truly has the best community of any Enterprise solution I have ever used. Thanks!
