Hi all, I need help with a regex to override the host field with the dvc_host field value (from the interesting fields) for firewall logs, depending on the sourcetype.
1) For the first sourcetype, paloalto:network:traffic, I am getting the event details below:
index=firewall sourcetype="paloalto:network:traffic"
Event details:
Feb 15 10:21:12 test01pano.xxxxxx.com 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,aged-out,17,0,0,0,Data_Center,east01fw,from-policy,,,0,,0,,N/A
Requirement:
We want to overwrite the "host" field for firewall logs so that it uses the value of the "dvc_host" field.
host=test01pano.xxxxxx.com should be replaced with the dvc_host value=deast01fw
I tried this query, but it throws an error:
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw (?<host>\b[^(\.),]+\b(,)+\b(?=from-policy)\b) | table host dvc_host
Error in 'SearchParser': Missing a search command before '^'. Error at position '89' of search query 'search index=firewall sourcetype="paloalto:networ...{snipped} {errorcontext = ?<host>\b[^(\.),]+\b(}'.
But the regex works fine when tested on regex101.com, so I am not sure where the problem is.
2) For the second sourcetype paloalto:network:sys, I am getting the below events:
Event Details:
Feb 15 10:45:14 test01pano.xxxxx.com 1,2018/02/15 10:45:14,000702503748,SYSTEM,general,0,2018/02/15 10:45:14,,general,,0,0,general,informational,"Connection to Update server: updates.paloaltonetworks.com completed successfully, initiated by 10.x.x.x",139387,0x0,0,0,0,0,,east01pano
Requirement:
We want to overwrite the "host" field for these logs so that it uses the value of the "dvc_host" field.
host=test01pano.xxxxxx.com should be replaced with the dvc_host value=east01pano
Kindly guide me in creating a regex that overrides the host value with the value of the dvc_host field.
As commented in the discussion where you posted this earlier: you need to put quotes around your regex, that is why you get the error.
Your regex seems to work indeed. You could make it more efficient by making use of the event structure, which (I assume) has the host field always in the same place? So the following should work:
"(?:[^,]*,){52}(?<host>\w+)"
https://regex101.com/r/Q5LWGD/1
As for the second sourcetype:
If that value is always at the end of the event string as per your examples, then the following regex should work:
"(?<host>\w+)$"
or a bit more efficient by telling that the value we're looking for comes after a comma:
",(?<host>\w+)$"
https://regex101.com/r/uUoOtj/2
Note: these examples are strictly based on your sample events with the hostname showing as a single word without special characters. You might have to tweak them a bit if you also need to deal with situations where the events contain hostnames with - and . characters in them.
In general you could have a look at how the TA for palo alto handles these events, perhaps that includes a regex that you could re-use reliably, rather than re-inventing the wheel 🙂
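For anyone who wants to sanity-check the position-based pattern outside Splunk, here is a minimal Python sketch run against the traffic sample event from the question. Note that Python's re module spells named groups as (?P<host>...) rather than (?<host>...), and the names TRAFFIC_EVENT, POSITION_RE and extract_by_position are made up for this illustration. It only checks the regex logic; it says nothing about how Splunk applies it.

```python
import re

# Traffic sample event from the question (hostnames are masked in the original).
TRAFFIC_EVENT = (
    "Feb 15 10:21:12 test01pano.xxxxxx.com 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,"
    "2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,"
    "Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,"
    "667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,"
    "6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,"
    "aged-out,17,0,0,0,Data_Center,east01fw,from-policy,,,0,,0,,N/A"
)

# Skip 52 comma-terminated fields, then capture the next word as the host.
# Python syntax for the named group: (?P<host>...) instead of (?<host>...).
POSITION_RE = re.compile(r"(?:[^,]*,){52}(?P<host>\w+)")


def extract_by_position(raw):
    """Return the word that follows the first 52 comma-separated fields, or None."""
    match = POSITION_RE.search(raw)
    return match.group("host") if match else None


if __name__ == "__main__":
    print(extract_by_position(TRAFFIC_EVENT))
```

Splitting the same string on commas and reading index 52 is a quick way to double-check that the count of 52 is right for your own events.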
Hi Frank, can I get the full regex for the second sourcetype, which you mentioned in the comment above?
",(\w+)$"
I am also facing another issue with my regex below: the host field shows the dvc_host field value along with the actual host value, test01pano.xxxxxx.com.
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw " (?<host>\b[^(\.),]+\b(,)+\b(?=from-policy)\b)" | table host dvc_host
Example :
Feb 15 10:21:12 test01pano.xxxxxx.com 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,aged-out,17,0,0,0,Data_Center,deast01fw,from-policy,,,0,,0,,N/A
host field = test01pano.xxxxxx.com
Kindly guide me on how to stop this host value from being added along with the dvc_host value in the selected fields.
",(\w+)$" is the full regex. It simply matches a comma, followed by a word (\w)+, at the end of the string ($). You just need to add host as the name of the capture group, as per my updated post above.
Regarding your issue with the host field still containing the incorrect value: your rex command doesn't name the capture group anymore, I guess that is why it doesn't overwrite the host field? So your search should be:
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw " (?<host>\b[^(\.),]+\b(,)+\b(?=from-policy)\b)" | table host dvc_host
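To illustrate the ",(\w+)$" explanation above, here is a small Python sketch using the system sample event from the question. The second pattern, with the wider character class [\w.-], is only an assumption about what a tweak for hostnames containing - and . could look like; it is not taken from the thread or from the Palo Alto TA. The names SYSTEM_EVENT, LAST_WORD_RE, LAST_HOSTNAME_RE and last_field are hypothetical.

```python
import re

# System sample event from the question (hostnames are masked in the original).
SYSTEM_EVENT = (
    'Feb 15 10:45:14 test01pano.xxxxx.com 1,2018/02/15 10:45:14,000702503748,'
    'SYSTEM,general,0,2018/02/15 10:45:14,,general,,0,0,general,informational,'
    '"Connection to Update server: updates.paloaltonetworks.com completed '
    'successfully, initiated by 10.x.x.x",139387,0x0,0,0,0,0,,east01pano'
)

# A comma, then a word, then end of string (Python named-group syntax).
LAST_WORD_RE = re.compile(r",(?P<host>\w+)$")

# Assumed variant: also allow "." and "-" inside the hostname.
LAST_HOSTNAME_RE = re.compile(r",(?P<host>[\w.-]+)$")


def last_field(raw, pattern=LAST_WORD_RE):
    """Return the trailing field captured by the pattern, or None if it does not match."""
    match = pattern.search(raw)
    return match.group("host") if match else None


if __name__ == "__main__":
    print(last_field(SYSTEM_EVENT))
    # A made-up line whose last field contains "-" and ".":
    print(last_field("a,b,east-01.pano"))
    print(last_field("a,b,east-01.pano", LAST_HOSTNAME_RE))
```

With the plain \w+ version, a trailing hostname containing - or . does not match at all, because the comma is no longer directly followed by a single word that runs to the end of the line.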
Hi Frank, yes, I got the second regex and it worked, but it is also taking the host value along with the dvc_host field value. I mean it is overriding the host value with the dvc_host field value, but the actual host field value also appears.
sourcetype="paloalto:network:sys"
index=firewall sourcetype="paloalto:network:sys" | rex field=_raw "(?<host>(\w+)$)" | table host dvc_host
Event Details:
Feb 15 10:45:14 test01pano.xxxxx.com 1,2018/02/15 10:45:14,000702503748,SYSTEM,general,0,2018/02/15 10:45:14,,general,,0,0,general,informational,"Connection to Update server: updates.paloaltonetworks.com completed successfully, initiated by 10.x.x.x",139387,0x0,0,0,0,0,,deast01pano
sourcetype="paloalto:network:traffic"
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,from-policy)\b)" | table host dvc_host
Event Details:
Feb 15 10:21:12 **test01pano.xxxxxx.com** 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,aged-out,17,0,0,0,Data_Center,deast01fw,from-policy,,,0,,0,,N/A
Both of these queries override the host field value with the dvc_host field value from the interesting fields, but the actual host field value, test01pano.xxxxxx.com, also appears.
Kindly guide me on how to stop this host value from being added along with the dvc_host value in the selected fields.
Oh, you mean after this the host field is multivalued, because the rex command adds a value, rather than overwriting? Strange, I haven't seen that behaviour before. Can you perhaps share a screenshot of what that looks like?
Guess you should be able to use some of the commands to manipulate multi valued fields to resolve that (sorry, don't have time right now to dig into that and provide an example).
In the end when you put this in props and transforms (assuming that's your end goal) that shouldn't be an issue anyway, right?
Hi Frank, how do I attach a screenshot in the forum? I have not done that before. I will try to run the same query in my test environment and attach the screenshot.
Hi Frank, no, it is overwriting the host field value with the dvc_host field, but it is also showing the actual host value "test01pano.xxxxx.com" from which the firewall logs are sourced.
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,from-policy)\b)" | table host dvc_host
Event Details:
Feb 15 17:32:27 test01pano.xxxxx.com 1,2018/02/15 17:32:27,012501001041,TRAFFIC,deny,1,2018/02/15 17:32:17,10.232.56.20,141.146.44.51,168.133.80.30,141.146.44.51,interzone-default,,,ssl,vsys6,Internet-Core,Internet-Untrusted,ae1.238,ae1.900,pan_log_forward,2018/02/15 17:32:17,873850,1,37661,443,52754,443,0x400000,tcp,reset-both,xxxx.xxx.xxxx,8,2018/02/15 17:32:18,0,not-resolved,0,6477528057946076786,0x8000000000000000,10.x.x.x-10.x.x.x,United States,0,x,x,policy-deny,37,0,0,0,Internet,deast01fw,from-application,,,0,,0,,N/A
But when the same query is executed with the dedup command, it works fine.
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,from-policy)\b)" | dedup dvc_host | table host dvc_host
Kindly guide me on this.
That's because this sample event you are showing, has "from-application" after the dvc-host field, so your regex doesn't match and host field will not get overridden for this particular event.
So you'll need to tweak your regex. You could try my suggestion from few comments back, which uses the location of the field in the string of comma separated fields, rather than relying on some string that comes after it like you do, while that field that comes after it can contain different values or may even be empty in some cases.
"(?:[^,]*,){52}(?<host>\w+)"
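The point about the lookahead can be reproduced with a short Python sketch. It uses only the tail of the two traffic samples (abbreviated here for readability) and the from-policy lookahead pattern from the search above, rewritten with Python's (?P<host>...) group syntax. The names POLICY_TAIL, APPLICATION_TAIL and host_from_lookahead are hypothetical.

```python
import re

# Tails of the two traffic samples from the thread (abbreviated).
POLICY_TAIL = "aged-out,17,0,0,0,Data_Center,deast01fw,from-policy,,,0,,0,,N/A"
APPLICATION_TAIL = "policy-deny,37,0,0,0,Internet,deast01fw,from-application,,,0,,0,,N/A"

# The pattern from the search, which requires ",from-policy" right after the host.
POLICY_ONLY_RE = re.compile(r"(?P<host>\b[^(\.),]+\b(?=,from-policy)\b)")


def host_from_lookahead(raw):
    """Return the captured host, or None when the lookahead does not match."""
    match = POLICY_ONLY_RE.search(raw)
    return match.group("host") if match else None


if __name__ == "__main__":
    print(host_from_lookahead(POLICY_TAIL))       # lookahead satisfied
    print(host_from_lookahead(APPLICATION_TAIL))  # no ",from-policy" after the host
```

For the second event the function returns None, which corresponds to rex extracting nothing and the original host value staying in place.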
Hi Frank, thanks a lot. I have discovered what was actually causing the issue. Yes, you are right: for this particular event the host name came before ",from-application", while for other events it came before ",from-policy". Since all these events come from the same host, I have written the regex to capture the host name from the firewall logs and override the host field value in the selected fields.
Query :
index=firewall sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)" | table host dvc_host
I found it to be working fine in search.
Now I am going to place this regex in props.conf and transforms.conf as shown below, but I am not sure whether the stanzas are written correctly.
Props.conf
[paloalto:network:traffic]
TRANSFORMS-host_override = host_override
Transforms.conf
[host_override]
REGEX = (?<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)
DEST_KEY = MetaData:Host
FORMAT = host::$1
You want to give that transforms.conf stanza a unique name, different from the one you used for your previous host override question: https://answers.splunk.com/answers/615561/how-to-overwrite-the-host-field-value-with-dvc-fie.html
Also: you don't need to name the capture group, so leave out the following bit:
?<host>
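As a rough illustration of why the group name is not needed: FORMAT = host::$1 refers to the first capture group by number, and a named group is still group number 1. The Python sketch below only imitates that substitution with a plain string replace; it is not how Splunk implements transforms, and apply_host_override, NAMED, UNNAMED and EVENT_TAIL are names invented for this example.

```python
import re

# Same pattern with and without a group name (Python uses (?P<host>...) for names).
NAMED = r"(?P<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)"
UNNAMED = r"(\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)"

# Abbreviated tail of a traffic sample from the thread.
EVENT_TAIL = "0,0,0,Internet,deast01fw,from-application,,,0,,0,,N/A"


def apply_host_override(raw, regex, fmt="host::$1"):
    """Imitate REGEX + FORMAT: put capture group 1 where $1 appears in fmt.

    Returns None when the regex does not match, meaning no override would happen.
    """
    match = re.search(regex, raw)
    if match is None:
        return None
    return fmt.replace("$1", match.group(1))


if __name__ == "__main__":
    print(apply_host_override(EVENT_TAIL, NAMED))
    print(apply_host_override(EVENT_TAIL, UNNAMED))
```

Both calls produce the same string, so dropping ?<host> from the transforms.conf REGEX does not change what $1 contains.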
Hi Frank, I need a little help with a regex. I am able to match the host name, but with the regex below I am unable to overwrite the host field in the selected fields in Splunk. Could you please guide me in correcting the regex?
Regex:
index=firewall sourcetype="network:log" |rex field=_raw (?)(?<=Client_VPN,)\b[(\w)]+\b | table host host_name
Event Details:
Feb 16 23:54:02 test01.xxxx.com 1,2018/02/16 23:54:02,012501001035,6477528014920876411,0x8000000000000000,USERID,logout,473,2018/02/16 23:53:46,36,0,0,0,Client_VPN,node01fw,3,vsys3,10.X.X.X,ddesa0002,,0,1,0,0,0,vpn-client,globalprotect,0,0,,2018/02/16 23:53:47,1
Actual Requirement:
I need to overwrite the host field value with the host_name field value from the interesting fields.
host=test01.xxxx.com
host_name=node02fw
Kindly guide me on the regex to overwrite the host value with the host_name value.
Edit: moved my comment to your new question. Appreciated if you could confirm whether this question was now successfully answered 🙂
Hi Frank, yes, you are right: not all events have "Client_VPN" before the host name, so for those events the host field in the selected fields still shows the actual host name "test01.xxx.com".
I then tried to locate the position of the host_name field in the events, but unfortunately I was unable to narrow down the exact position. So instead of matching Client_VPN, I matched on the value ",3,vsys", which follows the host name in most events.
I used the regex below:
| rex field=_raw "(?\b[^(\.),]+\b(?=,\d,vsys*)\b)"
Feb 16 23:53:54 test01.xxxx.com 1,2018/02/16 23:53:54,012501001041,6477528057870932374,0x8000000000000000,USERID,logout,369,2018/02/16 23:53:47,36,0,0,0,Client_VPN,node01fw,3,vsys3,10.X.X.X,ddesa0002,,0,1,0,0,0,vpn-client,globalprotect,0,0,,2018/02/16 23:53:47,1
Because of this, the regex was unable to overwrite the host value with the host_name field value.
Kindly guide me on a regex to overwrite the host value with host_name.
Please move this info to the separate question you posted for this new data type. Also: you mention you cannot narrow down the position. Why not? Seems to be in position 14 in each of your sample events.
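One way to find the position is to split a sample event on commas and look up the index of the value you expect. The Python sketch below does that for the USERID sample from the question and then builds the matching skip-N-fields pattern. The leading ^ is an addition of mine, not part of the patterns discussed above: without it, a regex search that fails at the start of the event is free to retry from a later offset and count fields from there. The function names are hypothetical, and the plain split assumes no field contains a quoted comma.

```python
import re

# USERID sample event from the question (hostnames are masked in the original).
USERID_EVENT = (
    "Feb 16 23:54:02 test01.xxxx.com 1,2018/02/16 23:54:02,012501001035,"
    "6477528014920876411,0x8000000000000000,USERID,logout,473,"
    "2018/02/16 23:53:46,36,0,0,0,Client_VPN,node01fw,3,vsys3,10.X.X.X,"
    "ddesa0002,,0,1,0,0,0,vpn-client,globalprotect,0,0,,2018/02/16 23:53:47,1"
)


def field_position(raw, value):
    """Return the zero-based index of value among the comma-separated fields."""
    return raw.split(",").index(value)


def extract_field(raw, position):
    """Skip `position` comma-terminated fields from the start, then capture a word."""
    # Anchored with ^ (an assumption of this sketch) so counting starts at offset 0.
    pattern = rf"^(?:[^,]*,){{{position}}}(?P<host>\w+)"
    match = re.search(pattern, raw)
    return match.group("host") if match else None


if __name__ == "__main__":
    position = field_position(USERID_EVENT, "node01fw")
    print(position)
    print(extract_field(USERID_EVENT, position))
```

Running the lookup on a few events of each log type should show quickly whether the position is stable enough to rely on.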
Hi Frank, the same regexes that were used to overwrite the host values with the interesting field values all worked fine in search. But when they were added to props.conf and transforms.conf, they did not work. We have props.conf and transforms.conf on the heavy forwarder instances.
Props.conf details:
[paloalto:network:traffic]
TRANSFORMS-host_override = host_override
[paloalto:network:tsys]
TRANSFORMS-host_override = host_override1
[paloalto:network:log]
TRANSFORMS-host_override = host_override2
Transforms.conf Details:
[host_override]
REGEX =(?<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)
DEST_KEY = MetaData:Host
FORMAT = host::$1
[host_override1]
REGEX =(?<host>(\w+)$)
DEST_KEY = MetaData:Host
FORMAT = host::$1
[host_override2]
REGEX =(?:[^,]*,){14}(?<host>\w+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
Kindly guide me on this.
Not sure about that first one, but I would suggest using the simple approach for that one, which is based on the position of the hostname field (similar to the 3rd option you have).
Second one: why the nested capture group? Regex can simply be:
,(\w+)$
Third case:
Not sure why that wouldn't work.
Hi Frank, do I need to update props.conf and transforms.conf on the search head instances as well? I can see the same configuration on the search head cluster members.
Well, it is always a good idea to keep .conf files consistent throughout your environment. But such index-time configurations do not need to be present on search heads to be effective.
Hi Frank, coming back to this question: I tested the props.conf / transforms.conf below and it worked well in my test environment. Test and prod have the same set of props.conf / transforms.conf files; the only difference is that test is a single machine, whereas prod is a distributed deployment.
Questions:
1) Why is the same props.conf / transforms.conf not working in prod? In prod, TA-Paloalto is configured on the HF instances and the search head cluster members.
2) Do I need to copy the stanzas below to both the heavy forwarders and the search head cluster members?
Props.conf:
[paloalto:network:traffic]
TRANSFORMS-host_override = host_override
[paloalto:network:system]
TRANSFORMS-host_override = host_override1
Transforms.conf:
[host_override]
REGEX = (?:[^,]*,){52}(?<host>\w+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
[host_override1]
REGEX = (?<host>(\w+)$)
DEST_KEY = MetaData:Host
FORMAT = host::$1
Kindly guide me on this.
This kind of index time configuration should be on the instance that does the parsing, so the first heavy instance (HF or Indexer, depending on your architecture).
As mentioned in your other related question: you mention you use the palo alto TA, are you sure the sourcetypes you use in your props/transforms actually match the sourcetypes used in prod? Because I think the TA uses a different set of sourcetypes?