Splunk Search

How can I use a regex to override the host value with the value of another field from the interesting fields?

Hemnaath
Motivator

Hi all, I need help with a regex to override the host field with the dvc_host value from the interesting fields for firewall logs, based on the sourcetype.

1) For the first sourcetype, paloalto:network:traffic, I am getting the event details below:

index=firewall sourcetype="paloalto:network:traffic" 

Event details:

Feb 15 10:21:12 test01pano.xxxxxx.com 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,aged-out,17,0,0,0,Data_Center,east01fw,from-policy,,,0,,0,,N/A

Requirement:
We want to overwrite the "host" field for firewall logs with the value of the "dvc_host" field.

host=test01pano.xxxxxx.com should be replaced with the dvc_host value=deast01fw

I tried this query, but it throws an error:

index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw (?<host>\b[^(\.),]+\b(,)+\b(?=from-policy)\b) | table host dvc_host

Error in 'SearchParser': Missing a search command before '^'. Error at position '89' of search query 'search index=firewall sourcetype="paloalto:networ...{snipped} {errorcontext = ?<host>\b[^(\.),]+\b(}'.

The regex works fine when tested on regex101.com, so I am not sure where the problem is.

2) For the second sourcetype, paloalto:network:sys, I am getting the events below:

Event Details:

Feb 15 10:45:14 test01pano.xxxxx.com 1,2018/02/15 10:45:14,000702503748,SYSTEM,general,0,2018/02/15 10:45:14,,general,,0,0,general,informational,"Connection to Update server: updates.paloaltonetworks.com completed successfully, initiated by 10.x.x.x",139387,0x0,0,0,0,0,,east01pano

Requirement:
We want to overwrite the "host" field for these logs with the value of the "dvc_host" field.

host=test01pano.xxxxxx.com should be replaced with the dvc_host value=east01pano

Kindly guide me in creating a regex that can override the host value with the value of the dvc_host field.

1 Solution

FrankVl
Ultra Champion

As commented in the discussion where you posted this earlier: you need to put quotes around your regex, because the rex command expects the regular expression as a quoted string. That is why you get the error.
Your regex does indeed seem to work. You could make it more efficient by using the event structure, which (I assume) always has the host field in the same position. The following should work:

"(?:[^,]*,){52}(?<host>\w+)"

https://regex101.com/r/Q5LWGD/1
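One way to sanity-check the field count is to run the positional pattern against the sample traffic event outside Splunk. The following is a minimal sketch in Python, not part of the original answer. The variable names are illustrative, and the named group is spelled (?P<host>...) because that is Python's syntax for the (?<host>...) form that rex accepts.

```python
import re

# Sample traffic event copied from the question above.
traffic_event = (
    "Feb 15 10:21:12 test01pano.xxxxxx.com 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,"
    "2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,"
    "Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,"
    "667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,"
    "6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,"
    "aged-out,17,0,0,0,Data_Center,east01fw,from-policy,,,0,,0,,N/A"
)

# Skip 52 comma-terminated fields, then capture the next run of word characters.
pattern = re.compile(r"(?:[^,]*,){52}(?P<host>\w+)")

match = pattern.search(traffic_event)
host = match.group("host") if match else None
print(host)
```

With the sample event as posted, 52 comma-terminated fields precede the device name, so the pattern captures east01fw. If an earlier field ever contains a comma of its own, the count shifts and the pattern would need adjusting.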

As for the second sourcetype:
If that value is always at the end of the event string as per your examples, then the following regex should work:

"(?<host>\w+)$"

or, slightly more efficient, by specifying that the value we're looking for comes after a comma:

",(?<host>\w+)$"

https://regex101.com/r/uUoOtj/2

Note: these examples are strictly based on your sample events with the hostname showing as a single word without special characters. You might have to tweak them a bit if you also need to deal with situations where the events contain hostnames with - and . characters in them.
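To illustrate the caveat about - and . characters, here is a small Python sketch (an editorial illustration, not from the original answer). The hostname east-01.pano is hypothetical, the character class [\w.-] is just one possible tweak, and (?P<host>...) is Python's spelling of the named group.

```python
import re

# Sample system event copied from the question above.
sys_event = (
    'Feb 15 10:45:14 test01pano.xxxxx.com 1,2018/02/15 10:45:14,000702503748,SYSTEM,general,0,'
    '2018/02/15 10:45:14,,general,,0,0,general,informational,'
    '"Connection to Update server: updates.paloaltonetworks.com completed successfully, '
    'initiated by 10.x.x.x",139387,0x0,0,0,0,0,,east01pano'
)

# Hypothetical variant of the same event whose device name contains - and .
punctuated_event = sys_event.replace("east01pano", "east-01.pano")

bare_word = re.compile(r"(?P<host>\w+)$")          # first suggestion
after_comma = re.compile(r",(?P<host>\w+)$")       # second suggestion
with_punct = re.compile(r",(?P<host>[\w.-]+)$")    # assumed tweak for - and .


def last_field(pattern, raw):
    """Return the captured host, or None when the pattern does not match."""
    m = pattern.search(raw)
    return m.group("host") if m else None


for name, pattern in [("bare_word", bare_word),
                      ("after_comma", after_comma),
                      ("with_punct", with_punct)]:
    print(name, last_field(pattern, sys_event), last_field(pattern, punctuated_event))
```

On the hypothetical punctuated name, the bare \w+ pattern silently captures only the trailing fragment, the comma-anchored pattern does not match at all, and the widened character class captures the whole name. That difference is one reason to prefer the comma-anchored form: a failed match leaves the host untouched instead of writing a truncated value.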

In general, you could have a look at how the TA for Palo Alto handles these events. Perhaps it includes a regex that you could reuse reliably, rather than reinventing the wheel 🙂


Hemnaath
Motivator

Hi Frank, can I get the full regex for the second sourcetype that you mentioned in the comment above?

",(\w+)$"

I am also facing another issue with the regex below: the host field shows the dvc_host value along with the actual host value, test01pano.xxxxxx.com.

 index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw " (?<host>\b[^(\.),]+\b(,)+\b(?=from-policy)\b)" | table host dvc_host

Example :

Feb 15 10:21:12 test01pano.xxxxxx.com 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,aged-out,17,0,0,0,Data_Center,deast01fw,from-policy,,,0,,0,,N/A

host field = test01pano.xxxxxx.com

Kindly guide me on how to stop this host value from appearing alongside the dvc_host value in the selected fields.


FrankVl
Ultra Champion

",(\w+)$" is the full regex. It simply matches a comma, followed by a word (\w+), at the end of the string ($). You just need to add host as the name of the capture group, as per my updated post above.

Regarding your issue with the host field still containing the incorrect value: your rex command doesn't name the capture group anymore, I guess that is why it doesn't overwrite the host field? So your search should be:

  index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw " (?<host>\b[^(\.),]+\b(,)+\b(?=from-policy)\b)" | table host dvc_host

Hemnaath
Motivator

Hi Frank, yes, I got the second regex and it worked, but the host field still shows the actual host value along with the dvc_host value. I mean it overrides the host value with the dvc_host value, but the actual host value also appears.

sourcetype="paloalto:network:sys"

index=firewall sourcetype="paloalto:network:sys" | rex field=_raw "(?<host>(\w+)$)" | table host dvc_host 

Event Details:
Feb 15 10:45:14 test01pano.xxxxx.com 1,2018/02/15 10:45:14,000702503748,SYSTEM,general,0,2018/02/15 10:45:14,,general,,0,0,general,informational,"Connection to Update server: updates.paloaltonetworks.com completed successfully, initiated by 10.x.x.x",139387,0x0,0,0,0,0,,deast01pano

sourcetype="paloalto:network:traffic"

 index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,from-policy)\b)" | table host dvc_host

Event Details:

 Feb 15 10:21:12 **test01pano.xxxxxx.com** 1,2018/02/15 10:21:12,012501001041,TRAFFIC,end,1,2018/02/15 10:21:11,10.x.x.x,10.x.x.x,0.0.0.0,0.0.0.0,Foundation Services,,,dns,vsys2,Data-Center-Admin,Data-Center-Core,ae5.2005,ae5.250,pan_log_forward,2018/02/15 10:21:11,667736,1,50811,53,0,0,0x19,udp,allow,xxx,xx,1xx,2,2018/02/15 10:20:42,0,any,0,6477528057945424450,0x8000000000000000,10.0.0.0-10.x.x.x,10.0.0.0-10.x.x.x,0,1,1,aged-out,17,0,0,0,Data_Center,deast01fw,from-policy,,,0,,0,,N/A

Both of these queries override the host field value with the dvc_host value from the interesting fields, but the actual host value, test01pano.xxxxxx.com, also appears.

Kindly guide me on how to stop this host value from appearing alongside the dvc_host value in the selected fields.


FrankVl
Ultra Champion

Oh, you mean after this the host field is multivalued, because the rex command adds a value, rather than overwriting? Strange, I haven't seen that behaviour before. Can you perhaps share a screenshot of what that looks like?

I guess you should be able to use some of the commands for manipulating multivalued fields to resolve that (sorry, I don't have time right now to dig into that and provide an example).

In the end, when you put this in props and transforms (assuming that's your end goal), that shouldn't be an issue anyway, right?


Hemnaath
Motivator

Hi Frank, how do I attach a screenshot in the forum? I have not done that before. I will try to run the same query in my test environment and attach the screenshot.


Hemnaath
Motivator

Hi Frank, no, it is overwriting the host field value with the dvc_host value, but the host field also shows the actual host value "test01pano.xxxxx.com", which is where the firewall logs are sourced from.

index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,from-policy)\b)" | table host dvc_host

Event Details:

      Feb 15 17:32:27 test01pano.xxxxx.com 1,2018/02/15 17:32:27,012501001041,TRAFFIC,deny,1,2018/02/15 17:32:17,10.232.56.20,141.146.44.51,168.133.80.30,141.146.44.51,interzone-default,,,ssl,vsys6,Internet-Core,Internet-Untrusted,ae1.238,ae1.900,pan_log_forward,2018/02/15 17:32:17,873850,1,37661,443,52754,443,0x400000,tcp,reset-both,xxxx.xxx.xxxx,8,2018/02/15 17:32:18,0,not-resolved,0,6477528057946076786,0x8000000000000000,10.x.x.x-10.x.x.x,United States,0,x,x,policy-deny,37,0,0,0,Internet,deast01fw,from-application,,,0,,0,,N/A

But when the same query is executed with the dedup command, it works fine.

 index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,from-policy)\b)" | dedup dvc_host | table host dvc_host 

Kindly guide me on this.


FrankVl
Ultra Champion

That's because the sample event you are showing has "from-application" after the dvc_host field, so your regex doesn't match and the host field does not get overridden for this particular event.
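The mismatch can be reproduced outside Splunk. Below is a minimal Python sketch (an editorial illustration under stated assumptions): make_event builds a hypothetical, shortened stand-in for a traffic event with 52 filler fields before the device name, and the patterns use Python's (?P<host>...) spelling of the named group. It compares the lookahead-based regex from the question with the position-based pattern suggested earlier in the thread.

```python
import re


def make_event(dvc_host, next_field):
    """Build a hypothetical event: 52 filler fields, the device name, then the following field."""
    fields = ["x"] * 52 + [dvc_host, next_field, "", "", "0"]
    return ",".join(fields)


# Regex from the question: relies on ",from-policy" following the device name.
lookahead = re.compile(r"(?P<host>\b[^(\.),]+\b(?=,from-policy)\b)")
# Position-based pattern suggested earlier in the thread.
positional = re.compile(r"(?:[^,]*,){52}(?P<host>\w+)")


def extract(pattern, raw):
    """Return the captured host, or None when the pattern does not match."""
    m = pattern.search(raw)
    return m.group("host") if m else None


policy_event = make_event("deast01fw", "from-policy")
application_event = make_event("deast01fw", "from-application")

for label, event in [("from-policy", policy_event), ("from-application", application_event)]:
    print(label, extract(lookahead, event), extract(positional, event))
```

The lookahead pattern returns nothing for the from-application event, so the original host value stays in place for that event, while the positional pattern does not depend on what follows the device name.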

So you'll need to tweak your regex. You could try my suggestion from a few comments back, which uses the position of the field in the string of comma-separated fields. Your approach relies on the string that comes after the field, and that next field can contain different values or may even be empty in some cases.

"(?:[^,]*,){52}(?<host>\w+)"

Hemnaath
Motivator

Hi Frank, thanks a lot. I have discovered what was actually causing the issue. Yes, you are right: for this particular event the host name comes before ",from-application", while for other events it comes before ",from-policy". Since all these events come from the same host, I have written the regex to capture the host name from the firewall logs in both cases and use it to override the host field value in the selected fields.

Query :

 index=firewall  sourcetype="paloalto:network:traffic" | rex field=_raw "(?<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)" | table host dvc_host

And I found it to be working fine in search.

Now I am going to place this regex in props.conf and transforms.conf as below, but I am not sure whether the stanzas are written correctly.

[paloalto:network:traffic]
TRANSFORMS-host_override = host_override

Transforms.conf

[host_override]
 REGEX = (?<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)
 DEST_KEY = MetaData:Host
 FORMAT = host::$1
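To see what the REGEX and FORMAT lines of a stanza like this would produce, the substitution can be emulated in a few lines of Python. This is only an editorial sketch, not how Splunk implements transforms: emulate_transform is a hypothetical helper, the event tails are shortened stand-ins modelled on the samples in this thread, and the named group is rewritten in Python's (?P<host>...) spelling.

```python
import re

# REGEX from the stanza above, with the named group in Python's spelling.
REGEX = r"(?P<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)"
FORMAT = "host::$1"


def emulate_transform(raw):
    """Return the value FORMAT would yield, or None when REGEX does not match."""
    match = re.search(REGEX, raw)
    if match is None:
        # No match: nothing is rewritten, so the original host would be kept.
        return None
    # $1 refers to the first capture group, whether or not that group is named.
    return FORMAT.replace("$1", match.group(1))


# Hypothetical, shortened event tails.
policy_tail = "0,0,0,Data_Center,deast01fw,from-policy,,,0,,0,,N/A"
application_tail = "0,0,0,Internet,deast01fw,from-application,,,0,,0,,N/A"
other_tail = "0,0,0,Internet,deast01fw,from-other,,,0,,0,,N/A"

for tail in (policy_tail, application_tail, other_tail):
    print(emulate_transform(tail))
```

Because $1 is resolved by group position, the name on the capture group makes no difference to the result. The third tail shows the remaining weakness of this pattern: any value after the device name other than the two listed ones leaves the host unchanged.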

FrankVl
Ultra Champion

You want to give that transforms.conf stanza a unique name, different from the one you used for your previous host override question: https://answers.splunk.com/answers/615561/how-to-overwrite-the-host-field-value-with-dvc-fie.html

Also: you don't need to name the capture group, so leave out the following bit:

?<host>

Hemnaath
Motivator

Hi Frank, I need a little help with a regex. I am able to match the host name, but I am unable to overwrite the host field in the selected fields in Splunk using the regex below. Could you please guide me in correcting it?

Regex:
index=firewall sourcetype="network:log" |rex field=_raw (?)(?<=Client_VPN,)\b[(\w)]+\b | table host host_name

Event Details:
Feb 16 23:54:02 test01.xxxx.com 1,2018/02/16 23:54:02,012501001035,6477528014920876411,0x8000000000000000,USERID,logout,473,2018/02/16 23:53:46,36,0,0,0,Client_VPN,node01fw,3,vsys3,10.X.X.X,ddesa0002,,0,1,0,0,0,vpn-client,globalprotect,0,0,,2018/02/16 23:53:47,1

Actual Requirement:

I need to overwrite the host field value with the host_name field value from the interesting fields.
host=test01.xxxx.com
host_name=node02fw

Kindly guide me on the regex to overwrite the host value with the host_name value.


FrankVl
Ultra Champion

Edit: moved my comment to your new question. I'd appreciate it if you could confirm whether this question has now been answered successfully 🙂


Hemnaath
Motivator

Hi Frank, yes, you are right: not all events have "Client_VPN" before the host name, so the events without it keep the actual host name "test01.xxx.com" in the host field of the selected fields.

I then tried to locate the position of the host_name field in the events, but unfortunately I was unable to narrow down the exact position. So instead of matching Client_VPN, I matched on the value that most host names come before, ",3,vsys".

I used the regex below:

| rex field=_raw "(?\b[^(\.),]+\b(?=,\d,vsys*)\b)"

Feb 16 23:53:54 test01.xxxx.com 1,2018/02/16 23:53:54,012501001041,6477528057870932374,0x8000000000000000,USERID,logout,369,2018/02/16 23:53:47,36,0,0,0,Client_VPN,node01fw,3,vsys3,10.X.X.X,ddesa0002,,0,1,0,0,0,vpn-client,globalprotect,0,0,,2018/02/16 23:53:47,1

Because of this, the regex was unable to overwrite the host value with the host_name field value.

Kindly guide me on the regex to overwrite the host value with the host_name.


FrankVl
Ultra Champion

Please move this info to the separate question you posted for this new data type. Also: you mention you cannot narrow down the position. Why not? Seems to be in position 14 in each of your sample events.
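For anyone wanting to confirm the position, a short Python sketch can count the fields in the posted sample (an editorial illustration; variable names are made up here). It assumes a plain split on commas is good enough, which holds for this sample but would not for events with quoted fields that contain commas.

```python
import re

# Sample USERID event copied from the post above.
userid_event = (
    "Feb 16 23:54:02 test01.xxxx.com 1,2018/02/16 23:54:02,012501001035,"
    "6477528014920876411,0x8000000000000000,USERID,logout,473,2018/02/16 23:53:46,"
    "36,0,0,0,Client_VPN,node01fw,3,vsys3,10.X.X.X,ddesa0002,,0,1,0,0,0,"
    "vpn-client,globalprotect,0,0,,2018/02/16 23:53:47,1"
)

# Zero-based index of the device name equals the number of fields to skip.
fields = userid_event.split(",")
position = fields.index("node01fw")

# Build the positional pattern from that count; (?P<host>...) is Python's named-group spelling.
pattern = re.compile(r"(?:[^,]*,){%d}(?P<host>\w+)" % position)
match = pattern.search(userid_event)
host = match.group("host") if match else None

print(position, host)
```

For this sample the index is 14, which matches the position mentioned above, and the resulting pattern captures node01fw.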


Hemnaath
Motivator

Hi Frank, all the regexes used to overwrite the host value with the interesting field values worked fine in search. But when the same regexes were added to props.conf and transforms.conf, they did not work. We have props.conf and transforms.conf on the heavy forwarder instances.

Props.conf details:

 [paloalto:network:traffic]
 TRANSFORMS-host_override = host_override

 [paloalto:network:tsys]
 TRANSFORMS-host_override = host_override1

 [paloalto:network:log]
 TRANSFORMS-host_override = host_override2

Transforms.conf Details:

[host_override]
 REGEX =(?<host>\b[^(\.),]+\b(?=,(?=from-policy|from-application))\b)
 DEST_KEY = MetaData:Host
 FORMAT = host::$1

[host_override1]
 REGEX =(?<host>(\w+)$)
 DEST_KEY = MetaData:Host
 FORMAT = host::$1

[host_override2]
 REGEX =(?:[^,]*,){14}(?<host>\w+)
 DEST_KEY = MetaData:Host
 FORMAT = host::$1

Kindly guide me on this.


FrankVl
Ultra Champion

Not sure about that first one, but I would suggest using the simple approach for that one, based on the position of the hostname field (similar to the third option you have).

Second one: why the nested capture group? Regex can simply be:

,(\w+)$

Third case:
Not sure why that wouldn't work.


Hemnaath
Motivator

Hi Frank, do I need to update the props/transforms on the search head instances as well? I can see the same configuration on the search head cluster members.


FrankVl
Ultra Champion

Well, it is always a good idea to keep .conf files consistent throughout your environment. But such index-time configurations do not need to be present on search heads to be effective.


Hemnaath
Motivator

Hi Frank, coming back to this question: I tested the props.conf / transforms.conf below and it worked well in my test environment. Both test and prod have the same set of props/transforms.conf files; the only difference is that test is a single machine, whereas prod is a distributed environment.

Question:

1) Why is the same props/transforms.conf not working in prod? In prod, TA-Paloalto is configured on the HF instances and the search head cluster members.

2) Do I need to copy the stanzas below to both the heavy forwarders and the search head cluster members?

Props.conf:

[paloalto:network:traffic]
TRANSFORMS-host_override = host_override

[paloalto:network:system]
TRANSFORMS-host_override = host_override1

Transforms.conf:

  [host_override]
  REGEX = (?:[^,]*,){52}(?<host>\w+)
  DEST_KEY = MetaData:Host
  FORMAT = host::$1

  [host_override1]
  REGEX = (?<host>(\w+)$)
  DEST_KEY = MetaData:Host
  FORMAT = host::$1

Kindly guide me on this.


FrankVl
Ultra Champion

This kind of index-time configuration should be on the instance that does the parsing, so the first heavy instance the data passes through (HF or indexer, depending on your architecture).

As mentioned in your other related question: you say you use the Palo Alto TA. Are you sure the sourcetypes in your props/transforms actually match the sourcetypes used in prod? I think the TA uses a different set of sourcetypes.
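A mismatch between stanza names and sourcetypes is easy to check mechanically. The Python sketch below is an editorial illustration only: it uses the props.conf stanzas quoted earlier in this thread and the sourcetypes that appear in the thread's searches, and it assumes Python's configparser is an adequate reader for this small fragment (it is not a full Splunk .conf parser). Whether the differences are typos in the posts or exist in the real configuration is not known.

```python
import configparser

# props.conf stanzas as quoted earlier in this thread.
PROPS_CONF = """
[paloalto:network:traffic]
TRANSFORMS-host_override = host_override

[paloalto:network:tsys]
TRANSFORMS-host_override = host_override1

[paloalto:network:log]
TRANSFORMS-host_override = host_override2
"""

# Sourcetypes used in the searches quoted in this thread.
SEARCHED_SOURCETYPES = {
    "paloalto:network:traffic",
    "paloalto:network:sys",
    "network:log",
}

parser = configparser.ConfigParser()
parser.read_string(PROPS_CONF)

# Stanzas that no searched sourcetype matches would never fire for that data.
unmatched = sorted(set(parser.sections()) - SEARCHED_SOURCETYPES)
print(unmatched)
```

In practice the right-hand set would come from the sourcetypes actually present on the incoming data in prod rather than from the names typed into searches.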
