Splunk Search

Is $ supported in regex for field extraction?

lauMarot
Path Finder

I've got the following log line and would like to extract the last IP address field:

.................(variable number of fields)....."N/A","N/A","xxx.xxx.xxx.xxx"

I thought something like the following would work:

(?P<lastIP>\d+.\d+.\d+.\d+$)

gabriel_vasseur
Contributor

Apparently there is some white space at the end of the lines. So this should take care of it:

(?P<lastIP>\d+\.\d+\.\d+\.\d+)"\s*$
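
A minimal sketch of why the closing quote and trailing whitespace matter, using Python's `re` module as a stand-in for Splunk's PCRE engine (the constructs used here behave the same in both). The sample event is made up, and the two trailing spaces are an assumption about what the real data contains.

```python
import re

# Hypothetical event: the last field is quoted and followed by stray spaces.
event = '"foo","N/A","N/A","192.168.10.25"  '

# Original attempt: $ sits right after the digits, and the dots are unescaped.
original = re.compile(r'(?P<lastIP>\d+.\d+.\d+.\d+$)')

# Suggested fix: allow the closing quote and optional whitespace before $.
fixed = re.compile(r'(?P<lastIP>\d+\.\d+\.\d+\.\d+)"\s*$')

original_match = original.search(event)
fixed_match = fixed.search(event)

# The original finds nothing because the digits are followed by a quote,
# not by the end of the event.
print(original_match)
print(fixed_match.group("lastIP") if fixed_match else None)
```

The capture group stops before the quote, so the extracted field holds only the address.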

lauMarot
Path Finder

Of course... so sorry, so much noise for so little!
Many thanks.

0 Karma

rsennett_splunk
Splunk Employee

Everyone is on the right track, and any and all of these solutions should have been successful. So what we'll need is a solid sample of two events that show the varied fields, because there is something you are not noticing or not telling us... and all these eyes here should be able to see it if you let us. You can anonymize the data by changing a few key numbers. Do not turn it into garbage, or we can't run a 1:1 test on the data without editing it ourselves.

I used a sample from an httpd access_combined log on a public-facing server. It has two IP addresses:

158.111.236.56 - - [01/Aug/2016:11:03:07 -0700] "GET /atlas/NewDay/1/2/2/2/2/2/2/0/2.png?c=1470074467 HTTP/1.1" 200 222762 "http://splunkcraft.splunkoxygen.com/atlas/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"51.0.274.106

This captures the last IP only, and only when it is immediately followed by the end of the event. In a single-line event that is the end of the line; if the regex runs in multiline mode on a multiline event, $ also matches before the \n at the end of EACH line (which could possibly be your problem). It works on my sample data.

(?<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$

This will capture the first IP only

 ^(?<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

What I have done in the past, when unsure whether Splunk (or rather any regex engine) was treating something as single-line or multiline, is prefix the regex with the specific flag that tells the engine very deliberately how to treat line endings. So
(?s)(?<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$
might work... I'm honestly not sure whether it forces the match to the end of the event or just labels it properly, so no guarantees 🙂
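
One way to check what the inline flags do to `$` is a small experiment. The sketch below uses Python's `re` module as a stand-in for Splunk's PCRE engine; `(?s)` and `(?m)` have the same meaning in both, but the two-line event is invented and whether Splunk hands the whole event to the regex in your configuration is an assumption. Python needs the `(?P<IP>...)` spelling of the named group.

```python
import re

# Hypothetical two-line event; only the second line ends the event.
event = "first line ends with 10.0.0.1\nsecond line ends with 10.0.0.2"

ip = r"(?P<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$"

def find_ips(flag_prefix):
    """Return every IP the anchored pattern finds with the given inline flag."""
    return [m.group("IP") for m in re.finditer(flag_prefix + ip, event)]

no_flag = find_ips("")        # $ matches only at the very end of the event
dotall = find_ips("(?s)")     # (?s) changes what "." matches, not "$"
multiline = find_ips("(?m)")  # (?m) lets $ match at the end of every line

print(no_flag, dotall, multiline)
```

In this sketch `(?s)` leaves the result unchanged, while `(?m)` is the flag that makes `$` match at every line ending.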

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma

lauMarot
Path Finder

Thanks for helping, but it does not work with the attached log sample (too large for the text input field).

0 Karma

sjohnson_splunk
Splunk Employee

Try moving the $ outside of the parentheses.

,"(?P<lastIP>\d+.\d+.\d+.\d+)"$

0 Karma

lauMarot
Path Finder

Nice try... but it does not work 😞
Generally I use the wizard rather than typing my regex in directly, but this time the wizard generates the following error:

The extraction failed. If you are extracting multiple fields, try removing one or more fields. Start with extractions that are embedded within longer text strings.

0 Karma

gabriel_vasseur
Contributor

Forget the wizard and use rex directly:

YOUR SEARCH HERE | rex field=_raw ",\"(?P<lastIP>\d+\.\d+\.\d+\.\d+)\"$"

lauMarot: that should match the example data you gave us. If it doesn't, please give us more example data.

sjohnson: writing this, I realised you needed to escape the dots in there, otherwise technically your regex could match any long number...
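
A quick illustration of the point about escaping, again using Python's `re` module as a stand-in for the PCRE engine. The seven-digit input is a made-up value.

```python
import re

loose = re.compile(r'\d+.\d+.\d+.\d+')      # "." matches any character
strict = re.compile(r'\d+\.\d+\.\d+\.\d+')  # "\." matches a literal dot only

not_an_ip = "1234567"   # hypothetical value: a plain seven-digit number
real_ip = "10.1.2.3"

# The unescaped pattern accepts the plain number, because each "." is
# satisfied by a digit; the escaped pattern rejects it.
loose_hit = loose.fullmatch(not_an_ip)
strict_hit = strict.fullmatch(not_an_ip)

print(bool(loose_hit), bool(strict_hit))
print(bool(loose.fullmatch(real_ip)), bool(strict.fullmatch(real_ip)))
```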

0 Karma

lauMarot
Path Finder

I've attached a sample file with three events.

0 Karma

gabriel_vasseur
Contributor

I think that's the solution. Judging by the example lauMarot gave, the IP is followed by a double quote before the actual end of line.

0 Karma

skoelpin
SplunkTrust

The $ represents the end of a line in multiline mode, so it should work if that IP is at the end of the line.

But why use a dollar sign?

Try this

(?P<IP_Name>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

This says: look for 1-3 digits followed by a ., then 1-3 digits, then a ., then 1-3 digits, then a ., followed by 1-3 digits.

0 Karma

lauMarot
Path Finder

(?P<IP_Name>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) matches the first IP address found in my log line 😞
Adding $ (outside or inside the parentheses) breaks any match.

0 Karma

skoelpin
SplunkTrust

Can you provide us with a few more lines of sample data? Is there always an N/A value in front of the IP, or can it vary?

0 Karma

lauMarot
Path Finder

I've attached a sample file with three events.

0 Karma

gabriel_vasseur
Contributor

You forgot the 1, for the last two \d 🙂
I think the anchor might be needed if there are other IP addresses in the same event.

skoelpin
SplunkTrust

Whoops, thanks for pointing that out. Yes, true: if he has multiple unique IP addresses, then he could use a dollar sign or a lookbehind:

(?P<LastIP>(?<=N\/A\"\,\")\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

0 Karma

gabriel_vasseur
Contributor

Yes, but you can't expect the previous field to always have the N/A value, so I believe a $ would be more appropriate.
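
A small comparison of the two approaches, sketched with Python's `re` module as a stand-in for the PCRE engine. Both events are invented; the second one assumes the next-to-last field can hold something other than N/A, and the anchored pattern assumes the IP is followed by a closing quote and optional whitespace.

```python
import re

# Hypothetical events; the second has a real value in the next-to-last field.
events = [
    '"10.0.0.1","N/A","N/A","192.168.10.25"',
    '"10.0.0.1","N/A","some value","192.168.10.26"',
]

# Lookbehind: the IP must be preceded by N/A","
lookbehind = re.compile(
    r'(?P<LastIP>(?<=N/A",")\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
)

# Anchor: the IP must be followed by a quote, optional whitespace, and the end.
anchored = re.compile(r'(?P<LastIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"\s*$')

def extract(pattern, event):
    """Return the captured IP, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.group("LastIP") if match else None

lookbehind_results = [extract(lookbehind, e) for e in events]
anchored_results = [extract(anchored, e) for e in events]

print(lookbehind_results)
print(anchored_results)
```

The lookbehind misses the second event because the preceding field is not N/A, while the anchored pattern depends only on where the IP sits in the event.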

0 Karma