extract a specific IP with positive lookahead

avoelk
Communicator

I'm trying to extract multiple fields from my log. My problem is that each event contains multiple IP addresses: one for the source, one for the web server, and so on. To avoid extracting four IP addresses from every event into the same field, I want to use a positive lookahead to tell the regex "yes, this IP, but only if x and y come after it".

This is an example log:

2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http assets.razerzone.com 80 /eeimages/products/13785/razer-naga-2014-right-03.png - - - - 54.230.18.168 image/png http://imgur.com/gallery/u3o7l "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0 Safari/537.36" OBSERVED "Technology/Internet;Shopping" - 163.252.254.203 - 54351
2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http realtime.services.disqus.com 80 /api/2/thread/2221828111 ?bust=1780 - - - realtime.services.disqus.com application/json http://disqus.com/embed/comments/?base=default&disqus_version=6c05c0ca&f=bootsnipp&t_i=9WgD&t_u=http%3A%2F%2Fbootsnipp.com%2Fsnippets%2Ffeatured%2Fminimal-preview-thumbnails&t_d=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&t_t=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&s_o=default "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36" OBSERVED "Newsgroups/Forums" - 163.252.254.203 184.173.90.195 52401

Here, for example, I want to extract the source_ip (the first IP in the event) like this:

(\d+\.\d+\.\d+\.\d+)[[:blank:]](?=\d*)

Still, this gives me many other IPs, so I think I have to use multiple lookaheads? I really don't know how to use them in that case, especially when I want to extract the IP and give it a field name. In my head this means that I have to wrap EVERYTHING, including the lookaheads, in the capture group?

Thanks a lot for your help!


avoelk
Communicator

I found a solution that works:

First, to capture one specific IP out of many, I need a positive lookahead, written as (?=...).

The src_ip regex is therefore:

(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)

This makes sure that I capture only the IP that is followed by a blank, some digits, another blank, and then TCP_.
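To see the effect of the lookahead outside Splunk, here is a minimal Python sketch. It assumes a shortened copy of the first sample event (referer and user agent dropped) and uses `\d+` instead of `\d*`, which is a slightly stricter variant of the regex above, not the exact pattern from the post. The names `any_ip` and `src_ip` are only for illustration.

```python
import re

# Shortened copy of the first sample event (referer and user agent removed).
event = (
    "2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http "
    "assets.razerzone.com 80 /eeimages/products/13785/razer-naga-2014-right-03.png "
    "- - - - 54.230.18.168 image/png - 163.252.254.203 - 54351"
)

# Without a lookahead, every dotted quad in the event matches.
any_ip = re.compile(r"\d+\.\d+\.\d+\.\d+")

# With the lookahead, only the IP followed by " <digits> TCP_" matches.
# The lookahead checks the following text but does not consume it.
src_ip = re.compile(r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?= \d+ TCP_)")

all_ips = any_ip.findall(event)
match = src_ip.search(event)
source_ip = match.group("src_ip") if match else None

print(all_ips)    # three addresses
print(source_ip)  # only the first one
```

Only the named group wraps the IP itself; the lookahead sits outside the capture group, so nothing after the IP ends up in the field.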

To capture several groups, I just had to account for the blank between consecutive values with \s and then start the next capture group:

(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)\s(?P<bits>\d*)\s(?P<tcp_state>\w*_\w*)
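The combined pattern can be tried against both sample events with a short Python sketch. This assumes shortened copies of the two events and again swaps `\d*` and `\w*` for `\d+` and `\w+`, so it is a stricter variant rather than the exact regex above. The helper `extract_fields` is hypothetical and not part of Splunk.

```python
import re

# Stricter variant of the combined pattern: src_ip, then the two fields after it.
FIELDS = re.compile(
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?= \d+ TCP_)"  # IP, only if " <digits> TCP_" follows
    r"\s(?P<bits>\d+)"                              # numeric value right after the IP
    r"\s(?P<tcp_state>\w+_\w+)"                     # e.g. TCP_HIT, TCP_NC_MISS
)

def extract_fields(event):
    """Return the named groups as a dict, or an empty dict if nothing matches."""
    m = FIELDS.search(event)
    return m.groupdict() if m else {}

# Shortened copies of the two sample events from the question.
events = [
    "2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http "
    "assets.razerzone.com 80 - - - - 54.230.18.168 image/png - 163.252.254.203 - 54351",
    "2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http "
    "realtime.services.disqus.com 80 - - - realtime.services.disqus.com "
    "application/json - 163.252.254.203 184.173.90.195 52401",
]

results = [extract_fields(e) for e in events]
for r in results:
    print(r)
```

In Splunk itself, a pattern like this would typically go into a `rex` command or a field extraction. The Python version is only a quick way to check the regex against sample lines.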
