Extract a specific IP with a positive lookahead

avoelk
Communicator

I'm trying to extract multiple fields from my log. My problem is that each event contains multiple IP addresses: one for the source, one for the web server, and so on. To avoid extracting all four IP addresses of every event into the same field, I want to use a positive lookahead to tell the regex "yes, this IP, but only if it is followed by x and y".

This is an example log:

2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http assets.razerzone.com 80 /eeimages/products/13785/razer-naga-2014-right-03.png - - - - 54.230.18.168 image/png http://imgur.com/gallery/u3o7l "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0 Safari/537.36" OBSERVED "Technology/Internet;Shopping" - 163.252.254.203 - 54351
2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http realtime.services.disqus.com 80 /api/2/thread/2221828111 ?bust=1780 - - - realtime.services.disqus.com application/json http://disqus.com/embed/comments/?base=default&disqus_version=6c05c0ca&f=bootsnipp&t_i=9WgD&t_u=http%3A%2F%2Fbootsnipp.com%2Fsnippets%2Ffeatured%2Fminimal-preview-thumbnails&t_d=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&t_t=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&s_o=default "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36" OBSERVED "Newsgroups/Forums" - 163.252.254.203 184.173.90.195 52401

 

 

Here, for example, I want to extract the source_ip (the first IP in the event) like this:

(\d+\.\d+\.\d+\.\d+)[[:blank:]](?=\d*)

 

Still, this gives me many other IPs, so I think I have to use multiple lookaheads? But I really don't know how to use them in that case, especially when I want to extract the IP and give it a field name. In my head, this means I have to wrap EVERYTHING, including the lookaheads, in the capture group?

Thanks a lot for your help!

1 Solution

avoelk
Communicator

I found a solution that works:

First, to capture one specific IP out of many, I need a positive lookahead of the form (?=...).

The src_ip regex is therefore:

(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?=\ \d+ TCP_)

This makes sure I capture only the IP that is followed by a blank, some digits, another blank, and then TCP_. The lookahead only checks that this text is there; it does not consume it.
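
To see the effect of the lookahead outside Splunk, here is a minimal sketch in Python. It is an illustration only: Python's `re` module is not the PCRE engine Splunk uses, although the named-group and lookahead syntax shown here is accepted by both. The event is a shortened copy of the first sample event (the middle fields are trimmed for readability, which is an assumption of this sketch), `[ \t]` stands in for `[[:blank:]]` because Python's `re` has no POSIX classes, and each octet uses `\d+` so that an empty octet cannot match.

```python
import re

# Shortened copy of the first sample event (middle fields trimmed for readability).
event = (
    "2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http "
    "assets.razerzone.com 80 - 54.230.18.168 image/png - 163.252.254.203 - 54351"
)

# Attempt from the question: (?=\d*) can match zero digits, so it never rejects anything.
loose = re.compile(r"(\d+\.\d+\.\d+\.\d+)[ \t](?=\d*)")

# Lookahead that requires " <digits> TCP_" right after the IP.
strict = re.compile(r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?= \d+ TCP_)")

loose_hits = loose.findall(event)
strict_hits = strict.findall(event)

print(loose_hits)   # every IP that is followed by a blank
print(strict_hits)  # only the IP that is followed by " 304 TCP_"
```

The first pattern returns all three IPs in the shortened event, while the second returns only 10.5.6.121, which suggests the original attempt failed because its lookahead could always succeed on zero digits.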

To capture several groups, I just had to account for the blanks between the values with \s and then start the next capture group:

(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?=\ \d+ TCP_)\s(?P<bits>\d+)\s(?P<tcp_state>\w*_\w*)
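
The combined pattern can be tried the same way. The following is a hedged sketch in Python rather than Splunk itself: the two events are shortened copies of the sample events, the helper name `extract` is hypothetical, and `\d+` is used in place of `\d*` so that empty values cannot match. A second pattern without the lookahead is included only for comparison.

```python
import re

# Shortened copies of the two sample events (middle fields trimmed for readability).
events = [
    "2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http "
    "assets.razerzone.com 80 - 54.230.18.168 image/png - 163.252.254.203 - 54351",
    "2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http "
    "realtime.services.disqus.com 80 - 163.252.254.203 184.173.90.195 52401",
]

with_lookahead = re.compile(
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?= \d+ TCP_)"
    r"\s(?P<bits>\d+)\s(?P<tcp_state>\w*_\w*)"
)
# Same pattern without the lookahead, for comparison.
without_lookahead = re.compile(
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\s(?P<bits>\d+)\s(?P<tcp_state>\w*_\w*)"
)


def extract(pattern, event):
    """Return the named groups of the first match, or None if nothing matches."""
    match = pattern.search(event)
    return match.groupdict() if match else None


results = [extract(with_lookahead, e) for e in events]
for fields in results:
    print(fields)
```

On these two events the pattern without the lookahead returns the same fields, because the groups after the IP already require the digits and the TCP_ token. The lookahead seems to matter mainly when src_ip is extracted on its own.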

 

