I'm trying to extract multiple fields from my log. My problem is that each event contains multiple IP addresses: one for the source, one from the web server, etc. To avoid extracting four IP addresses from every event into the same field, I want to use a positive lookahead to tell the regex "yes, this IP, but only if it is followed by x and y".
This is an example log:
2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http assets.razerzone.com 80 /eeimages/products/13785/razer-naga-2014-right-03.png - - - - 54.230.18.168 image/png http://imgur.com/gallery/u3o7l "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0 Safari/537.36" OBSERVED "Technology/Internet;Shopping" - 163.252.254.203 - 54351
2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http realtime.services.disqus.com 80 /api/2/thread/2221828111 ?bust=1780 - - - realtime.services.disqus.com application/json http://disqus.com/embed/comments/?base=default&disqus_version=6c05c0ca&f=bootsnipp&t_i=9WgD&t_u=http%3A%2F%2Fbootsnipp.com%2Fsnippets%2Ffeatured%2Fminimal-preview-thumbnails&t_d=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&t_t=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&s_o=default "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36" OBSERVED "Newsgroups/Forums" - 163.252.254.203 184.173.90.195 52401
Here I want, for example, to extract the source_ip (the first IP in the event) like this:
(\d+\.\d+\.\d+\.\d+)[[:blank:]](?=\d*)
Still, this gives me many other IPs, so I think I have to use multiple lookaheads? I really don't know how to use them in that case, especially when I want to extract the IP and give it a field name. In my head this means that I have to wrap EVERYTHING, including the lookaheads, in the capture group?
Thanks a lot for your help!
I found a solution that works:
First, to capture one specific IP out of many, I need a positive lookahead, written as (?=...), placed right after the capture group.
The src_ip regex is therefore:
(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)
This makes sure that I capture only the IP that is followed by a blank, some digits, another blank, and then TCP_.
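To see the effect of the lookahead, here is a minimal sketch in Python, assuming Python's re module behaves like the regex engine used for the extraction for this pattern. The variable names and the unanchored comparison pattern (the same IP expression without the lookahead) are only for illustration; the event is the first sample line from above.

```python
import re

# First sample event from the post, split over several literals for readability.
line = (
    '2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http '
    'assets.razerzone.com 80 /eeimages/products/13785/razer-naga-2014-right-03.png '
    '- - - - 54.230.18.168 image/png http://imgur.com/gallery/u3o7l '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Iron/31.0.1700.0 Chrome/31.0.1700.0 Safari/537.36" OBSERVED '
    '"Technology/Internet;Shopping" - 163.252.254.203 - 54351'
)

# Comparison pattern (illustrative): the IP expression without any lookahead.
any_ip = re.compile(r"\d*\.\d*\.\d*\.\d*")

# Pattern from the post: the lookahead checks what follows the IP
# without making it part of the captured value.
src_ip = re.compile(r"(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)")

all_ips = any_ip.findall(line)
src_ips = [m.group("src_ip") for m in src_ip.finditer(line)]

print(all_ips)  # several dotted-number strings, not only the source IP
print(src_ips)  # only the IP that is followed by "<digits> TCP_"
```

Without the lookahead the pattern also picks up the other addresses in the event; with it, only the address directly in front of the status code and the TCP_ value remains.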
To capture several groups, I just had to account for the blanks between the values with \s and then start the next capture group:
(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)\s(?P<bits>\d*)\s(?P<tcp_state>\w*_\w*)
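As a rough check of the combined expression, the following sketch runs it against the beginning of both sample events, again assuming Python's re module as a stand-in for the actual extraction engine. The events are truncated after the port number to keep the example short, and the helper name extract_fields is made up for this illustration.

```python
import re

# Combined pattern from the post: three named groups separated by \s.
pattern = re.compile(
    r"(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)"
    r"\s(?P<bits>\d*)"
    r"\s(?P<tcp_state>\w*_\w*)"
)

# Beginnings of the two sample events (truncated for brevity).
events = [
    "2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http "
    "assets.razerzone.com 80",
    "2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http "
    "realtime.services.disqus.com 80",
]


def extract_fields(event):
    """Return the named groups as a dict, or None if the event does not match."""
    match = pattern.search(event)
    return match.groupdict() if match else None


results = [extract_fields(e) for e in events]
for fields in results:
    print(fields)
```

Each named group ends up as its own field. Since the values after the IP are now matched by the pattern itself, the lookahead is probably no longer strictly required here, but it does no harm.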