extract a specific IP with positive lookahead

avoelk
Communicator

I'm trying to extract multiple fields from my log. My problem is that each event contains multiple IP addresses: one for the source, one for the web server, and so on. To avoid extracting all four IP addresses in every event into the same field, I want to use a positive lookahead to tell the regex "yes, this IP, but only if it is followed by x and y".

This is an example log:

 

2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http assets.razerzone.com 80 /eeimages/products/13785/razer-naga-2014-right-03.png - - - - 54.230.18.168 image/png http://imgur.com/gallery/u3o7l "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0 Safari/537.36" OBSERVED "Technology/Internet;Shopping" - 163.252.254.203 - 54351
2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http realtime.services.disqus.com 80 /api/2/thread/2221828111 ?bust=1780 - - - realtime.services.disqus.com application/json http://disqus.com/embed/comments/?base=default&disqus_version=6c05c0ca&f=bootsnipp&t_i=9WgD&t_u=http%3A%2F%2Fbootsnipp.com%2Fsnippets%2Ffeatured%2Fminimal-preview-thumbnails&t_d=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&t_t=Viewing%20snippet%20Minimal%20Preview%20Thumbnails%20%7C%20Bootsnipp.com&s_o=default "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36" OBSERVED "Newsgroups/Forums" - 163.252.254.203 184.173.90.195 52401

 

 

Here, for example, I want to extract the source_ip (the first IP in the event) like this:

(\d+\.\d+\.\d+\.\d+)[[:blank:]](?=\d*)

 

Still, this gives me many other IPs, so I think I have to use multiple lookaheads? But I really don't know how to use them in that case, especially when I want to extract the IP and give it a field name. In my head this means that I have to wrap EVERYTHING, including the lookaheads, in the capture group?

Thanks a lot for your help!

1 Solution

avoelk
Communicator

I found a solution that works:

First, to capture one specific IP out of many, I need a positive lookahead, written (?=...).

The src_ip regex is therefore:

(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)

This makes sure that I capture only the IP that is followed by a blank, some digits, another blank, and then TCP_.
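
For anyone who wants to check this outside Splunk, here is a minimal Python sketch comparing the attempt from the question with the pattern above on the first sample event. It rests on two assumptions: Python's re module stands in for Splunk's PCRE engine, and [ \t] replaces [[:blank:]], which re does not support. The names EVENT, LOOSE and STRICT are made up for the illustration. The point it shows is that (?=\d*) can succeed on zero digits, so it filters nothing, while (?=\ \d* TCP_) only succeeds after the source IP.

```python
import re

# First sample event from the question, copied verbatim.
EVENT = (
    '2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http '
    'assets.razerzone.com 80 '
    '/eeimages/products/13785/razer-naga-2014-right-03.png - - - - '
    '54.230.18.168 image/png http://imgur.com/gallery/u3o7l '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0 '
    'Safari/537.36" OBSERVED "Technology/Internet;Shopping" - '
    '163.252.254.203 - 54351'
)

# Attempt from the question. Assumption: [ \t] stands in for [[:blank:]].
# The lookahead (?=\d*) also matches an empty string, so it never rejects.
LOOSE = re.compile(r'(\d+\.\d+\.\d+\.\d+)[ \t](?=\d*)')

# Pattern from the solution: the IP must be followed by
# a blank, optional digits, a blank, and the literal TCP_.
STRICT = re.compile(r'(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)')

loose_hits = LOOSE.findall(EVENT)
strict_hits = [m.group('src_ip') for m in STRICT.finditer(EVENT)]

print(loose_hits)   # several dotted values, not only the source IP
print(strict_hits)  # ['10.5.6.121']
```

Note that the loose pattern also picks up browser version strings such as 31.0.1700.0, since it only checks for four dot-separated numbers.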

To capture several groups, I just had to account for the blanks between the values with \s and then start the next capture group:

(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)\s(?P<bits>\d*)\s(?P<tcp_state>\w*_\w*)
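
The combined pattern can be tried the same way. The sketch below is a hedged illustration, again assuming Python's re module rather than Splunk itself; EVENTS holds only the leading portion of each sample event (cut off after the port number), and the helper name extract is hypothetical.

```python
import re

# Leading portions of the two sample events from the question
# (truncated after the port; the rest is not needed for these fields).
EVENTS = [
    '2014-03-27 23:54:58 1 10.5.6.121 304 TCP_HIT 422 501 GET http '
    'assets.razerzone.com 80',
    '2014-03-28 23:54:59 90670 10.62.0.120 200 TCP_NC_MISS 601 693 GET http '
    'realtime.services.disqus.com 80',
]

# Combined pattern from the solution, unchanged.
FIELDS = re.compile(
    r'(?P<src_ip>\d*\.\d*\.\d*\.\d*)(?=\ \d* TCP_)'
    r'\s(?P<bits>\d*)\s(?P<tcp_state>\w*_\w*)'
)

def extract(event):
    """Return the named groups of the first match as a dict, or None."""
    match = FIELDS.search(event)
    return match.groupdict() if match else None

for event in EVENTS:
    print(extract(event))
```

Because \w also matches an underscore, \w*_\w* captures multi-part states such as TCP_NC_MISS in full.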

 
