Hi,
I am very new to Splunk and a total greenhorn with regex. I have a log file with the following format:
Jul 31 12:23:32 BALTHAZAR squid[7415]: 1375237412.537 93 10.110.40.144 TCP_MISS/200 1214 GET somewebsite ftropea FIRST_UP_PARENT/content1 application/x-javascript
Jul 30 23:59:13 BALTHAZAR squid[7415]: 1375192753.517 0 10.110.40.113 TCP_DENIED/407 3646 GET somewebsite - NONE/- text/html
It is a firewall/proxy access.log. When I import the data I choose access.log as the type, but then I need to customize it, since Splunk gets only the date/time part right and treats the entire rest of the line as the event.
I would like to extract the following fields:
Date = Jul 31 12:23:32
Servername = BALTHAZAR
IP = 10.110.40.144
Code = TCP_MISS/200
RequestType = GET
Website = somewebsite (the real value includes the http://)
User = ftropea
I should also note that the username is sometimes empty (shown as - in the second sample line) and sometimes filled in.
I used the built-in field extractor and could extract almost all of the fields above except the User.
What I have so far is something like:
(?:[^ \n]* ){3}(?P<_Servername_>[^ ]+)[^\.\n]*\.\d+\s+\d+\s+(?P<_IP_>[^ ]+)\s+(?P<_Code_>[^ ]+)\s+\d+\s+(?P<_RequestType_>[^ ]+)\s+(?P<_Website_>[^ ]+)
and I suspect even this is not correct. Any idea what I could do?
Or do I have to write my own app/plugin with a parser (PHP, C# or whatever) for this file?
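
To make clearer what I am after, here is a minimal Python sketch of the extraction I have in mind. It is only an illustration tested against the two sample lines above, not a Splunk configuration. It assumes the fields are whitespace-separated, that an empty username always appears as a single -, and that the two numeric columns I skip are elapsed time and byte count (my guess from the usual squid layout). The names PATTERN and extract are made up for this example; the (?P<name>...) group syntax is the same one used in the pattern above.

```python
import re

# Sample lines copied from the post.
LINES = [
    "Jul 31 12:23:32 BALTHAZAR squid[7415]: 1375237412.537 93 10.110.40.144 "
    "TCP_MISS/200 1214 GET somewebsite ftropea FIRST_UP_PARENT/content1 "
    "application/x-javascript",
    "Jul 30 23:59:13 BALTHAZAR squid[7415]: 1375192753.517 0 10.110.40.113 "
    "TCP_DENIED/407 3646 GET somewebsite - NONE/- text/html",
]

# One named group per wanted field; unwanted columns are matched but not captured.
PATTERN = re.compile(
    r"^(?P<Date>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"  # syslog-style timestamp
    r"(?P<Servername>\S+)\s+"
    r"\S+\s+"                   # process tag, e.g. squid[7415]:
    r"\d+\.\d+\s+"              # epoch timestamp with milliseconds
    r"\d+\s+"                   # assumed: elapsed time
    r"(?P<IP>\S+)\s+"
    r"(?P<Code>\S+)\s+"
    r"\d+\s+"                   # assumed: byte count
    r"(?P<RequestType>\S+)\s+"
    r"(?P<Website>\S+)\s+"
    r"(?P<User>\S+)"            # "-" when no user is logged (assumption)
)


def extract(line):
    """Return a dict of the wanted fields, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["User"] == "-":
        fields["User"] = None   # treat the placeholder as "no user"
    return fields


if __name__ == "__main__":
    for line in LINES:
        print(extract(line))
```

If something equivalent can be done with a regex inside Splunk itself, I would much prefer that over writing an external parser.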