Field Extractions from Proxy Logs

newbie2tech
Communicator

Hi Team,

I need your help or suggestions on the best way to handle the scenario below.

I am using the field extractor screen in the search head GUI to extract fields from the proxy log patterns below. For example, from the following event I am extracting five fields (field1 to field5):

[06/Sep/2017:17:02:29] failure (13878): for host 141.139.7.270 trying to GET /ba_staticfiles/LeakingNews/LrnJsonRsp.txt, service-http reports: COBRE7740: unable to contact wa01.abc.xyz.com:9102 (Directory lookup error)

field1 -141.139.7.270
field2 -/ba_staticfiles/LeakingNews/LrnJsonRsp
field3 -COBRE7740
field4 -wa01.abc.xyz.com:9102
field5 -Directory lookup error

The challenge I am running into is that not all events in the proxy logs have the same length: some events contain all of the fields while others do not. If one of the fields is missing from an event, none of the fields get extracted. Because of this, when I run stats on any of the extracted fields I miss events (total events = 100, but my stats show only 70).

Below are my possible proxy log statements (the 3rd and 4th events are the same as the 1st and 2nd, except that they have an additional space after the opening parenthesis):

Patterns

1st event ->[06/Sep/2017:16:38:23] security (13878): for host 141.139.7.270 trying to GET /favicon.ico, deny-service reports: denying service of /favicon.icon
2nd event ->[06/Sep/2017:17:02:29] failure (13878): for host 141.139.7.270 trying to GET /ba_staticfiles/LeakingNews/LrnJsonRsp.txt, service-http reports: COBRE7740: unable to contact wa01.abc.xyz.com:9102 (Directory lookup error)
3rd event ->[06/Sep/2017:16:38:23] security ( 13878): for host 141.139.7.270 trying to GET /favicon.ico, deny-service reports: denying service of /favicon.icon
4th event ->[06/Sep/2017:17:02:29] failure ( 13878): for host 141.139.7.270 trying to GET /ba_staticfiles/LeakingNews/LrnJsonRsp.txt, service-http reports: COBRE7740: unable to contact wa01.abc.xyz.com:9102 (Directory lookup error)

Any suggestions on the best way to handle them: use a regex at index time by modifying props.conf/transforms.conf, or handle it through field extraction in the GUI? If using the GUI, how can this be achieved with the field extractor or with a regex from the search head? Or is there another, better way?

Also, is there a dedicated built-in sourcetype for proxy error logs? (I am using the access_combined sourcetype.)

Thank you in advance.

Splunk Version - 6.5.1

cpetterborg
SplunkTrust

Doing this in props/transforms is more difficult. I'd do it at search time in an auto field extraction or in your SPL search with rex.

The following rex will work on your 4 events that you provided above:

... | rex "for host (?<ip>\d+\.\d+\.\d+\.\d+).*GET (?P<url>[^,]+),(.*reports: (?P<thing>\S+):.*?(?P<contact>[\w.]+:\d+).*\((?P<msg>[^\)]+)+\))?"

Sorry about the names, but I didn't want to just call them field1, field2, etc. If you have other formats that you need fields extracted from, or there are some events that don't seem to work properly with this, add some examples here. Some fields are rather explicit: the IP has to be an IP address, not a hostname, and the contact has to contain a domain name and a port.
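
To tie this back to the counting problem in the question, here is a minimal sketch of how the same rex might be followed by stats. Everything after the comma sits inside an optional group, so events without those fields should still get ip and url. The "n/a" placeholder value and the choice of split-by fields are only assumptions for illustration; "..." stands for your own base search.

```
... | rex "for host (?<ip>\d+\.\d+\.\d+\.\d+).*GET (?P<url>[^,]+),(.*reports: (?P<thing>\S+):.*?(?P<contact>[\w.]+:\d+).*\((?P<msg>[^\)]+)+\))?"
| fillnull value="n/a" thing contact msg
| stats count by ip, thing
```

stats drops events where a split-by field is null, so the fillnull step is what keeps the shorter "deny-service" events in the totals when you split by one of the optional fields.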

gokadroid
Motivator

If the end goal is to count all the events, then the count should be done on a field that occurs in every event, such as field1 (the IP).
In my opinion the ideal approach is to filter the "deny-service reports" events and extract field1 and field2 from them, then filter the "service-http reports" events and extract all 5 fields. However, if everything needs to be extracted in one extraction, you can try the search below, which assigns "some value" when a field is absent from the event and the required value when the field is present.

your query to get events
| rex "for host (?<ipAddress>(\d{1,3}\.)+\d{1,3}).+GET\s*(?<uri>[^,]+),\s*(deny-service reports:|service-http reports:)\s*(?<keycode>[^\s]+)(unable to contact|\s)*(?<backendHost>[^\s]+)\s*\(*(?<error>[^\)]+)*"
| table ipAddress, uri, keycode, backendHost, error
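
The two-pass idea mentioned above (common fields from every event, extra fields only from the "service-http reports" events) could be sketched with two separate rex commands. This is an illustrative variant, not the regex from this post, and its patterns assume the exact event layouts shown in the question. rex leaves an event untouched when its pattern does not match, so the second rex simply adds nothing to the "deny-service" events.

```
your query to get events
| rex "for host (?<ipAddress>\d{1,3}(?:\.\d{1,3}){3}) trying to GET (?<uri>[^,]+),"
| rex "service-http reports: (?<keycode>[^:\s]+): unable to contact (?<backendHost>\S+) \((?<error>[^)]+)\)"
| stats count by ipAddress
```

Counting by ipAddress should cover every event that has the "for host ... trying to GET" part, whether or not the other three fields are present.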

newbie2tech
Communicator

thank you for the response, it helped

newbie2tech
Communicator

thank you cpetterborg..this worked great!! will revert back if I notice any further issues
