Splunk Search

How to edit my single regex for parsing multiple types of events in the same sourcetype?

pgadhari
Builder

Hi All,

I want a single regex that handles the multiple types of events generated in my access logs. I have written the following regex to extract fields from my access.log:

^(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^ ]+)\s+\[(?P[^\]]+)[^ \n]*\s+url="(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^"]+)"\s+\|status=(?P[^ ]+)\s+\|size=(?P[^ ]+)\s+\|resp_time=(?P[^ ]+)\|\sreferer="(?P[^"]+)"\suser_agent="(?P[^"]+)"

The problem is that most of the events match this regex, but there are 4 events that show up as "non-matches". Examples of both kinds of event:

Matching :

10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] url="GET /dh/en-US/account/login HTTP/1.1" |status=302 |size=323 |resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"

Non-matching:

37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] url="myversion|3.6 Public" |status=400 |size=312 |resp_time=125| referer="-" user_agent="-" 

106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] url="SSH-2.0-LYGhost_1.2.7-20100630" |status=302 |size=299 |resp_time=129| referer="-" user_agent="-"

In the non-matching events above, the http_method and protocol fields are missing. Is there a way to add conditions to the regex so that it works with both kinds of event? Please help.

Thanks
PG

lguinn2
Legend

You can have multiple field extractions for the same sourcetype, no problem. For each event, Splunk will attempt to apply all the regular expressions, and will use all of them that match.
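To illustrate the idea outside Splunk, here is a minimal Python sketch that applies two separate patterns to each event and keeps the fields from every pattern that matches. It is not Splunk configuration. Apart from http_method and protocol, which are mentioned in the question, the group names (clientip, timestamp, uri and so on) are assumptions, because the names were stripped from the regex as posted.

```python
import re

# Hypothetical extraction 1: fields that appear in every event.
COMMON_RE = re.compile(
    r'^(?P<clientip>\S+)\s+\S+\s+\S+\s+\[(?P<timestamp>[^\]]+)\].*?'
    r'\|status=(?P<status>\d+)\s+\|size=(?P<size>\d+)\s+\|resp_time=(?P<resp_time>\d+)\|'
)

# Hypothetical extraction 2: matches only when url="..." holds a full request line.
REQUEST_RE = re.compile(
    r'url="(?P<http_method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>HTTP/[\d.]+)"'
)

def extract(event):
    """Apply every pattern and merge the fields from those that match."""
    fields = {}
    for pattern in (COMMON_RE, REQUEST_RE):
        match = pattern.search(event)  # search() is unanchored, like Splunk
        if match:
            fields.update(match.groupdict())
    return fields

# Sample events copied from the question.
good = ('10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] '
        'url="GET /dh/en-US/account/login HTTP/1.1" |status=302 |size=323 '
        '|resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"')
odd = ('106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] '
       'url="SSH-2.0-LYGhost_1.2.7-20100630" |status=302 |size=299 '
       '|resp_time=129| referer="-" user_agent="-"')

print(extract(good))  # common fields plus http_method, uri, protocol
print(extract(odd))   # common fields only
```

The second event simply ends up without http_method, uri and protocol, which is usually what you want for malformed requests.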

Also, regular expressions in Splunk are unanchored: a pattern can match anywhere in the event unless you anchor it explicitly, for example with ^.

Finally, if you really want to use such a complex regular expression, I suggest that you use a regular expression tool to test it thoroughly.
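If you do go the single-regex route, one possible shape is an alternation inside url="...": try a full request line first and fall back to capturing the raw string. The Python sketch below tests that idea against the three sample events from the question. The group names other than http_method and protocol are assumptions (the original names were lost when the regex was pasted), and raw_url is a hypothetical field invented for the fallback case.

```python
import re

# Sketch of a single pattern with an alternation inside url="...".
ACCESS_RE = re.compile(
    r'^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<timestamp>[^\]]+)\]\s+'
    r'url="(?:(?P<http_method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>HTTP/[\d.]+)'
    r'|(?P<raw_url>[^"]*))"\s+'
    r'\|status=(?P<status>\d+)\s+'
    r'\|size=(?P<size>\d+)\s+'
    r'\|resp_time=(?P<resp_time>\d+)\|\s+'
    r'referer="(?P<referer>[^"]*)"\s+'
    r'user_agent="(?P<user_agent>[^"]*)"'
)

def parse(event):
    """Return the named fields that took part in the match, or None."""
    match = ACCESS_RE.search(event)
    if match is None:
        return None
    # Drop groups from the alternation branch that was not used.
    return {k: v for k, v in match.groupdict().items() if v is not None}

# Sample events copied from the question.
EVENTS = [
    '10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] url="GET /dh/en-US/account/login HTTP/1.1" '
    '|status=302 |size=323 |resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"',
    '37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] url="myversion|3.6 Public" '
    '|status=400 |size=312 |resp_time=125| referer="-" user_agent="-" ',
    '106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] url="SSH-2.0-LYGhost_1.2.7-20100630" '
    '|status=302 |size=299 |resp_time=129| referer="-" user_agent="-"',
]

for event in EVENTS:
    print(parse(event))
```

The same pattern would still need to be checked in Splunk itself, since this only exercises it with Python's re module.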

You might also want to read the manual entries on creating and maintaining search-time field extractions.

lguinn2
Legend

Why must this be a single regex?

A single regex like this is hard to understand, and that makes it fragile and hard to maintain, not to mention hard to get right in the first place!

pgadhari
Builder

So how can this be achieved? Can you please guide me? Both kinds of event are from the same sourcetype, so how can we write different regexes for different events and apply them?

sundareshr
Legend

It will also make it hard to troubleshoot.

pgadhari
Builder

Somehow the regex I copied above is not showing the capture group names. Pasting the regex again:

^(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^ ]+)\s+\[(?P[^\]]+)[^ \n]*\s+url="(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^"]+)"\s+\|status=(?P[^ ]+)\s+\|size=(?P[^ ]+)\s+\|resp_time=(?P[^ ]+)\|\sreferer="(?P[^"]+)"\suser_agent="(?P[^"]+)"\siPlanetDirectoryPro="(?P[^"]+)"