Can you help me extract fields from apache:access logs?

mrtolu6
Path Finder

Regex Experts!
I need help extracting the src, http_method, uri_path, and status fields.

Below is an example of a log line with the fields that I would like to extract:

10.10.10.22 - - [12/Oct/2012:14:22:41 -0400] "GET /etc/team/transport/tRoom?serlet=jpsSSGenerator HTTP/1.1" 200 26494
src=10.10.10.22, http_method=GET, uri_path=/etc/team/transport/tRoom?serlet=jpsSSGenerator

Below are examples of the different types of log lines that come from the Apache access logs. I'm looking for a regex that can extract the fields from all of them. Thanks in advance for any help.

Example logs:

127.0.0.1 - - [104/Oct/2018:11:22:47 -0700] "GET /directory/directory/test?seat=ShowPage&tese=calendar.js&IP=444.444.1.444 HTTP/1.1" 304 - "htttps://testwebsite.com/m/see/yup/Union?selpt=stepReportFilter.jsp" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"

10.10.10.10 - - [10/Oct/2018:11:22:47 -0700] "POST /nba/nfl/nhl/ufc HTTP/1.1" 200 470 "-" "Mozilla/4.0 (Windows 8.1 6.3) Java/1.2.0_181" "10.10.10.02"

dnsname..cod.blackops.com:80 10.10.10.02 - - [16/Oct/2018:11:22:22 -0700] "GET /scripts/form_registry.js HTTP/1.1" 200 2504 "htttp://10.10.10.03lnba/cruisehtml?&swf_version=ezboard052614_1&serverUrl=110.10.10.03&boardId=19-153970030&isPreview=0&update052109=1" "Mozilla/5.0 (Windows NT 6.1; Trident/2.0; rv:11.0) like Gecko"

10.10.10.02 - - [12/Oct/2018:13:22:41 -0500] "POST /yup/zillow/server.php?a=c7355 HTTP/1.1" 200 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/515.2 (KHTML, like Gecko) Chrome/15.0.200.200 Safari/535.2"

10.10.10.02 - - [11/Oct/2018:13:22:41 -0500] "POST /yup/zillow/server.php?a=c7355 HTTP/1.1" 200 - 

10.10.10.22 - - [12/Oct/2012:14:22:41 -0400] "GET /etc/team/transport/tRoom?serlet=jpsSSGenerator HTTP/1.1" 200 26494
1 Solution

sudosplunk
Motivator

Hi @mrtolu6,

Give this regex a try:

your base search | rex field=_raw "(?<src>\d+\.\d+\.\d+\.\d+).+\]\s\"(?<http_method>\w+)\s(?<uri_path>.+)\"\s(?<status>\d+)"

Tested the regex here.
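
A quick way to check what each capture group picks up is to run the pattern outside Splunk. The following is a minimal sketch in Python, not part of the original answer. Python's re module needs (?P<name>...) where rex accepts (?<name>...), and the SAMPLES list and the extract helper are illustrative names. The sample lines are copied from the question.

```python
import re

# Same pattern as the rex command above, with named groups in Python syntax.
PATTERN = re.compile(
    r'(?P<src>\d+\.\d+\.\d+\.\d+).+\]\s"(?P<http_method>\w+)'
    r'\s(?P<uri_path>.+)"\s(?P<status>\d+)'
)

# Two of the example log lines from the question.
SAMPLES = [
    '10.10.10.22 - - [12/Oct/2012:14:22:41 -0400] "GET /etc/team/transport/tRoom?serlet=jpsSSGenerator HTTP/1.1" 200 26494',
    '10.10.10.10 - - [10/Oct/2018:11:22:47 -0700] "POST /nba/nfl/nhl/ufc HTTP/1.1" 200 470 "-" "Mozilla/4.0 (Windows 8.1 6.3) Java/1.2.0_181" "10.10.10.02"',
]


def extract(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None


for line in SAMPLES:
    print(extract(line))
```

On these lines uri_path also contains the trailing HTTP/1.1, because .+ is greedy and runs up to the closing quote of the request.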

adonio
Ultra Champion

Why not use the pretrained source type access_combined?

See here:
https://docs.splunk.com/Documentation/Splunk/7.2.0/Data/Listofpretrainedsourcetypes

mrtolu6
Path Finder

That worked, but it adds extra details to the uri_path field. I would also like to create an additional field called uri_query for anything after the "?", and a field called version for the HTTP version (for example, HTTP/1.1).

For example:
10.10.10.04 - - [07/Oct/23:08:30:59 -0400] "POST /OndnForm/drag_Form?images/ HTTP/1.1" 400 226 "-" "Hello, People"

uri_query=images/
version=HTTP/1.1
src=10.10.10.04
status=400
bytes=226

sudosplunk
Motivator

Try this:

your base search | rex field=_raw "(?<src>\d+\.\d+\.\d+\.\d+).+\]\s\"(?<http_method>\w+)\s(?<uri_path>\S+)\s(?<uri_query>\S+)\"\s(?<status>\d+)\s(?<bytes>[\d-]+)"

Updated regex: https://regex101.com/r/CpQ56P/2
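
In the pattern above, the group named uri_query matches the token after the request target, which is the protocol version rather than the query string. One possible way to get separate uri_path, uri_query, and version fields is sketched below in Python (using (?P<name>...) in place of rex's (?<name>...)). The SPLIT pattern and the fields helper are assumptions made for illustration, not the regex from the linked regex101 page.

```python
import re

# The follow-up pattern from the thread, with named groups in Python syntax.
ORIGINAL = re.compile(
    r'(?P<src>\d+\.\d+\.\d+\.\d+).+\]\s"(?P<http_method>\w+)\s(?P<uri_path>\S+)'
    r'\s(?P<uri_query>\S+)"\s(?P<status>\d+)\s(?P<bytes>[\d-]+)'
)

# Hypothetical variant: stop uri_path at "?", then capture the optional
# query string and the protocol version as separate groups.
SPLIT = re.compile(
    r'(?P<src>\d+\.\d+\.\d+\.\d+).+?\]\s"(?P<http_method>\w+)\s(?P<uri_path>[^?\s]+)'
    r'(?:\?(?P<uri_query>\S*))?\s(?P<version>HTTP/[\d.]+)"'
    r'\s(?P<status>\d+)\s(?P<bytes>\d+|-)'
)

# Sample lines copied from the thread.
LINE = '10.10.10.04 - - [07/Oct/23:08:30:59 -0400] "POST /OndnForm/drag_Form?images/ HTTP/1.1" 400 226 "-" "Hello, People"'
NO_QUERY = '10.10.10.10 - - [10/Oct/2018:11:22:47 -0700] "POST /nba/nfl/nhl/ufc HTTP/1.1" 200 470 "-" "Mozilla/4.0 (Windows 8.1 6.3) Java/1.2.0_181" "10.10.10.02"'


def fields(pattern, line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = pattern.search(line)
    return m.groupdict() if m else None


print(fields(ORIGINAL, LINE)["uri_query"])  # the protocol version, not the query
print(fields(SPLIT, LINE))
print(fields(SPLIT, NO_QUERY)["uri_query"])  # None when the request has no "?"
```

If this behaves as wanted, the same pattern should carry over to rex by writing the groups as (?<name>...) and escaping the inner double quotes as \", but it has been checked here only against the sample lines shown.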

mrtolu6
Path Finder

Thanks for your help!
