Splunk Search

How to map values from log files with no key-value structure to custom fields?

ateterine
Path Finder

Hi Splunkers,
I have a number of log files which do not have key:value structure to them. How do I map those values to custom fields?
Here is an example:

2014-09-07  18:57:10    111.123.127.117 GET /url_value_goes_here/7185520.ts 200 6971895 2425    "-" "Player/12.00.13411.0000 WMFSDK/12.00.13411.0000"   "-"

Fields should be this:

date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie)

Thank you!


martin_mueller
SplunkTrust

That looks a lot like an access log, but maybe not quite - first, check if any of the predefined access log sourcetypes happens to match this.

If not, you'd define the timestamp extraction in props.conf (or through the data preview) and the regular expression field extractions in props.conf (or in the UI under Settings -> Fields). Without knowing the particulars of your data, it'd look something like this:

[your_sourcetype]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %Y-%m-%d  %H:%M:%S
EXTRACT-fields = \d\d:\d\d:\d\d\s+(?<cs_ip>\S+)\s+(?<cs_method>\S+)... and so on.
# other keys here, such as lookups, transforms, etc.
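
As a rough sanity check outside Splunk, the timestamp layout assumed by the stanza above can be tried against the sample event with a few lines of Python. This is only a sketch: Python's `strptime` is not Splunk's timestamp processor, the helper name `parse_event_time` is made up for illustration, and the code sidesteps the whitespace question by joining the first two whitespace-separated tokens with a single space.

```python
from datetime import datetime

# Sample event from the question; runs of whitespace separate the fields.
EVENT = (
    '2014-09-07  18:57:10    111.123.127.117 GET '
    '/url_value_goes_here/7185520.ts 200 6971895 2425    '
    '"-" "Player/12.00.13411.0000 WMFSDK/12.00.13411.0000"   "-"'
)

def parse_event_time(event):
    """Parse the leading date and time tokens of an event (hypothetical helper)."""
    # The first two whitespace-separated tokens are the date and the time.
    date_part, time_part = event.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")

event_time = parse_event_time(EVENT)
print(event_time.isoformat())
```

If the date and time parse cleanly here, the `%Y-%m-%d` and `%H:%M:%S` pieces of TIME_FORMAT are at least the right shape for this event; how the separator between them is matched is still something to confirm in Splunk's data preview.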

martin_mueller
SplunkTrust

I see. Use tools such as http://regexr.com to test-drive your expressions while learning. Do remember, though, that it doesn't support named capturing groups, so you'll have to leave the names out there and add them back in before doing the extraction in Splunk.

Based on that single event, I'd use something like this:

EXTRACT-fields = \d\d:\d\d:\d\d\s+(?<cs_ip>\S+)\s+(?<cs_method>\S+)\s+(?<cs_uri>\S+)\s+(?<sc_status>\d+)\s+(?<sc_bytes>\d+)\s+(?<time_taken>\d+)\s+"(?<cs_referer>[^"]*)"\s+"(?<cs_useragent>[^"]*)"\s+"(?<cs_cookie>[^"]*)"

Note, I've made some assumptions about the characters that can or cannot appear in a field. They may or may not be correct for your entire set of data. The great thing about Splunk is that you can define the field extraction, test it, and then change it if it's not perfect yet, because the extraction happens at search time: "schema on the fly".
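
Those assumptions can also be checked outside Splunk before touching props.conf. The following is a minimal sketch in Python, not part of the Splunk configuration: it uses the same pattern as the EXTRACT line, rewritten with Python's `(?P<name>...)` spelling for named groups, and the helper name `extract_fields` is made up for illustration.

```python
import re

# Sample event from the question; runs of whitespace separate the fields.
EVENT = (
    '2014-09-07  18:57:10    111.123.127.117 GET '
    '/url_value_goes_here/7185520.ts 200 6971895 2425    '
    '"-" "Player/12.00.13411.0000 WMFSDK/12.00.13411.0000"   "-"'
)

# Same pattern as the EXTRACT line, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r'\d\d:\d\d:\d\d\s+(?P<cs_ip>\S+)\s+(?P<cs_method>\S+)\s+(?P<cs_uri>\S+)'
    r'\s+(?P<sc_status>\d+)\s+(?P<sc_bytes>\d+)\s+(?P<time_taken>\d+)'
    r'\s+"(?P<cs_referer>[^"]*)"\s+"(?P<cs_useragent>[^"]*)"'
    r'\s+"(?P<cs_cookie>[^"]*)"'
)

def extract_fields(event):
    """Return a dict of field name -> value, or None if the event does not match."""
    match = PATTERN.search(event)
    return match.groupdict() if match else None

fields = extract_fields(EVENT)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Running a sample of real events through a check like this and counting how many return None is a quick way to find out where the assumptions about each field break down.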
Note also, I've renamed the last few fields so they don't have parentheses in their names, and all the fields so they don't have the minus sign in their names. Try to use only letters, digits, and underscores; otherwise you end up with trouble trying to use a field "foo-bar" that looks like "subtract bar from foo" to an eval command.
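
If you have many header names to convert, the renaming rule can be sketched in a few lines of Python. This is just an illustration of "letters, digits, and underscores only"; the function name `safe_field_name` and the choice to lowercase are my assumptions, and it produces cs_user_agent where the extraction above uses cs_useragent, so pick one spelling and stick with it.

```python
import re

def safe_field_name(name):
    """Turn a header name such as 'cs(User-Agent)' into a safe field name (hypothetical helper)."""
    # Replace each run of characters other than letters, digits, and
    # underscores with a single underscore.
    cleaned = re.sub(r'[^A-Za-z0-9_]+', '_', name)
    # Drop underscores left at the ends (e.g. from a closing parenthesis).
    return cleaned.strip('_').lower()

# Field list from the question.
HEADER = ("date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken "
          "cs(Referer) cs(User-Agent) cs(Cookie)")

safe_names = [safe_field_name(n) for n in HEADER.split()]
print(safe_names)
```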


ateterine
Path Finder

Thank you martin_mueller
It is somewhat of an access log. I'm not 100% sure of the exact format; all I have is gigabytes of this data.

I am not familiar with regular expressions and would really, really appreciate a complete solution on this one.

Thank you
