I have a number of log files which do not have key:value structure to them. How do I map those values to custom fields?
Here is an example:
2014-09-07 18:57:10 220.127.116.11 GET /url_value_goes_here/7185520.ts 200 6971895 2425 "-" "Player/12.00.13411.0000 WMFSDK/12.00.13411.0000" "-"
Fields should be this:
date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie)
That looks a lot like an access log, but maybe not quite - first, check if any of the predefined access log sourcetypes happens to match this.
If not, you'd define the timestamp extraction in props.conf (or through the data preview) and the regular-expression field extractions in props.conf (or in the UI under Settings -> Fields). Without knowing the particulars of your data, it'd look something like this:
[your_sourcetype]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %Y-%m-%d %H:%M:%S
EXTRACT-fields = \d\d:\d\d:\d\d\s+(?<cs-ip>\S+)\s+(?<cs-method>\S+)... and so on.
other keys here such as lookups, transforms, etc.
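If you want to sanity-check the TIME_FORMAT string outside Splunk, here is a rough Python sketch. It assumes Python's strptime treats these particular directives (%Y, %m, %d, %H, %M, %S) as the format above intends, and the helper name parse_event_time is made up for illustration; it is not anything Splunk provides.

```python
from datetime import datetime

# Same directive string as the TIME_FORMAT setting above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# The sample event from the question.
SAMPLE_EVENT = (
    '2014-09-07 18:57:10 220.127.116.11 GET /url_value_goes_here/7185520.ts '
    '200 6971895 2425 "-" "Player/12.00.13411.0000 WMFSDK/12.00.13411.0000" "-"'
)

def parse_event_time(event):
    """Hypothetical helper: parse the timestamp at the start of an event.

    The timestamp "YYYY-MM-DD HH:MM:SS" is 19 characters long, which is
    why a lookahead of 25 characters is comfortably enough.
    """
    return datetime.strptime(event[:19], TIME_FORMAT)

print(parse_event_time(SAMPLE_EVENT))
```

If strptime raises a ValueError on some of your events, the timestamp format probably varies across your files and the setting needs adjusting.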
Thank you martin_mueller
It is somewhat of an access log. I'm not 100% sure of the exact format; all I have is gigabytes of this data.
I am not familiar with regular expressions and would really appreciate a complete solution on this one.
I see. Use a tool such as http://regexr.com to test-drive your expressions while learning. Do remember, though, that it doesn't support named capturing groups, so you'll have to leave the names out there and add them back in before doing the extraction in Splunk.
Based on that single event, I'd use something like this:
EXTRACT-fields = \d\d:\d\d:\d\d\s+(?<cs_ip>\S+)\s+(?<cs_method>\S+)\s+(?<cs_uri>\S+)\s+(?<sc_status>\d+)\s+(?<sc_bytes>\d+)\s+(?<time_taken>\d+)\s+"(?<cs_referer>[^"]*)"\s+"(?<cs_useragent>[^"]*)"\s+"(?<cs_cookie>[^"]*)"
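If you'd rather try the expression locally against the sample event, here is a minimal Python sketch. One assumption to be aware of: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the sketch rewrites the group syntax before matching. The helper names to_python_syntax and extract_fields are made up for illustration; they are not part of Splunk.

```python
import re

# The sample event from the question.
SAMPLE_EVENT = (
    '2014-09-07 18:57:10 220.127.116.11 GET /url_value_goes_here/7185520.ts '
    '200 6971895 2425 "-" "Player/12.00.13411.0000 WMFSDK/12.00.13411.0000" "-"'
)

# The expression from the EXTRACT-fields line above, unchanged.
SPLUNK_PATTERN = (
    r'\d\d:\d\d:\d\d\s+(?<cs_ip>\S+)\s+(?<cs_method>\S+)\s+(?<cs_uri>\S+)'
    r'\s+(?<sc_status>\d+)\s+(?<sc_bytes>\d+)\s+(?<time_taken>\d+)'
    r'\s+"(?<cs_referer>[^"]*)"\s+"(?<cs_useragent>[^"]*)"'
    r'\s+"(?<cs_cookie>[^"]*)"'
)

def to_python_syntax(pattern):
    """Hypothetical helper: turn (?<name>...) into Python's (?P<name>...).

    The negative lookahead leaves lookbehinds such as (?<=...) and
    (?<!...) untouched.
    """
    return re.sub(r'\(\?<(?![=!])', '(?P<', pattern)

def extract_fields(event):
    """Hypothetical helper: return a dict of field name -> value, or {}."""
    match = re.search(to_python_syntax(SPLUNK_PATTERN), event)
    return match.groupdict() if match else {}

for name, value in extract_fields(SAMPLE_EVENT).items():
    print(f"{name} = {value}")
```

Running a few hundred lines from your files through extract_fields and counting how many come back empty is a quick way to see whether the assumptions baked into the expression hold for the rest of your data.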
Note that I've made some assumptions about which characters can or cannot appear in each field. They may or may not hold for your entire data set. The great thing about Splunk is that you can define the field extraction, test it, and change it if it's not perfect yet, because the extraction happens at search time: "schema on the fly".
Note also that I've renamed the last few fields so they have no parentheses in their names, and all the fields so they have no minus sign. Try to use only letters, digits, and underscores; otherwise you end up with trouble trying to use a field "foo-bar" that looks like "subtract bar from foo" to an