Splunk Search

How can I extract fields from a space delimited event with potential spaces in the field values?

Communicator

How would I go about extracting fields from the event below? The challenge is that the event appears to be space-delimited, but the values themselves can contain spaces. For example, the leading datetime header contains spaces, and so does the user agent (though the latter is wrapped in quotes).

What would be the best approach for extracting fields from this data?

Aug 27 17:48:19 10.252.22.22 Aug 27 10:46:48 10.251.106.44 2015-08-27 17:35:43 19 10.234.37.191 - - - OBSERVED "News/Media" http://bits.blogs.nytimes.com/2015/08/26/facebook-tests-a-digital-assistant-for-its-messaging-app/?_...  200 TCP_HIT GET image/jpeg http graphics8.nytimes.com 80 /images/2015/08/28/business/28eugoogle-web/28eugoogle-web-mediumThreeByTwo210.jpg - jpg "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" 10.251.106.44 8762 4053 - "none" "none"

Re: How can I extract fields from a space delimited event with potential spaces in the field values?

Builder

What do you think about using the space as the field separator and then, once all the fields have been discovered, grouping some of them with eventtypes, for example? You can also use eval functions.

Use eventtypes

Group using eval


Re: How can I extract fields from a space delimited event with potential spaces in the field values?

Legend

A field definition is ultimately a regular expression. You can certainly write a regular expression that would include spaces - or anything else! Of course, for a complicated event, the regular expressions may be complex as well.
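To illustrate the idea outside of Splunk, here is a minimal Python sketch of a regular expression that treats a double-quoted value as one field even when it contains spaces. The function name `split_fields` and the shortened sample event are made up for this example, and the pattern assumes quoted values never contain an escaped quote. Unquoted values with spaces (such as the leading timestamps) would still need their own pattern.

```python
import re

# Either a double-quoted string (which may contain spaces)
# or a run of non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')

def split_fields(event):
    """Split a space-delimited event, keeping quoted values together."""
    # Strip the surrounding quotes so only the value itself remains.
    return [token.strip('"') for token in TOKEN_RE.findall(event)]

# Shortened, illustrative excerpt of the event in the question.
sample = ('200 TCP_HIT GET image/jpeg '
          '"Mozilla/5.0 (Windows NT 6.3; WOW64)" 10.251.106.44 - "none"')

fields = split_fields(sample)
for position, value in enumerate(fields):
    print(position, value)
```

The same alternation (quoted string first, then non-space run) can be carried over into a Splunk field extraction, with a named capture group for each field you actually want.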

You might be able to avoid writing your own regular expression if your data is one of the pretrained sourcetypes, or if there is an app for the data.

The timestamp is a special case. Splunk's default timestamp extraction is not confused by spaces, although it might have some trouble with the fact that there are three timestamps in the event! Which one is the event time? Again, you can use regular expressions to help Splunk identify the proper timestamp; the Splunk documentation on timestamp recognition covers this.
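As a rough illustration of the three-timestamp problem, the Python sketch below finds every timestamp at the start of the sample event and then parses one of them. Choosing the third one (the only one with a full date) is purely an assumption for the example; only the original poster knows which one is the real event time. In Splunk itself, the equivalent choice is normally expressed with the `TIME_PREFIX` and `TIME_FORMAT` settings in props.conf rather than with code like this.

```python
import re
from datetime import datetime

# Start of the sample event from the question.
head = ("Aug 27 17:48:19 10.252.22.22 Aug 27 10:46:48 10.251.106.44 "
        "2015-08-27 17:35:43 19 10.234.37.191")

# Two shapes of timestamp appear: syslog-style ("Aug 27 17:48:19")
# and full date ("2015-08-27 17:35:43").
TIMESTAMP_RE = re.compile(
    r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"
    r"|\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
)

candidates = TIMESTAMP_RE.findall(head)
print(candidates)

# Assumption for illustration only: treat the full-date timestamp as event time.
event_time = datetime.strptime(candidates[2], "%Y-%m-%d %H:%M:%S")
print(event_time.isoformat())
```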

Frankly, I think that "grouping fields" on the fly is an inconvenient way to do things. Remember that field extractions are dynamic: they are applied at search time, so you can change them at any time, even after the data has been indexed. [Exception: "index time" field extractions cannot be changed this way, which is one reason to avoid them as much as possible.]

If you need help writing the regular expressions, tell us exactly how you want the fields broken out in this event...


Re: How can I extract fields from a space delimited event with potential spaces in the field values?

New Member

11/06/2018 01:31:21.784 (# 178) (58w8239-11212-2001-0078-00999393003903) Director (Director, 63) 1

I need to get (5***) as a field in the above log
