How would I go about extracting fields from the event below? The challenge is that the data appears to be delimited by spaces, but the values themselves can contain spaces. For example, the header datetime contains a space, and the user agent contains spaces (though the latter has quotes around it).
What would be the best approach for extracting fields from this data?
Aug 27 17:48:19 10.252.22.22 Aug 27 10:46:48 10.251.106.44 2015-08-27 17:35:43 19 10.234.37.191 - - - OBSERVED "News/Media" http://bits.blogs.nytimes.com/2015/08/26/facebook-tests-a-digital-assistant-for-its-messaging-app/?_... 200 TCP_HIT GET image/jpeg http graphics8.nytimes.com 80 /images/2015/08/28/business/28eugoogle-web/28eugoogle-web-mediumThreeByTwo210.jpg - jpg "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" 10.251.106.44 8762 4053 - "none" "none"
A field definition is ultimately a regular expression. You can certainly write a regular expression that matches values containing spaces, or anything else. Of course, for a complicated event, the regular expressions may be complex as well.
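To illustrate the idea outside of Splunk, here is a minimal Python sketch of a regular expression that treats a double-quoted value as one token and otherwise splits on whitespace. The names `TOKEN` and `tokenize` and the shortened sample line are made up for illustration; this is not a Splunk configuration, only a demonstration that spaces inside quoted values need not break the extraction.

```python
import re

# Match either a double-quoted value (which may contain spaces)
# or a run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def tokenize(line):
    """Split a log line on whitespace, keeping quoted values intact."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN.finditer(line)]

# Shortened, hypothetical excerpt in the style of the event above.
sample = ('- OBSERVED "News/Media" 200 TCP_HIT GET '
          '"Mozilla/5.0 (Windows NT 6.3; WOW64)" 10.251.106.44 "none"')
tokens = tokenize(sample)
```

With this approach the user agent comes back as a single token with its quotes stripped. Unquoted values that contain spaces, such as the timestamps, still need their own pattern.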
The timestamp is a special case. Splunk's default timestamp extraction is not confused by spaces, although it might have some trouble with the fact that there are three timestamps in the event! Which one is the event time? Again, you can use regular expressions to help Splunk identify the proper timestamp; the Splunk documentation on timestamp recognition has more information.
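As a rough sketch of how a regular expression can single out one of several timestamps, the Python below picks the ISO-style stamp and ignores the two syslog-style ones. That the ISO-style value is the real event time is only an assumption here, and `ISO_STAMP` and `event_time` are hypothetical names. In Splunk itself this would be handled through configuration rather than code.

```python
import re
from datetime import datetime

# The syslog-style stamps ("Aug 27 17:48:19") carry no year; the
# ISO-style one does, so a pattern anchored on YYYY-MM-DD skips them.
ISO_STAMP = re.compile(r'\b(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\b')

def event_time(line):
    """Return the first ISO-style timestamp in the line, or None."""
    m = ISO_STAMP.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")

# Leading portion of the event from the question.
sample = ("Aug 27 17:48:19 10.252.22.22 Aug 27 10:46:48 10.251.106.44 "
          "2015-08-27 17:35:43 19 10.234.37.191 - - -")
ts = event_time(sample)
```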
I frankly think that "grouping fields" on the fly is an inconvenient way to do things. Remember that field extractions are dynamic: you can change them at any time. So even if you have already indexed the data, you can change the field definitions. [Exception: if you used "index time" field extractions, which you should avoid as much as possible.]
If you need help writing the regular expressions, tell us exactly how you want the fields broken out in this event...