There are a couple of things going on in this setup: First, we need to clarify what is happening at index time, and what is happening at search time. It's also important to note that you really can't have extractions dependent on other extractions, as they don't execute in sequence.
Now, the first thing I notice is that you have index-time transforms applied in the source stanza, while the timestamp, linemerge, and TZ settings are applied by sourcetype. While they should get merged together correctly, I'd highly recommend putting them all in the same props.conf stanza if possible:
[source::/var/log/*haproxy.log]
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ=US/Mountain
# Unless the sourcetype is explicitly set by the forwarder, in which case the next line is unnecessary, and you can make this entire stanza [HAProxy]
sourcetype = HAProxy
TRANSFORMS-syslogstripper = haproxy_syslog_stripper
REPORT-haproxy_fields = haproxy_fields
And then in transforms.conf you'll have the following:
[haproxy_syslog_stripper]
REGEX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$
FORMAT = $1
DEST_KEY = _raw
[haproxy_fields]
REGEX = SEE BELOW
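To sanity-check the stripper pattern outside of Splunk, here is a minimal Python sketch that applies the same REGEX and returns capture group 1, which is what FORMAT = $1 writes back to _raw. The syslog timestamp and the host name lb01 in the test event are made up for illustration, and the helper name strip_syslog_header is hypothetical.

```python
import re

# Same pattern as the REGEX in [haproxy_syslog_stripper].
SYSLOG_STRIPPER = re.compile(r"^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$")


def strip_syslog_header(raw: str) -> str:
    """Return capture group 1 (the new _raw), or the input unchanged if there is no match."""
    match = SYSLOG_STRIPPER.match(raw)
    return match.group(1) if match else raw


# Hypothetical event: the syslog timestamp and the host "lb01" are invented.
event = (
    "Feb  6 12:14:14 lb01 "
    "[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 "
    '- - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"'
)

stripped = strip_syslog_header(event)
print(stripped)
```

Note that the pattern only consumes the timestamp and one whitespace-free token (the host). If your syslog header carries anything after the host, such as a process tag, that part stays in _raw and the field regex has to account for it.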
Now, because you can't have extractions dependent on other extractions (the field has to exist at search time, and if it comes from another search-time extraction, it doesn't), you're going to need one BIG regex to extract all of the fields. Assuming your HAProxy logs follow this format after your syslog headers are removed...
[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"
Then you could use something like this: regexr link because formatting gets borked inline
It's really long and not exactly easy to read, but it does pull out all of the fields you're looking for.
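Since the linked regex isn't reproduced inline, here is a rough sketch of what a single-pass pattern for the haproxy_fields stanza could look like, tested in Python against the sample line above. This is not the regex from the link: the field names (accept_date, frontend, tq, and so on) are my own choices, and the pattern has only been checked against that one sample, so treat it as a starting point.

```python
import re

# Illustrative pattern only; every group name here is an assumption, not from the original post.
HAPROXY_FIELDS = re.compile(
    r"^\[(?P<accept_date>[^\]]+)\]\s+"
    r"(?P<frontend>\S+)\s+"
    r"(?P<backend>[^/\s]+)/(?P<server>\S+)\s+"
    r"(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/(?P<tt>\+?\d+)\s+"
    r"(?P<status>-?\d+)\s+"
    r"(?P<bytes_read>\+?\d+)\s+"
    r"(?P<req_cookie>\S+)\s+(?P<resp_cookie>\S+)\s+"
    r"(?P<term_state>\S+)\s+"
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\+?\d+)\s+"
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)\s+"
    r"\{(?P<req_headers>[^}]*)\}\s+\{(?P<resp_headers>[^}]*)\}\s+"
    r'"(?P<http_request>[^"]*)"'
)

# The sample line from the post, after the syslog header has been stripped.
sample = (
    "[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 "
    '- - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"'
)

match = HAPROXY_FIELDS.match(sample)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name} = {value!r}")
```

Python needs the (?P<name>...) spelling for named groups, while Splunk examples usually show (?<name>...), so adjust the group syntax if you move this into transforms.conf. With named groups in the REGEX, the captured names become the field names.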
The end result is that haproxy_syslog_stripper is an index-time transform that overwrites _raw with its result, and haproxy_fields is a search-time extraction that runs against the updated _raw. One happens when the data is indexed, and the other happens when the data is searched, so in this case the second can rely on the first. As a bonus, you shouldn't need to use the | extract command to get the fields to appear. They should simply be available when you're searching this sourcetype.