Getting Data In

Transforms.conf SOURCE_KEY Questions

Engager

I run HAProxy, collect its log via a universal forwarder, and send it to our receiver/indexer (all on the same host).
I modified my props.conf as follows.

props.conf
[source::/var/log/*haproxy.log]
TRANSFORMS-syslogstripper = haproxy_syslog_stripper, haproxyfields, clientinfofields, backendfields, requestinfo, connectioninfo, queueinfo, uriinfo

[sourcetype::HAProxy]
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ = US/Mountain
REPORT-haproxyfieldextract = haproxyfields, clientinfofields, backendfields, requestinfo, connectioninfo, queueinfo, uriinfo
TRANSFORMS-haproxystuff = haproxyfields

Here is my transforms.conf where I listed pertinent HAProxy info

transforms.conf

# This will strip the syslog header (date stamp and host) from a syslog event
[haproxy_syslog_stripper]
REGEX         = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$
FORMAT        = $1
DEST_KEY      = _raw

# Transform for HAProxy

[haproxyfields]
DELIMS = " "
FIELDS = haproxy_id,client_info, date_time,frontend_name,backend,request_info,status_code,response_size,val1,val2,flags,connection_info,queue_info,req_header,resp_header,method,uri_info
CLEAN_KEYS=true

#the following is used to extract values from the previous extraction
[clientinfofields]
SOURCE_KEY=client_info
DELIMS = ":"
FIELDS = client_ip,client_port

[backendfields]
SOURCE_KEY=backend
DELIMS = "/"
FIELDS = backend_name,server_name

[requestinfo]
SOURCE_KEY=request_info
DELIMS = "/"
FIELDS = request_time,queue_time,connection_time,response_time,total_time

[connectioninfo]
SOURCE_KEY=connection_info
DELIMS = "/"
FIELDS = process_connections,frontend_connections,backend_connections,server_connections,retries

[queueinfo]
SOURCE_KEY=queue_info
DELIMS = "/"
FIELDS = server_queue_size,backend_queue_size

# You can still use regex on those extractions that still need it.
[uriinfo]
SOURCE_KEY=uri_info
REGEX=(?<uri>[^"]+?)
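As a quick sanity check outside Splunk, the stripper regex can be exercised in Python against a sample syslog-wrapped event (the hostname and pid here are invented for illustration):

```python
import re

# The same pattern as the haproxy_syslog_stripper stanza above
stripper = re.compile(r"^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$")

# Hypothetical syslog-wrapped HAProxy event (host/pid made up for the example)
line = ("Feb  6 12:14:14 lbhost haproxy[2517]: 10.0.0.1:33317 "
        "[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750")

m = stripper.match(line)
if m:
    stripped = m.group(1)  # what Splunk would write back into _raw
    print(stripped)
```

The capture group drops everything through the syslog hostname, leaving the event starting at the haproxy program name.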

I am able to get the fields listed in the haproxyfields stanza to extract using this search:

sourcetype="HAProxy" | extract haproxyfields

That works great, and I am super excited by it.
The problem is that I have no idea how to get the fields that depend on the previously extracted haproxyfields values (i.e. client_ip, client_port, backend_name, server_name, etc.) to display as well.
Any ideas why those fields wouldn't just be extracted along with the haproxyfields?


Splunk Employee

A hybrid approach seems to work. Too many special characters to escape, so I posted an image for props.conf.

transforms.conf:

[tmf_fields]
DELIMS=" "
FIELDS = month, day, day1, time1, source_ip, haproxy_id, client_info, date_time, frontend_name, backend, request_info, status_code, response_size, val1, val2, flags, connection_info

props.conf

[image: props.conf configuration, posted as a screenshot]



Engager

This seems to have worked for HAProxy. Keep in mind that the FIELDS arguments depicted above include the syslog header fields. I personally removed them since I strip the header beforehand.
Thanks Dave!
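For anyone following along, the trimmed version of that transform (with the syslog header fields month, day, day1, time1, and source_ip removed, since the stripper already ran at index time) would look like:

```
[tmf_fields]
DELIMS = " "
FIELDS = haproxy_id, client_info, date_time, frontend_name, backend, request_info, status_code, response_size, val1, val2, flags, connection_info
```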


Engager

I will try to get to this late this week or over the weekend. I am traveling until Friday, so I won't have much time to reconfigure and test this week. Keep the suggestions coming, though. I do like the idea of not having to throw | extract into the mix.


Motivator

There are a couple of things going on in this setup: First, we need to clarify what is happening at index time, and what is happening at search time. It's also important to note that you really can't have extractions dependent on other extractions, as they don't execute in sequence.

Now, the first thing I notice is that you have index-time transforms applied in the source:: stanza, and then the timestamp, line-merging, and TZ settings applied by sourcetype. While they should get merged together correctly, I'd highly recommend putting them all in the same stanza if possible.

[source::/var/log/*haproxy.log]
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ=US/Mountain
SOURCETYPE=HAProxy  (unless this is explicitly set by the forwarder, in which case it's unnecessary, and you can make this entire stanza [HAProxy])
TRANSFORMS-syslogstripper = haproxy_syslog_stripper
EXTRACT-haproxy_fields = haproxy_fields

And then in transforms.conf you'll have the following:

[haproxy_syslog_stripper]
REGEX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$
FORMAT = $1
DEST_KEY = _raw

[haproxy_fields]
REGEX = SEE BELOW

Now, because you can't have extractions dependent on extractions (the field has to exist at search time, and if it comes from another search-time extraction, it doesn't), you're going to need one BIG regex to extract all of the fields. Assuming your HAProxy logs follow this format after the syslog headers are removed...

[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"

Then you could use something like this: regexr link (shared externally because the formatting gets borked inline).

It's really long and not exactly easy to read, but it does pull out all of the fields you're looking for.
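The linked regex didn't survive in the thread, but a sketch of that kind of all-in-one extraction, run against the sample line above, might look like this in Python (the named groups are illustrative choices, not Splunk's canonical HAProxy field names):

```python
import re

# Illustrative all-in-one extraction for the HAProxy HTTP log format;
# the group names here are my own, not canonical Splunk field names
HAPROXY_HTTP = re.compile(
    r"\[(?P<date_time>[^\]]+)\]\s+"
    r"(?P<frontend_name>\S+)\s+"
    r"(?P<backend_name>[^/\s]+)/(?P<server_name>\S+)\s+"
    r"(?P<request_time>-?\d+)/(?P<queue_time>-?\d+)/(?P<connection_time>-?\d+)/"
    r"(?P<response_time>-?\d+)/(?P<total_time>-?\d+)\s+"
    r"(?P<status_code>\d+)\s+(?P<response_size>\d+)\s+"
    r"(?P<req_cookie>\S+)\s+(?P<resp_cookie>\S+)\s+(?P<flags>\S+)\s+"
    r"(?P<process_conns>\d+)/(?P<frontend_conns>\d+)/(?P<backend_conns>\d+)/"
    r"(?P<server_conns>\d+)/(?P<retries>\d+)\s+"
    r"(?P<server_queue>\d+)/(?P<backend_queue>\d+)\s+"
    r"\{(?P<req_headers>[^}]*)\}\s+\{(?P<resp_headers>[^}]*)\}\s+"
    r'"(?P<method>\S+)\s+(?P<uri>\S+)'
)

line = ('[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
        '200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"')

m = HAPROXY_HTTP.search(line)
fields = m.groupdict() if m else {}
```

The same pattern, with `(?P<name>...)` groups, is what would go into the single REGEX line of the haproxy_fields transform.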

The end result is that haproxy_syslog_stripper is an index-time transform that overwrites _raw with its results, and haproxy_fields is a search-time extraction based on the updated _raw. One happens when the data is indexed, the other when the data is searched, so in that case they can rely on each other. A bonus is that you shouldn't need the | extract command to get the fields to appear; they should simply be available when you search this sourcetype.

Motivator

Thanks for the clarification! I thought everything in props essentially ran concurrently.

That being the case, this should work in props.conf (using either the source:: stanza name or the one below, depending on your incoming data):

[HAProxy] 
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ = US/Mountain
TRANSFORMS-syslogstripper = haproxy_syslog_stripper
REPORT-haproxyfieldextract = haproxyfields, clientinfofields, backendfields, requestinfo, connectioninfo, queueinfo, uriinfo

If this doesn't work, can you provide example logs?


Legend

A comment regarding the statement - "It's also important to note that you really can't have extractions dependent on other extractions, as they don't execute in sequence."

This is not true - in fact, the opposite holds and is used by many apps, including apps made by Splunk themselves. Extractions run in the sequence specified by the order in which they're called in a REPORT statement, so if you have "REPORT = extraction1, extraction2", extraction2 will run after extraction1 and can make use of the field(s) extraction1 created.
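A minimal sketch of that chaining, reusing the stanza names from the question (field lists abbreviated; ordering in the REPORT line is what lets the second transform see client_info):

```
# transforms.conf
[haproxyfields]
DELIMS = " "
FIELDS = haproxy_id, client_info, date_time, frontend_name, backend

[clientinfofields]
SOURCE_KEY = client_info
DELIMS = ":"
FIELDS = client_ip, client_port

# props.conf -- clientinfofields is listed after haproxyfields, so it can
# read the client_info field that haproxyfields just created
[HAProxy]
REPORT-haproxyfieldextract = haproxyfields, clientinfofields
```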

Champion

What are you trying to accomplish?
