I've got a basic Splunk setup that consolidates four different Web logs from eight machines running two Web servers each. As a next step, I'd like to integrate output from an internal error log that's stored centrally in a database. I could write the data out to a log file and let a regular Splunk forwarder handle everything. While that's likely what I'll do, I thought I'd ask whether there's any sense in writing a custom bit of code to forward the data directly to Splunk from my application, without writing to a physical log. I'm new enough to Splunk that I didn't manage to find the relevant docs. I pushed an existing log into Splunk's bulk upload folder and traced the data it sent, and it looks as though:
Depending on how the server is configured, the data can be sent up just as it is in the log (raw) or pre-parsed (splunk-cooked-mode-v2 in my quick test). I'd use the raw format to keep my code simple.
There are at least a handful of meta-data name-value pairs (or something) at the start of the message, even in raw mode.
At a quick glance, the raw mode format looks like it's mostly a simple stream of data from the log to Splunk.
Can anyone point me to some advice, documentation, blogs, etc. on the pros/cons/details of writing custom code to push log data directly to Splunk over TCP?
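To be concrete, here's roughly the sort of thing I was picturing. This is just a sketch; it assumes a plain TCP data input is already listening on an arbitrary port (9999 here), and the host name and sample events are made up:

```python
import socket

# Sketch of "custom code to push log data directly to Splunk over TCP".
# Assumes a raw TCP data input is configured in Splunk on port 9999;
# the host, port, and sample events below are placeholders.
SPLUNK_HOST = "splunk.example.com"
SPLUNK_PORT = 9999

def send_raw_events(events):
    with socket.create_connection((SPLUNK_HOST, SPLUNK_PORT)) as sock:
        for event in events:
            # Raw mode: each line crosses the wire exactly as it would
            # appear in the log file.
            sock.sendall((event + "\n").encode("utf-8"))

send_raw_events([
    "2010-11-02 14:01:07 ERROR OrderService - timeout talking to payment gateway",
    "2010-11-02 14:01:09 WARN  OrderService - retrying request 2 of 3",
])
```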
Thanks very much. I'm new to Splunk and have already gotten some great answers from Splunk Answers members. Much appreciated!
Something else to consider - don't write custom code unless you have to.
Before you start, find out how the custom error log is being written to the database in the first place. Often, frameworks like log4j or log4net will do this, and you may be able to plug into the existing process.
It could be as simple as modifying a configuration file to enable writing to a TCP or UDP output (e.g., by creating a new RemoteSyslogAppender, or your framework's equivalent). Then, just configure Splunk or your syslog daemon to listen on the appropriate port.
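As an illustration only: if the application's logging went through something like Python's standard logging module instead of log4j/log4net, the equivalent of adding a remote syslog appender would look roughly like this (the host name and port are placeholders for wherever Splunk or your syslog daemon is listening):

```python
import logging
import logging.handlers

# Same idea as adding a RemoteSyslogAppender in log4net, but using Python's
# standard logging framework. "splunk-host" and port 514 are placeholders
# for wherever Splunk (or a syslog daemon) is listening on UDP.
syslog = logging.handlers.SysLogHandler(address=("splunk-host", 514))
syslog.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("error-log")
logger.setLevel(logging.INFO)
logger.addHandler(syslog)

# Existing logging calls now also reach Splunk, with no custom socket code.
logger.error("database connection pool exhausted")
```

On the Splunk side, it's then just a matter of adding a UDP (or TCP) network input on the matching port.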
The cooked format is not documented, but the raw format is simply a stream of bytes, exactly as if they were going to a file.
Thanks for the clarification. I'll pass on trying to reverse engineer an undocumented internal format!
Have you looked into scripted inputs at all? http://www.splunk.com/base/Documentation/latest/Admin/Setupcustom(scripted)inputs
Rather than writing the data to a logfile, you can have the Splunk light forwarder consume the script's stdout directly and pass it along. My preference would be to avoid writing custom socket code. What's the point when Splunk can already do it, and you could use the LWF to consume other logs, etc., as the need presents itself.
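As a rough illustration (the database path, table, columns, and the sqlite3 stand-in for the real database are all invented), a scripted input can be as small as a script that prints rows to stdout; Splunk runs it on an interval via a [script://...] stanza in inputs.conf and indexes whatever it emits:

```python
#!/usr/bin/env python
# Sketch of a Splunk scripted input: print error-log rows to stdout and let
# the (light) forwarder index whatever this script writes. The database path,
# table, and columns are placeholders; a real version would also keep track
# of the last row already emitted.
import sqlite3

def main():
    conn = sqlite3.connect("/var/data/errors.db")
    cur = conn.execute(
        "SELECT logged_at, severity, message FROM error_log ORDER BY logged_at"
    )
    for logged_at, severity, message in cur:
        # One event per line; Splunk extracts the timestamp from the text.
        print("%s %s %s" % (logged_at, severity, message))
    conn.close()

if __name__ == "__main__":
    main()
```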
Thanks for the suggestions. Yes, writing the socket code directly is probably just asking for trouble. Thanks for the reference to scripted inputs as I hadn't considered them. The environment I'm in for this job doesn't naturally talk to stdin/stdout so I'll do something else, but I like the idea of scripted inputs very much.
In this case, I think I'll have one client connect to the database, set a semaphore to indicate that it's processing the error log table, and then push the errors out to a physical file. Splunk can monitor it normally.
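Roughly what I have in mind, with placeholder paths, table, and column names (a marker file stands in for the semaphore):

```python
import os
import sqlite3

# Sketch of the plan: one client claims the error-log table via a simple
# marker file, appends new rows to a plain log file, and Splunk monitors
# that file as usual. Everything here is a placeholder.
MARKER = "/var/run/errorlog_export.lock"
LOG_PATH = "/var/log/app/internal_errors.log"

def export_errors():
    if os.path.exists(MARKER):
        return  # another client is already working on the table
    open(MARKER, "w").close()
    try:
        conn = sqlite3.connect("/var/data/errors.db")
        cur = conn.execute(
            "SELECT id, logged_at, severity, message FROM error_log WHERE exported = 0"
        )
        exported_ids = []
        with open(LOG_PATH, "a") as log:
            for row_id, logged_at, severity, message in cur:
                log.write("%s %s %s\n" % (logged_at, severity, message))
                exported_ids.append(row_id)
        conn.executemany(
            "UPDATE error_log SET exported = 1 WHERE id = ?",
            [(i,) for i in exported_ids],
        )
        conn.commit()
        conn.close()
    finally:
        os.remove(MARKER)
```

Splunk then just gets a normal monitor input pointing at that file.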
That is a good point. A very important corollary is that the Splunk forwarder implements load balancing over multiple indexers, which is fundamental to scaling Splunk, and you probably don't want to have to write that code yourself. There is a little work involved in making sure a script stays alive and shuts down or restarts at the right times, but that would be true for a standalone process as well.