I intend to modify my app/script so that it writes out a completely custom log file format for Splunk to monitor and index in real time.
What is the best format to use for my custom log events so that Splunk automatically extracts ALL of my fields and the timestamp, and I do not have to set up or configure any field extractions myself?
The optimal log format is -
timestamp key=value key=value key=value key=value key=value key=value key=value key=value
You can have other delimiters in there too like , or : but that's pretty much a personal preference. If the keys and values are easily recognizable, Splunk will index and search as fast as you can write it out.
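As a rough illustration of the format above, here is a minimal Python sketch that builds one such line. The helper name format_event and the field names are made up for this example, and the values are assumed to contain no spaces or = characters.

```python
def format_event(stamp, fields):
    """Build one 'timestamp key=value key=value ...' log line.

    format_event is a hypothetical helper, not part of any Splunk API.
    Values are assumed to contain no spaces or '=' characters.
    """
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{stamp} {pairs}"


# Field names and values are invented for illustration.
line = format_event(
    "2011-10-24 14:04:02 +0200",
    {"action": "login", "user_uid": 1212, "status": "ok"},
)
print(line)
```

Each call yields a single line, so one event maps to one line in the monitored file.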
I use:
key="value" || key="value" || key="value"
My props.conf looks like this:
[my_sourcetype]
KV_MODE = none
REPORT-event = my_sourcetype_event
My transforms.conf looks like this:
[my_sourcetype_event]
MV_ADD = true
KEEP_EMPTY_VALS = true
REGEX = ([^=(\s+\|\|\s+)]*?)\s*\=\s*(.)((?:[^\2]|[^=])*?)\2+?(?:\s+\|\|\s+|$)
FORMAT = $1::$3
My events look like this:
timestamp="2012-02-24 17:39:19 -0800 (PST)" || type="php" || message="my message" || variables_type="Warning" || variables_message="blah" || variables_function="sure" || variables_file="file.php" || variables_line="958" || severity="error" || user_uid="1212" || user_language="fr" || user_ctry_cd="AX" || user_name="nada" || user_init="124124" || user_is_employee="no" || request_uri="http://foo.com/sure" || referer="http://bar.com/foo" || ip="10.10.10.10" || message_id="6"
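Events in this shape take very little code to produce. The following is only a sketch: format_pipe_event is a hypothetical helper name, and it does not try to escape quotes embedded in values.

```python
def format_pipe_event(fields):
    """Join fields as key="value" pairs separated by ' || '.

    format_pipe_event is a hypothetical helper; embedded quotes in
    values are written through unchanged.
    """
    return " || ".join(f'{key}="{value}"' for key, value in fields.items())


# A shortened version of the event shown above.
event = format_pipe_event({
    "timestamp": "2012-02-24 17:39:19 -0800 (PST)",
    "type": "php",
    "message": "my message",
    "severity": "error",
})
print(event)
```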
The regex I made is pretty cool. It'll let you do:
key=[any character]valuebla[any character]hvalue[any character] ||
For example:
dog="spot" ||
alien='zonk' ||
fruit=^apple^ ||
broken=#not#brok#en# ||
horriblekey="imnothorrible="yesyouare" ishouldbemyownfield="wellyouwont" i="give="up""" ||
Your transforms.conf worked amazing for me. All I had to do was format my source events like yours. Thank you!
There are several ways to deal with the Sql_Text=Select * from Table1 where uname="dummy"
One way, which works if Sql_Text=something is at the end of the log event, is to use a field extraction (i.e. EXTRACT) in the props.conf file:
EXTRACT-Sql_Text = Sql_Text=(?<SqlText>.+)$
You could even do this directly in the search app without using the props.conf stuff. The following should give you a list with the counts of the 10 most used Sql_Text expressions, grouped by the ClientIP field:
* | rex field=_raw " Sql_Text=(?<SqlText>.+)$" | stats count by ClientIP, SqlText | sort 10 -count
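To see what that pattern captures outside of Splunk, here is a small Python sketch of the same idea: take everything after " Sql_Text=" up to the end of the line, then count per client IP. This is not how Splunk runs the search. The first sample event is the one quoted in this thread; the other two are invented for the example, as is the ClientIP pattern.

```python
import re
from collections import Counter

# Python spells named groups (?P<name>...); otherwise this mirrors the rex.
SQL_TEXT = re.compile(r" Sql_Text=(?P<SqlText>.+)$")
# Assumed helper pattern for the ClientIP field (not from the thread).
CLIENT_IP = re.compile(r"\bClientIP=(?P<ClientIP>\S+)")

PREFIX = "myhostname DBIP=10.5.10.2 Service=OracleXE"
events = [
    f'May 26 18:14:15 {PREFIX} ClientIP=75.149.38.65 SrcPort=80 DestPort=8080 '
    'UID=10534 Sql_Text=Select * from Table1 where uname="dummy"',
    # The next two events are invented sample data.
    f'May 26 18:14:16 {PREFIX} ClientIP=75.149.38.65 SrcPort=80 DestPort=8080 '
    'UID=10534 Sql_Text=Select * from Table1 where uname="dummy"',
    f'May 26 18:14:17 {PREFIX} ClientIP=10.10.10.10 SrcPort=80 DestPort=8080 '
    'UID=10534 Sql_Text=Select 1',
]

counts = Counter()
for raw in events:
    sql = SQL_TEXT.search(raw)
    ip = CLIENT_IP.search(raw)
    if sql and ip:
        counts[(ip.group("ClientIP"), sql.group("SqlText"))] += 1

# Ten most frequent (ClientIP, SqlText) pairs, highest count first.
top = counts.most_common(10)
for (client_ip, sql_text), count in top:
    print(count, client_ip, sql_text)
```

Because the capture runs to the end of the line, spaces, quotes and = signs inside the SQL statement all stay in one field.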
What if you want to log SQL commands like this?
Example:
May 26 18:14:15 myhostname DBIP=10.5.10.2 Service=OracleXE ClientIP=75.149.38.65 SrcPort=80 DestPort=8080 UID=10534 Sql_Text=Select * from Table1 where uname="dummy"
As you can see, the timestamp key=value key=value key=value ... format does not work well in this example, and , or : are not good delimiters either, because all of these characters can appear in SQL commands, which breaks the field extraction.
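The problem is easy to reproduce with a deliberately naive parser. The Python sketch below is not Splunk's extraction logic; it just splits on whitespace and = to show why free-form SQL text is ambiguous in a space-delimited key=value line.

```python
line = (
    'May 26 18:14:15 myhostname DBIP=10.5.10.2 Service=OracleXE '
    'ClientIP=75.149.38.65 SrcPort=80 DestPort=8080 UID=10534 '
    'Sql_Text=Select * from Table1 where uname="dummy"'
)

# Naive approach: split on whitespace, keep tokens that contain '='.
naive = dict(token.split("=", 1) for token in line.split() if "=" in token)

print(naive["Sql_Text"])  # only the first word of the statement survives
print(naive["uname"])     # part of the SQL is mistaken for its own field
```

The statement is cut off after its first word, and uname shows up as a field that the application never logged.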
Something like this:
Generic Example:
[Timestamp] Hostname HostIP=IPaddress Service=ServiceName ClientIP=IPaddress SrcPort=port# DestPort=port# UID=value Stuff=blah Morestuff=blahblah
Specific Example:
May 26 18:14:15 myhostname HostIP=10.5.10.2 Service=CustomLogger ClientIP=75.149.38.65 SrcPort=80 DestPort=8080 UID=10534 ImportantValue=Be9r87 AnotherImportantValue=310984
Hello Mick. Could you share a log format example? What is the timestamp format?
The time stamp should be in ISO8601 form - i.e. variants of YYYY-MM-DD HH:MM:SS.mmm TZ DST.
Example: 2011-10-24 14:04:02 +0200 DST
If you do not want (or need) the time zone or Daylight Saving Time designators, they may be omitted.
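For reference, a timestamp in that shape can be produced with Python's standard datetime module. This is a sketch only: the fixed +02:00 offset and the sample instant are assumptions chosen to match the example above, and a real script would use the current time in its own zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed +02:00 offset so the output is reproducible (an assumption).
tz = timezone(timedelta(hours=2))
moment = datetime(2011, 10, 24, 14, 4, 2, 123000, tzinfo=tz)

# YYYY-MM-DD HH:MM:SS TZ
without_millis = moment.strftime("%Y-%m-%d %H:%M:%S %z")

# YYYY-MM-DD HH:MM:SS.mmm TZ (strftime has no millisecond code,
# so the milliseconds are derived from the microsecond field).
millis = moment.microsecond // 1000
with_millis = (
    moment.strftime("%Y-%m-%d %H:%M:%S") + f".{millis:03d}" + moment.strftime(" %z")
)

print(without_millis)
print(with_millis)
```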