I have an input that contains a JSON log entry from a server, but because it arrives via syslog, Splunk cannot decipher the JSON part of it. I know Splunk is happy reading plain JSON, but this information needs to come in via syslog, so it is prepended with the usual timestamp preamble.
My approach to resolving this was to transform the syslog data from the host machine and use a regex to extract the bit I need into the index.
An example of the syslog entry is:
<142>Feb 26 13:44:46 localhost node-3: [SyslogManagerImpl] <> INFO uvm[0]: {"serverIntf":1,"timeStamp":"2013-02-26 13:44:46.178","SClientAddr":"/88.88.8.188","sessionId":89248860551403,"tag":"uvm[0]: ","SServerPort":80,"SServerAddr":"/222.22.22.122","class":"class com.untangle.uvm.node.SessionNatEvent","SClientPort":18220}
So I used this regex, which, when tested outside of Splunk, works fine and extracts the correct data:
^(?:.*)({\".*)$
This finds the first {" and captures everything from there to the end of the line.
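As a quick sanity check outside Splunk, the same pattern can be exercised in, say, Python (the sample line here is abridged from the one above):

```python
import json
import re

# The sample syslog line from the question (abridged): syslog preamble
# followed by the JSON payload.
line = ('<142>Feb 26 13:44:46 localhost node-3: [SyslogManagerImpl] <> INFO uvm[0]: '
        '{"serverIntf":1,"timeStamp":"2013-02-26 13:44:46.178",'
        '"SClientAddr":"/88.88.8.188","sessionId":89248860551403,'
        '"SServerPort":80,"SClientPort":18220}')

# Same pattern as above: a non-capturing group swallows the preamble,
# and the capture group grabs everything from {" to the end of the line.
m = re.match(r'^(?:.*)(\{".*)$', line)
event = json.loads(m.group(1)) if m else None
print(event["SClientAddr"])  # -> /88.88.8.188
```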
I have, however, been unsuccessful in getting this to work in Splunk: it either ignores the entry completely and gives me an empty log, or it just gives me the original log entry with no filtering. I've tried various queues in the transform, and also grabbing the TCP port directly in props.conf, but no luck.
Any help or guidance would be greatly welcomed.
My config changes are below:
props.conf
[host::server.domain.com]
NO_BINARY_CHECK = 1
pulldown_type = 1
TRANSFORMS-set=syslogremove
transforms.conf
[syslogremove]
DEST_KEY = indexQueue
SOURCE_KEY = parsingQueue
DEFAULT_VALUE = failed
REPEAT_MATCH = true
WRITE_META = true
REGEX = ^(?:.*)({\".*)$
FORMAT = set::$2
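For what it's worth, one thing that can produce the "no filtering" symptom with a config like the above: `(?:.*)` is a non-capturing group, so the JSON lands in $1 rather than $2, and rewriting the event text itself is normally done with DEST_KEY = _raw rather than a queue key. A sketch of that variant, based on standard props/transforms behaviour but untested against this data:

```ini
# transforms.conf -- sketch only; rewrites the raw event to just the JSON
[syslogremove]
REGEX = ^(?:.*)({\".*)$
FORMAT = $1
DEST_KEY = _raw
```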
Martin, thanks for your suggestions. I tried the spath suggestion but couldn't get that working either; I have, however, resolved it another way.
Instead of using Splunk's own syslog server, I've used the rsyslog that is installed on the server and used its built-in filtering to parse the JSON out of the syslog message. This is written to a file which Splunk then indexes natively as JSON. Result!
Here's the HOWTO:
1) Create a file called /var/rsyslog.d/json.conf with this content:
$Template jsonlog,"%msg:R,ERE,0,BLANK:(\{\".*)--end%\n"
*.info;mail.none;authpriv.none;cron.none /var/log/json.log;jsonlog
2) Restart rsyslog - systemctl restart rsyslog
3) Add an index to Splunk
4) Point an input to that index.
Then you'll have nicely parsed JSON in Splunk via syslog! If anyone is interested, the aim of this exercise was to get Splunk monitoring the logs on our Untangle (http://www.untangle.com) appliance, so now that it's in Splunk we can create an activity monitoring system.
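For step 4, in case it helps anyone following along, a minimal monitor stanza might look like this (the index name is just an example; _json is Splunk's pretrained JSON sourcetype):

```ini
# inputs.conf -- index name is illustrative, adjust to your setup
[monitor:///var/log/json.log]
sourcetype = _json
index = untangle
```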
The first query you provided I couldn't get to work; it just returned the original field. The second, with the eval, did work, but I'm not sure how to use it on an index. By the looks of it, what you have provided there is what Splunk is doing for me now.
The filtering I've done with rsyslog is a cleaner method for me, as Splunk automatically performs an spath on the data without my modifying any props or transforms.
Have you tried my query? If that doesn't work for you it'd be interesting to see what exactly fails.
Try this:
... | rex "(?<json>\{.+)" | spath input=json | fields - json
You can put the expression into an EXTRACT-classname entry in props.conf, or just add a new inline field extraction through the manager, to avoid having to call rex every time.
+1 for escaping all that json 🙂
The spath does extract all your data from the JSON object. Consider this query:
| gentimes start=-1 increment=1h | eval tmp = "<142>Feb 26 13:44:46 localhost node-3: [SyslogManagerImpl] <> INFO uvm[0]: {\"serverIntf\":1,\"timeStamp\":\"2013-02-26 13:44:46.178\",\"SClientAddr\":\"/88.88.8.188\",\"sessionId\":89248860551403,\"tag\":\"uvm[0]: \",\"SServerPort\":80,\"SServerAddr\":\"/222.22.22.122\",\"class\":\"class com.untangle.uvm.node.SessionNatEvent\",\"SClientPort\":18220}" | rex field=tmp "(?<json>\{.+)" | spath input=json
You'll see fields such as SClientAddr, SClientPort, etc.
Thanks for your quick response. This solution doesn't pick up the data I need, though; all it does is return the original syslog entry for me, which is the same problem I'm hitting with the transform.
I'll have a tinker around a bit more to see if I can get a subset of data to work.