Getting Data In

Filtering syslog events into JSON events with regex

actech
New Member

I have an input that contains a JSON log entry from a server, but because it arrives via syslog, Splunk cannot parse the JSON part of it. I know Splunk is happy just reading in JSON, but this information has to come in via syslog, so it is prepended with the usual timestamp preamble.

My approach to resolving this was to transform the syslog data from the host machine and regex the bit I need into the index.

An example of the syslog entry is:
<142>Feb 26 13:44:46 localhost node-3: [SyslogManagerImpl] <> INFO uvm[0]: {"serverIntf":1,"timeStamp":"2013-02-26 13:44:46.178","SClientAddr":"/88.88.8.188","sessionId":89248860551403,"tag":"uvm[0]: ","SServerPort":80,"SServerAddr":"/222.22.22.122","class":"class com.untangle.uvm.node.SessionNatEvent","SClientPort":18220}

So I used this regex, which works fine when tested outside of Splunk and extracts the correct data:

^(?:.*)({\".*)$

This finds the first {" and then returns it in field 2.
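
One thing worth checking is the group numbering. A minimal sketch in Python (not Splunk) that runs the regex from the post, unchanged, against the sample line: because `(?:...)` is a non-capturing group, Python's `re` module numbers the JSON capture as group 1, and PCRE numbers groups the same way. The names `LINE`, `PATTERN`, `payload` and `event` are only for illustration.

```python
import json
import re

# Sample syslog line copied from the post above.
LINE = (
    '<142>Feb 26 13:44:46 localhost node-3: [SyslogManagerImpl] <> INFO uvm[0]: '
    '{"serverIntf":1,"timeStamp":"2013-02-26 13:44:46.178",'
    '"SClientAddr":"/88.88.8.188","sessionId":89248860551403,'
    '"tag":"uvm[0]: ","SServerPort":80,"SServerAddr":"/222.22.22.122",'
    '"class":"class com.untangle.uvm.node.SessionNatEvent","SClientPort":18220}'
)

# The regex from the post, unchanged.
PATTERN = re.compile(r'^(?:.*)({\".*)$')

match = PATTERN.match(LINE)
payload = match.group(1)      # (?:...) is non-capturing, so the JSON is group 1
event = json.loads(payload)   # raises ValueError if the tail is not valid JSON

print(match.lastindex)        # number of the highest capturing group that matched
print(event["SClientAddr"], event["SClientPort"])
```

Note also that the leading `.*` is greedy, so on a line containing more than one `{"` the capture would start at the last occurrence rather than the first. The sample line has only one, so it makes no difference here.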

However, I have been unsuccessful in getting this to work in Splunk: it either ignores the entry completely and gives me an empty log, or it just gives me the original log entry with no filtering. I've tried various queues in the transform and also grabbing the TCP port directly in props, but no luck.

Any help or guidance would be greatly welcomed.

My file mods are below:

props.conf

[host::server.domain.com]
NO_BINARY_CHECK = 1
pulldown_type = 1
TRANSFORMS-set=syslogremove

transforms.conf

[syslogremove]
DEST_KEY = indexQueue
SOURCE_KEY = parsingQueue
DEFAULT_VALUE = failed
REPEAT_MATCH = true
WRITE_META = true
REGEX = ^(?:.*)({\".*)$
FORMAT = set::$2

actech
New Member

Martin, thanks for your suggestions. I tried the spath suggestion but couldn't get that working either. I have, however, resolved it another way.

Instead of using Splunk's own syslog server, I've used the rsyslog that is installed on the server and its built-in filtering to parse the JSON out of the syslog message. This is written to a file, which Splunk then indexes natively as JSON. Result!

Here's the HOWTO:
1) Create a file called /var/rsyslog.d/json.conf with this content:

$Template jsonlog,"%msg:R,ERE,0,BLANK:(\{\".*)--end%\n"
*.info;mail.none;authpriv.none;cron.none                /var/log/json.log;jsonlog

2) Restart rsyslog - systemctl restart rsyslog
3) Add an index to Splunk
4) Point an input to that index.
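
To make the template in step 1 easier to follow, here is a rough Python stand-in for what it does to each message. This is a sketch, assuming the property replacer options mean what the rsyslog documentation describes: `R,ERE` selects an extended regular expression, `0` selects the whole match, and `BLANK` yields an empty string when nothing matches. The names `JSON_TAIL` and `render_jsonlog` are hypothetical, and the sample message is abbreviated from the one in the question.

```python
import re

# Python equivalent of the template's regex (\{\".*): everything from the
# first {" to the end of the message.
JSON_TAIL = re.compile(r'\{".*')

def render_jsonlog(msg: str) -> str:
    """Return the line the jsonlog template would write for one message."""
    found = JSON_TAIL.search(msg)
    # Submatch 0 is the whole match; BLANK means an empty string on no match.
    return (found.group(0) if found else "") + "\n"

# Abbreviated message body, for illustration only.
sample = ' [SyslogManagerImpl] <> INFO uvm[0]: {"serverIntf":1,"SServerPort":80}'
print(render_jsonlog(sample), end="")
print(repr(render_jsonlog("no JSON here")))
```

With `BLANK`, messages that carry no JSON still produce an empty line in /var/log/json.log, so the selector line in the config decides how much noise ends up in the file.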

Then you'll have nicely parsed JSON in Splunk via syslog! If anyone is interested, the aim of this exercise was to get Splunk monitoring the logs on our Untangle (http://www.untangle.com) appliance, so now that the data is in Splunk we can create an activity monitoring system.


actech
New Member

I couldn't get the first query you provided to work; it just returned the original field. The second one, with the eval, did work, but I'm not sure how to use this on an index. By the looks of it, what you have provided there is what Splunk is doing for me now.

The filtering I've done with rsyslog is a cleaner method for me, as Splunk automatically performs an spath on the data without my modifying any props or transforms.


martin_mueller
SplunkTrust

Have you tried my query? If that doesn't work for you it'd be interesting to see what exactly fails.


martin_mueller
SplunkTrust

Try this:

... | rex "(?<json>\{.+)" | spath input=json | fields - json

You can put the expression into an EXTRACT-classname entry in props.conf, or just add a new inline field extraction through the manager, to avoid having to call rex every time.
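
For readers who want to see what the pipeline does conceptually, here is a rough Python analogue: `rex` pulls everything from the first `{` into a field named `json`, and `spath` turns that JSON into individual fields. It is only a sketch of the idea, not how Splunk implements either command. The helper names `flatten` and `extract_fields` are hypothetical, the dotted naming for nested objects is an approximation of spath's output, and the sample event is abbreviated from the one in the question.

```python
import json
import re

# Python spells named groups (?P<name>...); SPL's rex uses (?<name>...).
REX = re.compile(r'(?P<json>\{.+)')

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into dotted field names."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            fields.update(flatten(value, name + "."))
        else:
            fields[name] = value
    return fields

def extract_fields(raw: str) -> dict:
    """Approximate the rex + spath pipeline for a single raw event."""
    m = REX.search(raw)
    if m is None:
        return {}  # no "{" in the event, so there is nothing to parse
    return flatten(json.loads(m.group("json")))

# Abbreviated event, for illustration only.
raw = 'INFO uvm[0]: {"SServerPort":80,"SClientAddr":"/88.88.8.188","SClientPort":18220}'
fields = extract_fields(raw)
print(sorted(fields))
```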

jonuwz
Influencer

+1 for escaping all that json 🙂


martin_mueller
SplunkTrust

The spath does extract all your data from the JSON object. Consider this query:

| gentimes start=-1 increment=1h | eval tmp = "<142>Feb 26 13:44:46 localhost node-3: [SyslogManagerImpl] <> INFO uvm[0]: {\"serverIntf\":1,\"timeStamp\":\"2013-02-26 13:44:46.178\",\"SClientAddr\":\"/88.88.8.188\",\"sessionId\":89248860551403,\"tag\":\"uvm[0]: \",\"SServerPort\":80,\"SServerAddr\":\"/222.22.22.122\",\"class\":\"class com.untangle.uvm.node.SessionNatEvent\",\"SClientPort\":18220}" | rex field=tmp "(?<json>\{.+)" | spath input=json

You'll see fields such as SClientAddr, SClientPort, etc.


actech
New Member

Thanks for your quick response. This solution doesn't pick up the data I need, though; all it does is return the original syslog entry, which is the same problem I'm hitting with the transform.

I'll tinker with it a bit more to see if I can get a subset of the data to work.
