Getting Data In

Need help parsing and extracting MongoDB JSON log in Splunk

ychoo
Observer

Hi,

I need some help.

We have been using Splunk for MongoDB alerts for a while. The new MongoDB version we are upgrading to changes the log format from text to JSON.

I need to alter the alerts in Splunk so that they continue to work with the new JSON log format.

Here is an example of a search query in one of the alerts we have now:

index=googlecloud*
source="projects/dir1/dir2/mongodblogs"
data.logName="projects/dir3/logs/mongodb"
data.textPayload="* REPL *"
NOT "catchup takeover"
| rex field=data.textPayload "(?<sourceTimestamp>\d{4}-\d*-\d*T\d*:\d*:\d*.\d*)-\d*\s*(?<severity>\w*)\s*(?<component>\w*)\s*(?<context>\S*)\s*(?<message>.*)"
| search component="REPL"
message!="*took *ms"
message!="warning: log line attempted * over max size*"
NOT (severity="I" AND message="applied op: CRUD*" AND message!="*took *ms")
| rename data.labels.compute.googleapis.com/resource_name as server
| regex server="^preprod0[12]-.+-mongodb-server8*\d$"
| sort sourceTimestamp data.insertId
| table sourceTimestamp server severity component context message

 

The content of the MongoDB log is under data.textPayload. Currently it is parsed with a regex and split into 5 labelled groups, and we then search each group for the string or message that we want to be alerted on.

The new JSON format log looks like this:

{"t":{"$date":"2022-04-19T07:50:31.005-04:00"},"s":"I", "c":"REPL", "id":21340, "ctx":"RstlKillOpThread","msg":"State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepDown","userOpsKilled":0,"userOpsRunning":4}}}

I need to split it into 7 groups, using the commas as delimiters, and then search each group using the same search criteria.
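
As a rough sketch of what I am after (just a test on a trimmed copy of the sample line via makeresults; the output field names are only what I would like to end up with), something like this seems to be the idea, but I do not know how to apply it to data.textPayload in the alert:

| makeresults
| eval _raw="{\"t\":{\"$date\":\"2022-04-19T07:50:31.005-04:00\"},\"s\":\"I\",\"c\":\"REPL\",\"ctx\":\"RstlKillOpThread\",\"msg\":\"State transition ops metrics\"}"
| spath output=Timestamp path=t
| spath output=Severity path=s
| spath output=Component path=c
| spath output=Context path=ctx
| spath output=Message path=msg
| table Timestamp Severity Component Context Message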

I have been trying and testing for 2 days; I'm new to Splunk and not very good with regex.

Any help would be appreciated.

Thanks !

Sally


ychoo
Observer

I got this working now with the search below:

sourcetype=json
| spath input="data.textPayload" output="Timestamp" path=t
| spath input="data.textPayload" output="Severity" path=s
| spath input="data.textPayload" output="Component" path=c
| spath input="data.textPayload" output="Context" path=ctx
| spath input="data.textPayload" output="Message" path=msg
| spath input="data.textPayload" output="Attr" path=attr

| search < Search criteria by field here> 

| table Timestamp Server Severity Component Context Message Attr
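
For example, the placeholder line could become something like the below; the criteria are just illustrative, carried over from the old alert, and Server comes from the same rename as in the original query:

| rename data.labels.compute.googleapis.com/resource_name as Server
| search Component="REPL" Message!="*took *ms" Message!="warning: log line attempted * over max size*"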

 

Thanks !


Gr0und_Z3r0
Contributor

Hi @ychoo 
For a good long-term solution, I think you'll need to update the sourcetype for the log from text to JSON so that the logs are parsed correctly.
Otherwise you can proceed with using spath or regex for field extractions, as in the rex example below.
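
For the first option, a props.conf stanza along these lines should be enough, assuming the MongoDB log lines are ingested as their own sourcetype (the name mongodb:json is only an example):

[mongodb:json]
# search-time JSON field extraction
KV_MODE = json

If you instead use INDEXED_EXTRACTIONS = json on the forwarder, set KV_MODE = none for that sourcetype on the search head so the fields are not extracted twice.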


source="D:\\Learn Splunk\\test.txt"  sourcetype="text"
| rex field=_raw "date\"\:\"(?P<SourceTimestamp>[\d\-\:\.T]+)\"\},\"s\"\:\"(?P<Severity>[\w]+)\",\s\"c\"\:\"(?P<Component>[\w]+)"
| rex field=_raw "ctx\"\:\"(?P<Context>[\w]+)\",\"msg\"\:\"(?P<Message>[\w\s]+)"
| table _time SourceTimestamp Severity Component Context Message

 

 
