Issue with indexing JSON output from application

raffinyc · ‎11-28-2017

Application running in Openshift 3.6 (kubernetes/Docker), generates raw JSON like this to STDOUT (when running standalone). We're not logging to files right now because we could not get logs from files into splunk (we're not sure where in Openshift apps should log, and that could be our issue)

{"@timestamp":"2017-11-29T01:04:40.938-05:00","@version":1,"message":"Preparing to transfer $1028.","logger_name":"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService","thread_name":"pool-1-thread-1","level":"INFO","level_value":20000,"transactionOwner":"Susan","transactionId":"9fe28a5b-7ed1-4adc-b852-b4exxxxxxxxx","appname":"my-java-app"}

We can see these log events in splunk, but they're not indexed correctly because none of our custom fields are available for searching, also, log events in splunk appear to be wrapping the raw event generated from our appto have slightly different structure compared to when the app logs standalone (not in Openshift)

{"log", <RAW LOG DATA HERE>}\n","stream":"stdout","time":"<TIMESTAMP>"}

Here's an example of the wrapping:

{"log":"{\"@timestamp\":\"2017-11-29T06:44:58.240+00:00\",\"@version\":1,\"message\":\"Has transfer of $681 completed successfully ? true.\",\"logger_name\":\"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService\",\"thread_name\":\"pool-1-thread-1\",\"level\":\"INFO\",\"level_value\":20000,\"transactionOwner\":\"John\",\"transactionId\":\"9c2025d7-399d-492c-84f1-3a5xxxxxxxxx\",\"appname\":\"my-java-app\"}\n","stream":"stdout","time":"2017-11-29T06:44:58.240849341Z"}

I suspect some intermediary component between Openshift and splunk is doing this wrapping which might be throwing off the parser, that's my take but I'm new to splunk so not entirely sure.

ssadh_splunk · ‎07-11-2019

Its a Super late response though but ran into a similar issue.
Special thanks to @abhagat_splunk .

Append the below search to the base search to parse out the data

| rex mode=sed "s/"\\\""/"\""/g"
| rex mode=sed "s/"\"{"/{/g"
| rex mode=sed "s/"}\""/}/g"
| spath

mattymo · ‎01-06-2018

***UPDATE* - July 2019 - Use Splunk Connect for Kubernetes!**

It makes easy work of this data and App for Infrastructure will add Openshift support for easy on boarding in an upcoming release!

https://github.com/splunk/splunk-connect-for-kubernetes

https://splunkbase.splunk.com/app/3975/

I have pushed a props.conf to our docker monitoring repo which will unwrap the JSON for you and allows you to support multiline logs and remove any escape chars and encoding from the logs.

https://github.com/splunk/docker-itmonitoring/blob/7.0.0-k8s/README-k8s.md

Would be happy to walk you through it in the slack chat if you like, (splk.it/slack), my handle is @mattymo. Big up Micah and automine for helping me nail the SED commands.

# Experimental props to remove json wrapper, escaped quotes, encoding and other cruft from docker logs. 
#
[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another exprimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC

The approach I am taking is to use a base sourcetype called kubernetes to take care of unwrapping the json logs, and then using source based props to apply multiline log support, rename the sourcetype, or other specific parsing per each different log type.

For example, I use this to implement a line breaker for java multiline logs that come from a carts pod:

[source::/var/log/containers/carts-(?!db-)*.log]
#SHOULD_LINEMERGE = true 
SHOULD_LINEMERGE = false 
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET=UTF-8
disabled=false

These two configs in conjunction get you what you would be looking for:

- MattyMo

raffinyc · ‎11-30-2017

We determined the problem on our end to be caused by the Docker daemon (dockerd), which by default uses --log-driver=json-file. Not wanting to mess with such a low level config, we left the driver as-is and instead changed our splunk query:

source= | spath input=log

Simple enough! This performs indexing at search time, not extraction time which was our goal, but it still performs well enough for our needs.

outcoldman · ‎01-05-2018

Hi @raffinyc
We have developed an application for Monitoring OpenShift clusters, including forwarding logs (container, host and openshift components) and monitoring stats (CPU, Memory, IO, etc) on level of processes, containers, pods, hosts. You can get the application from Splunkbase https://splunkbase.splunk.com/app/3836/ and find installation instructions on this page https://www.outcoldsolutions.com/docs/monitoring-openshift/

niketn · ‎11-29-2017

@raffinyc, For the existing wrapped JSON Data being sent to Splunk you might have to extract the inner JSON Data and replace escaped quotes \" with quotes " for spath to work ( during search time ) .

 <YourBaseSearch>
|  rex "{\"log\":\"(?<jsonData>.*)"
|  eval _raw=replace(jsonData,"\\\\\"","\"")
|  spath

Following is a valid JSON which should work straight out of the box if indexed to Splunk:

{"log": {"@timestamp":"2017-11-29T06:44:58.240+00:00", "@version":1, "message":"Has transfer of $681 completed successfully ? true.", "logger_name":"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService", "thread_name":"pool-1-thread-1", "level":"INFO", "level_value":20000, "transactionOwner":"John", "transactionId":"9c2025d7-399d-492c-84f1-3a51111d967d", "appname":"my-java-app"} ,"stream":"stdout" ,"time":"2017-11-29T06:44:58.240849341Z"}

Following is the run anywhere search with your mocked sample data (PS: I have escaped quotes with backslash to generate mock data in Splunk for demo/test. The same is not required while indexing)

|  makeresults
|  eval _raw="{\"log\":
    {\"@timestamp\":\"2017-11-29T06:44:58.240+00:00\",
    \"@version\":1,
    \"message\":\"Has transfer of $681 completed successfully ? true.\",
    \"logger_name\":\"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService\",
    \"thread_name\":\"pool-1-thread-1\",
    \"level\":\"INFO\",
    \"level_value\":20000,
    \"transactionOwner\":\"John\",
    \"transactionId\":\"9c2025d7-399d-492c-84f1-3a51111d967d\",
    \"appname\":\"my-java-app\"}
    ,\"stream\":\"stdout\"
    ,\"time\":\"2017-11-29T06:44:58.240849341Z\"}"
|  spath

Please see if wrapped JSON data correction before indexing to Splunk is feasible, if not try out the replace() work around stated above.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

raffinyc · ‎11-30-2017

Thanks for the pointers, interesting options, will definitely try some of these out.

We determined the problem on our end to be caused by the Docker daemon (dockerd), which by default uses --log-driver=json-file. Not wanting to mess with such a low level config, we left the driver as-is and instead changed our splunk query:

source= | spath input=log

Simple enough! This performs indexing at search time, not extraction time which was our goal, but it still performs well enough for our needs.

Thanks again!

niketn · ‎11-30-2017

@raffinyc, you should convert your own comment as answer and accept the same to mark this question as answered.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

raffinyc · ‎11-30-2017

ok, will do!

raffinyc · ‎11-29-2017

To add, what is the splunk standard prefix for log events? My example above uses this:

{"log":

..but nearly all examples I see use this, does it make any difference?

"message": {

Issue with indexing JSON output from application

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

span_metrics: The OpenTelemetry-Idiomatic Way to See Inside Your Services

Join the Conversation

Issue with indexing JSON output from application

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

span_metrics: The OpenTelemetry-Idiomatic Way to See Inside Your Services