
Splunk cuts off JSON field in search results

mnikolov2793
Observer

Hello,

I have been struggling with something that is probably common sense to experts. Some of the Splunk messages I deal with are structured like the sample pasted at the end [1]. The full message is persisted, but when it appears in search results, the "object" field, which contains JSON, gets cut off to the following:

{"objectName":"<some_string>"

I know there is a default limit of about 10,000 characters per field, and a field that exceeds it could end up like this. However, the problem also occurs for messages with a total length of only 6,000 characters, so there must be something else I am missing.

I also went through similar questions here that suggested adding a regex to the search query to force extraction of the complete field, for example:

| rex "object=(?<object>.+)$"

This works for testing, but I would like another solution: my searches are executed through the Splunk REST API, and hardcoding such regexes for multiple fields is not an option. I assume this could be solved with a configuration on the Splunk side, and I would really appreciate it if someone with more experience could take a look.
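
If the regex has to stay on the client side for now, the rex clauses can at least be generated instead of hand-written for each field. The sketch below uses only Python's standard library; the helper names (`rex_clause`, `build_search`, `oneshot_request`), the base query and the host are hypothetical. It prepares, but does not send, a POST to the `/services/search/jobs` endpoint with `exec_mode=oneshot`; authentication is left out.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request


def rex_clause(field: str) -> str:
    """Hypothetical helper: rex that captures everything after '<field>=' to end of line."""
    return f'| rex "{field}=(?<{field}>.+)$"'


def build_search(base_query: str, fields: List[str]) -> str:
    """Append one rex clause per field to a base SPL query.

    The base query should start with 'search' when it is not a generating command.
    """
    return " ".join([base_query] + [rex_clause(f) for f in fields])


def oneshot_request(base_url: str, query: str) -> Request:
    """Prepare (but do not send) a oneshot search POST; auth headers are omitted."""
    body = urlencode({
        "search": query,
        "exec_mode": "oneshot",
        "output_mode": "json",
    }).encode("utf-8")
    return Request(f"{base_url}/services/search/jobs", data=body, method="POST")


if __name__ == "__main__":
    query = build_search("search index=audit", ["object"])
    req = oneshot_request("https://splunk.example.com:8089", query)
    print(req.get_method(), req.full_url)
    print(query)
```

For example, `build_search("search index=audit", ["object"])` returns `search index=audit | rex "object=(?<object>.+)$"`. Note that the greedy `.+$` pattern only fits a field that is last on the line; for a field in the middle it would also capture everything after it.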

About my setup: I have one search head and two indexers. The problem occurs whether I run the search through the search head or directly on the indexers.

Thank you in advance.

Best Regards,

Martin

[1] Sample message:

formatVersion="<some_version>",
serverTimestamp="<some_timestamp>",
crtAccount="<some_string>",
crtApplication="<some_string>",
crtComponent="<some_string>",
crtTenantId="<some_string>",
crtPermissions="<some_string>",
crtHostname="<some_string>",
accountExt="<some_string>",
clientTimestamp="<some_timestamp>",
messageId="<some_string>",
category="<some_string>",
loggedByClass="<some_string>",
correlation_id="<some_string>",
ip_address="<some_string>",
username="<some_string>",
tenantId="<some_string>",
verb={"action":"update"},
"object="{
   "objectName":"<some_string>",
   "objectAttributs":{
      "System details":{
         "oldValue":"<some_string>",
         "newValue":"<some_string>"
      }
   },
   "auditedObject":{
      "type":"<some_string>",
      "id":{
         "key":"<some_string>"
      }
   }
}

mnikolov2793
Observer

I tried escaping the double quotes as suggested by @somesoni2; it did not work. A new finding is that the problem occurs only for messages that have this objectAttributs block:

      "System details":{
         "oldValue":"<some_string>",
         "newValue":"<some_string>"
      }

Another finding is that not all such messages are cut off; some with the same structure are retrieved successfully. For example, a message with the same object part but with randomly generated a-z, 0-9 values is not cut off in search results. This suggests that the issue is not the message structure but the content of the oldValue and newValue fields (where the custom data is stored). I checked the values character by character and found no unsupported characters.
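
Since random a-z, 0-9 values are not cut off, one way to compare a failing value with a working one is to list every character that a key=value extractor could plausibly treat as a delimiter (quotes, commas, equals signs, whitespace, braces), plus any non-ASCII characters such as non-breaking spaces, which are invisible when inspecting by eye. Which characters actually matter to Splunk is an assumption here; the helper names `suspicious_chars` and `non_ascii` are hypothetical.

```python
from typing import List, Tuple

# Assumed set of delimiter-like characters; adjust as needed.
DEFAULT_WATCH = '"\',= \t\r\n{}'


def suspicious_chars(value: str, watch: str = DEFAULT_WATCH) -> List[Tuple[int, str]]:
    """Return (position, character) pairs for delimiter-like characters in value."""
    return [(i, ch) for i, ch in enumerate(value) if ch in watch]


def non_ascii(value: str) -> List[Tuple[int, str]]:
    """Return (position, character) pairs for characters outside ASCII,
    e.g. a non-breaking space (U+00A0) that looks like a normal space."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]


if __name__ == "__main__":
    print(suspicious_chars("a b,c"))
    print(non_ascii("a\u00a0b"))
```

For example, `suspicious_chars("a b,c")` returns `[(1, ' '), (3, ',')]`, while a purely alphanumeric value returns an empty list.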

What confuses me is that adding the regex "rex object=(?<object>.+)$" to the search fixes the result. If the problem were the content of the fields, I would expect the regex not to work either.
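
One way to see why the regex works regardless of the content: `.+` is greedy and ignores quotes, commas and braces inside the value, so it captures everything from `object=` to the end of the line. The sketch below is an assumed Python equivalent of that rex (Python spells named groups `(?P<name>...)`); the sample event is hypothetical and single-line, since `.` does not cross newlines without the `(?s)` flag.

```python
import re
from typing import Optional

# Assumed Python equivalent of: | rex "object=(?<object>.+)$"
# The greedy .+ keeps quotes, commas and braces that a key=value
# extractor might treat as delimiters.
OBJECT_RE = re.compile(r"object=(?P<object>.+)$")


def extract_object(raw: str) -> Optional[str]:
    """Return everything after the first 'object=' to the end of a single-line event."""
    match = OBJECT_RE.search(raw)
    return match.group("object") if match else None


# Hypothetical single-line event shaped like the sample in the question.
RAW = ('tenantId="t1", verb={"action":"update"}, '
       'object={"objectName":"n1","objectAttributs":'
       '{"System details":{"oldValue":"a b","newValue":"c"}}}')

if __name__ == "__main__":
    print(extract_object(RAW))
```

The whole JSON object, including the space in "System details" and in the oldValue, comes back intact.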

I am out of ideas, does anyone have a hint about what I miss?


somesoni2
Revered Legend

I believe it's happening because of the double quotes in the value of the "object" field, which cause the automatic field extraction to terminate early.
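
To illustrate the mechanism (this is not Splunk's actual code), here is a hypothetical, simplified key=value extractor: a quoted value ends at the next unescaped double quote, and an unquoted value ends at the next comma or whitespace. Assuming the raw event contains `object={...}` unquoted on one line, this rule reproduces the shape of the truncation from the question.

```python
import re
from typing import Dict

# HYPOTHETICAL simplified extractor, not Splunk's implementation.
# Group 2: quoted value (backslash-escaped quotes allowed inside).
# Group 3: unquoted value, which ends at the first comma or whitespace.
KV_RE = re.compile(r'(\w+)=(?:"((?:\\.|[^"\\])*)"|([^,\s]*))')


def naive_kv(raw: str) -> Dict[str, str]:
    """Extract key=value pairs, keeping the first value seen for each key."""
    fields: Dict[str, str] = {}
    for m in KV_RE.finditer(raw):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        fields.setdefault(m.group(1), value)
    return fields


# Hypothetical single-line event with an unquoted JSON value.
RAW = ('tenantId="t1", verb={"action":"update"}, '
       'object={"objectName":"n1","objectAttributs":{"k":"v"}}')

if __name__ == "__main__":
    fields = naive_kv(RAW)
    print(fields["object"])
```

The extracted value `{"objectName":"n1"` stops at the first comma inside the JSON, which matches the truncated value reported in the question. This is consistent with the explanation above, but it does not prove that Splunk's automatic extraction follows exactly this rule.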

If you have control over the logging, try escaping the double quotes inside the value of "object", like below:

"object="{
   \"objectName\":\"<some_string>\",
   \"objectAttributs\":{
      ....and so on...
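
If the logging code happens to be Python (an assumption; the thread does not say), the escaping could be sketched as below. `quoted_kv` is a hypothetical helper: it serializes the object as compact JSON, escapes backslashes and double quotes, and wraps the result in quotes so the whole JSON becomes one quoted value.

```python
import json


def quoted_kv(key: str, obj: dict) -> str:
    """Render obj as key="<escaped compact JSON>" for a key=value log line."""
    payload = json.dumps(obj, separators=(",", ":"))
    # Escape backslashes first so existing ones are not confused with the
    # escapes added for the double quotes.
    escaped = payload.replace("\\", "\\\\").replace('"', '\\"')
    return f'{key}="{escaped}"'


if __name__ == "__main__":
    line = quoted_kv("object", {"objectName": "n1", "auditedObject": {"type": "t"}})
    print(line)
```

This places the opening quote after the equals sign (`object="{...}"`), which differs from the `"object="{` layout in the sample message.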

You can also save the custom field extraction in props.conf (or through the UI) so that it's available to all your queries, including API queries. Remember to set the sharing permission of the field extraction to Global.
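
A search-time extraction in props.conf is an `EXTRACT-<class>` setting under the sourcetype's stanza, and search-time extractions belong on the search head. The Python sketch below only renders such a stanza with the standard `configparser` module to show its shape; the sourcetype `my_audit_log` and the helper `props_stanza` are hypothetical, and the regex is the same one used with rex in the question.

```python
import configparser
import io


def props_stanza(sourcetype: str, field: str) -> str:
    """Render a props.conf stanza with one EXTRACT-<field> search-time extraction."""
    cfg = configparser.ConfigParser(interpolation=None)
    # configparser lowercases keys by default; keep 'EXTRACT-' as written.
    cfg.optionxform = str
    cfg[sourcetype] = {f"EXTRACT-{field}": f"{field}=(?<{field}>.+)$"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(props_stanza("my_audit_log", "object"), end="")
```

The rendered stanza is `[my_audit_log]` followed by `EXTRACT-object = object=(?<object>.+)$`. Once the extraction is configured and shared globally, REST API searches against that sourcetype should not need the rex clause in the query.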


mnikolov2793
Observer

Thanks for these suggestions. I will try escaping the double quotes; however, I don't think they are the root cause, because some messages with similar "object" parts are extracted correctly. Example:

{
   "objectAttributs":{
      "process-id":"<some_string>",
      "event-type":"<some_string>",
      "request-origin":"<some_string>"
   },
   "auditedObject":{
      "type":"<some_string>",
      "id":{
         "key":<some_string>
      }
   }
}

I will also look into configuring a custom field extraction in props.conf and report back once I have tried it. If I understand you correctly, this would make Splunk apply the regex to every search automatically.
