Splunk cuts off json field in search results

mnikolov2793 · 11-01-2021

Hello,

I have been struggling with something that is probably common knowledge to experts. Some of the Splunk messages I deal with are structured like the one pasted at the end [1]. The message is stored in full; however, when it appears in a search result, the "object" part, which is JSON, gets cut off to the following:

{"objectName":"<some_string>"

I know there is a default limit under which a field cannot exceed 10,000 characters, and a longer field could end up truncated like this. However, the problem also occurs for messages with a total length of only 6,000 characters, so there must be something else that I am missing.

I also went through similar questions here that suggested adding a regex to the search query to force extraction of the complete field, like:

| rex "object=(?<object>.+)$"

This works for testing, but I would like to find another solution: my searches are executed through the Splunk REST API, and hardcoding such regexes for multiple fields is not an option. I assume this can be solved with configuration on the Splunk side, and I would really appreciate it if someone with more experience could take a look.
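To illustrate why the greedy rex pattern recovers the whole object, here is a minimal Python sketch that applies the same pattern to a single-line version of the sample event in [1] and parses the capture as JSON. Assumptions: the event arrives on one line (the multi-line layout in [1] may only be how it was pasted), the field values are made-up placeholders, and Python's re module spells named groups (?P<name>...) where Splunk's rex uses (?<name>...). This does not reproduce Splunk's automatic extraction; it only shows what the regex itself captures.

```python
import json
import re

# Hypothetical single-line event modelled on the sample in [1];
# all values are made-up placeholders.
EVENT = (
    'category="audit",username="martin",verb={"action":"update"},'
    'object={"objectName":"demo","objectAttributs":{"System details":'
    '{"oldValue":"a, b","newValue":"c"}},'
    '"auditedObject":{"type":"t","id":{"key":"k1"}}}'
)

# Same idea as the Splunk workaround  | rex "object=(?<object>.+)$"
# .+ is greedy, so the capture runs from "object=" to the end of the line,
# regardless of any quotes or commas inside the JSON.
OBJECT_RE = re.compile(r"object=(?P<object>.+)$")


def extract_object(event: str) -> dict:
    """Return the JSON value that follows 'object=' as a Python dict."""
    match = OBJECT_RE.search(event)
    if match is None:
        raise ValueError("no object= field in event")
    return json.loads(match.group("object"))


obj = extract_object(EVENT)
print(obj["objectName"])                                     # demo
print(obj["objectAttributs"]["System details"]["oldValue"])  # a, b
```

Because the pattern is anchored at the end of the line, it assumes object= is the last field in the event; any fields that followed it would be captured as well.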

Regarding my setup: I have one search head and two indexers, and the problem occurs whether I run the search through the search head or directly on the indexers.

Thank you in advance.

Best Regards,

Martin

[1] sample message:

formatVersion="<some_version>",
serverTimestamp="<some_timestamp>",
crtAccount="<some_string>",
crtApplication="<some_string>",
crtComponent="<some_string>",
crtTenantId="<some_string>",
crtPermissions="<some_string>",
crtHostname="<some_string>",
accountExt="<some_string>",
clientTimestamp="<some_timestamp>",
messageId="<some_string>",
category="<some_string>",
loggedByClass="<some_string>",
correlation_id="<some_string>",
ip_address="<some_string>",
username="<some_string>",
tenantId="<some_string>",
verb={"action":"update"},
"object="{
   "objectName":"<some_string>",
   "objectAttributs":{
      "System details":{
         "oldValue":"<some_string>",
         "newValue":"<some_string>"
      }
   },
   "auditedObject":{
      "type":"<some_string>",
      "id":{
         "key":"<some_string>"
      }
   }
}

mnikolov2793 · 11-03-2021

I tried escaping the double quotes as suggested by @somesoni2, but it did not work. A new finding is that the problem occurs only for messages that have the following objectAttributs:

      "System details":{
         "oldValue":"<some_string>",
         "newValue":"<some_string>"
      }

Another finding is that not all such messages are cut off; some with the same structure are retrieved successfully. For example, a message with the same object part, but with randomly generated characters (a-z, 0-9) as values, is not cut off in search results. This suggests that the issue lies not in the message structure but in the content of the oldValue and newValue fields, which hold the custom data. I checked them character by character and found no unsupported characters.
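One way to make the character-by-character check repeatable is a small Python sketch that reports every character in a value that falls outside the a-z/0-9 set used by the values that extracted correctly, along with its position and Unicode code point. Invisible characters such as tabs or non-breaking spaces are easy to miss by eye. The function name and the sample string are hypothetical; paste in the real oldValue/newValue contents. The sketch only narrows down candidates and says nothing about which character, if any, Splunk's extraction reacts to.

```python
import unicodedata

# Characters used by the randomly generated values that were NOT cut off.
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789")


def unusual_chars(value: str) -> list[tuple[int, str, str]]:
    """List (position, code point, Unicode name) for every char outside SAFE."""
    report = []
    for pos, ch in enumerate(value):
        if ch not in SAFE:
            # Control characters such as TAB have no Unicode name.
            name = unicodedata.name(ch, "<unnamed>")
            report.append((pos, f"U+{ord(ch):04X}", name))
    return report


# Hypothetical oldValue content; replace it with a value from a cut-off event.
for entry in unusual_chars('os: Linux\t"x"'):
    print(entry)
```

Running it on a value from a cut-off event and on one from a correctly extracted event makes the two character sets easy to compare side by side.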

What confuses me is that adding the regex | rex "object=(?<object>.+)$" to the search fixes the search result. If the issue were the content of the fields, I would expect the regex not to work either.

I am out of ideas. Does anyone have a hint about what I am missing?

somesoni2 · 11-01-2021

I believe it's happening because of the double quotes in the content of the field "object", which cause the automatic field extraction to terminate early.

If you have control over the logging, try escaping the double quotes inside the value of "object", like below:

"object="{
   \"objectName\":\"<some_string>\",
   \"objectAttributs\":{
      ....and so on...
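If the events come from your own code, the escaping can be done at serialization time rather than by hand. Here is a minimal Python sketch, assuming the application builds the key="value" line itself (the helper format_event is hypothetical). The first json.dumps renders the nested object as compact JSON. A second json.dumps wraps that text in double quotes and backslash-escapes the quotes inside it, which produces object="{\"objectName\":...}". This only changes how the event is written and does not guarantee that automatic extraction will keep the full value.

```python
import json


def format_event(fields: dict) -> str:
    """Render fields as key="value" pairs; dict values become escaped JSON text."""
    parts = []
    for key, value in fields.items():
        if isinstance(value, dict):
            # Compact JSON for the nested object ...
            value = json.dumps(value, separators=(",", ":"))
        # ... then quote it: json.dumps on a str escapes inner " as \"
        parts.append(f"{key}={json.dumps(str(value))}")
    return ",".join(parts)


line = format_event({
    "category": "audit",
    "object": {"objectName": "demo", "auditedObject": {"id": {"key": "k1"}}},
})
print(line)
```

With its default ensure_ascii=True, the second json.dumps also escapes backslashes and non-ASCII characters (as \uXXXX), so the quoted value stays on one line and contains no bare double quotes.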

You can also save the custom field extraction in props.conf (or through the UI) so that it is available to all your queries, including API queries. Remember to set the sharing permission of that field extraction to global.

mnikolov2793 · 11-01-2021

Thanks for these suggestions. I will try escaping the double quotes; however, I don't think they are the root cause, because some messages with similar "object" parts are extracted correctly. Example:

{
   "objectAttributs":{
      "process-id":"<some_string>",
      "event-type":"<some_string>",
      "request-origin":"<some_string>"
   },
   "auditedObject":{
      "type":"<some_string>",
      "id":{
         "key":"<some_string>"
      }
   }
}

I will also look into configuring a custom field extraction in props.conf and report back once I have tried it. If I understand you correctly, this lets me instruct Splunk to always apply the specific regex to searches.
