Splunk Enterprise

JSON parsing issue and bad timestamp recognition

tay
Explorer

Hello Splunkers, 
I have 7 files in JSON format (the JSON format is the same for each file), so I applied one parsing configuration for all of them.


*On UF*

[source::/opt/splunk/etc/apps/app_name/result/*.json]
INDEXED_EXTRACTIONS=json
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)

*On IDX*

[sourcetype_name]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
TIME_PREFIX=\"timestamp\"\:\s\"
MAX_TIMESTAMP_LOOKAHEAD=19
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
TRUNCATE=999999

*On Search Head*

[sourcetype_name]
KV_MODE=none

Parsing works for all files except one.

Here is an excerpt where the timestamp comes out with a value of none:

[screenshot: tay_0-1724925413410.png]

Can you help me with this?

 


PickleRick
SplunkTrust

When you use indexed extractions, the events are parsed on the UF and are not touched by subsequent components (with some exceptions which we're not getting into here).

So your props on indexers do not have any effect on parsing.

You're interested in TIMESTAMP_FIELDS (along with TIME_FORMAT, of course) on the UF.
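
For example, something along these lines in props.conf on the UF - a sketch assuming the field really is called "timestamp", as in your sample events:

[sourcetype_name]
INDEXED_EXTRACTIONS = json
# name of the extracted JSON field that carries the event time
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT = %Y-%m-%dT%H:%M:%S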


tay
Explorer

Hi PickleRick, 

If I understand correctly, I either do all the parsing on the UF, or I remove everything from the UF and move the parsing to the indexer (IDX)?


PickleRick
SplunkTrust

Reference material - https://community.splunk.com/t5/Getting-Data-In/Diagrams-of-how-indexing-works-in-the-Splunk-platfor...

Normally (when you're not using indexed extractions), the data is split into chunks, metadata is added _to whole chunks_, and the chunks are sent downstream to the HF/indexer for further processing. The first "heavy" component (either HF or indexer) that receives the data does all the heavy lifting and either writes the data to indexes or sends the parsed data out (and that data is not parsed again - if there are more components along the way, the parsed data is just forwarded to the outputs and that's it).

If you enable indexed extractions, your data is parsed into indexed fields (which has its pros but also its cons) and gets sent on as parsed data, which is not parsed again.

(I'm not touching on the ingest actions topic here.)

So you can either configure timestamp recognition on your UF based on the fields extracted from your JSON, if you want to keep indexed extractions enabled, or you can disable indexed extractions and parse the JSON at search time - then you have to let your HF/idx know how to line-break and do timestamp recognition. In either case it doesn't hurt to have a full set of settings for the sourcetypes on both layers (UF and HF/idx) - only the settings relevant in a specific place are "active". A sketch of the second option follows.
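
To make the second option concrete, here is a rough sketch reusing the stanza names from your question (adjust to your actual source/sourcetype; KV_MODE=json is the standard search-time JSON parsing setting):

On UF - indexed extractions removed, event breaking kept:

[source::/opt/splunk/etc/apps/app_name/result/*.json]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)

On HF/IDX - parsing-time settings, as in your original stanza:

[sourcetype_name]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = \"timestamp\"\:\s\"
MAX_TIMESTAMP_LOOKAHEAD = 19
TIME_FORMAT = %Y-%m-%dT%H:%M:%S

On SH - parse the JSON at search time instead of KV_MODE=none:

[sourcetype_name]
KV_MODE = json

Note that this assumes one JSON event per line; pretty-printed multi-line JSON (like the samples later in this thread) would need a different LINE_BREAKER.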


tay
Explorer

Found the solution: the host works as an HF. As my data is cooked only once, it takes the parsing configuration of that HF, so I need to create a separate HF for this kind of host.
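
For anyone hitting the same thing: a sketch of where the parsing-time settings end up in that setup - props.conf on whichever heavy component first touches the data (same values as the indexer stanza above):

[sourcetype_name]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = \"timestamp\"\:\s\"
MAX_TIMESTAMP_LOOKAHEAD = 19
TIME_FORMAT = %Y-%m-%dT%H:%M:%S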

 


tay
Explorer

Thank you so much for your answer. It's pretty clear 😀
I'm going to change my conf now. 


PaulPanther
Builder

Please provide the affected event and an event that is parsed correctly.


tay
Explorer

Event without issue (btoolTag = btool_validate_strptime):

[
  {
    "bad_strptime": "%d.%m.%Y %H:%M:%S,%3",
    "conf_file": "props.conf",
    "stanza": "lb:logs",
    "attribute": "TIME_FORMAT",
    "btoolTag": "btool_validate_strptime",
    "timestamp": "2024-08-29T06:00:04",
    "host": "blabla_hostname"
  },
  {
    "bad_strptime": "%y-%m-%d %H:%M:%S%",
    "conf_file": "props.conf",
    "stanza": "iislogs",
    "attribute": "TIME_FORMAT",
    "btoolTag": "btool_validate_strptime",
    "timestamp": "2024-08-29T06:00:04",
    "host": "blabla_hostname"
  }
]

Affected event (btoolTag = btool_validate_regex):

[
  {
    "bad_regex": "(?i)id_618_(?<eventfield_1>\\\\w*).*i_Media=MEDIA_(?<eventfield_2>\\\\w*).*i_Dnbits=(?<eventfield_3\\\\w*).*cs_PERString=(?<eventfield_4>\\\\w*)",
    "conf_file": "props.conf",
    "stanza": "fansfms:aaio",
    "attribute": "EXTRACT-AoIP_message1",
    "reason": "syntax error in subpattern name (missing terminator?)",
    "btoolTag": "btool_validate_regex",
    "timestamp": "2024-08-29T09:47:46",
    "host": "blabla_hostname"
  },
  {
    "bad_regex": "([\\i\\\\fr\\n]+---splunk-admon-end-of-event---\\r\\n[\\r\\n]*)",
    "conf_file": "props.conf",
    "stanza": "source::(....(config|conf|cfg|inii|cfg|emacs|ini|license|lng|plist|presets|properties|props|vim|wsdl))",
    "attribute": "LINE_BREAKER",
    "reason": "unrecognized character follows \\",
    "btoolTag": "btool_validate_regex",
    "timestamp": "2024-08-29T09:47:46",
    "host": "blabla_hostname"
  }
]

 


dural_yyz24
Explorer

I'm curious about this value.

"reason": "unrecognized character follows \\",

Since the \\ is a literal escape, is it reading the remainder of the message as text until the next naturally occurring " on its own? Can you try changing the "\\" in the text portion of the message to "escape character set"?

 


tay
Explorer

Hi, 
This error is normal - the script catches errors, and all the values are good. The thing is, when I ingest these logs with TIME_PREFIX set, I get two values for timestamp for just one log and not the others, even though they have the same JSON format...
