Splunk Enterprise

JSON parsing issue and bad timestamp recognition

tay
Explorer

Hello Splunkers, 
I have 7 files in JSON format (the JSON structure is the same for each file), so I applied a single parsing configuration to all of them.

On UF:

[source::/opt/splunk/etc/apps/app_name/result/*.json]
INDEXED_EXTRACTIONS=json
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)

On IDX:

[sourcetype_name]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
TIME_PREFIX=\"timestamp\"\:\s\"
MAX_TIMESTAMP_LOOKAHEAD=19
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
TRUNCATE=999999

On search head:

[sourcetype_name]
KV_MODE=none

Parsing works for all files except one.

Here is an excerpt where the timestamp has the value none:

[Screenshot: event excerpt with timestamp = none]

Can you help me with this?

Accepted solution:

PickleRick
SplunkTrust

Reference material - https://community.splunk.com/t5/Getting-Data-In/Diagrams-of-how-indexing-works-in-the-Splunk-platfor...

Normally (when you're not using indexed extractions), the data is split into chunks, metadata is added _to whole chunks_, and the chunks are sent downstream to the HF/indexer for further processing. The first "heavy" component (either an HF or an indexer) that receives the data does all the heavy lifting and either writes the data to indexes or sends the parsed data out. That data is not parsed again: if there are more components along the way, the parsed data is just forwarded to their outputs and that's it.

If you enable indexed extractions, your data is parsed into indexed fields on the UF (which has its pros but also cons) and is sent out as parsed data, which is not parsed again.

(I'm not touching on ingest actions here.)

So you can either keep indexed extractions enabled and configure timestamp recognition on your UF based on the fields extracted from your JSON, or disable indexed extractions and parse the JSON at search time; in that case you have to tell your HF/indexer how to break lines and recognize timestamps. In either case, it doesn't hurt to have a full set of settings for the sourcetype on both layers (UF and HF/indexer); only the ones relevant at a specific tier are "active".
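For the second option (no indexed extractions), a props.conf layout might look roughly like the sketch below. It assumes that inputs.conf on the UF already assigns sourcetype_name to these files, and that each file is a pretty-printed top-level JSON array like the samples posted later in the thread. The LINE_BREAKER/EVENT_BREAKER regex and the SEDCMD class names are hypothetical illustrations, not a tested configuration.

```ini
# --- UF: props.conf (sketch) ---
# No INDEXED_EXTRACTIONS here, so the UF only splits the data into chunks.
# The event breaker keeps chunk boundaries between JSON objects.
[sourcetype_name]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = \}(,[\r\n]+\s*)\{

# --- HF/indexer: props.conf (sketch) ---
[sourcetype_name]
SHOULD_LINEMERGE = false
# The first capture group is discarded: "}" ends the previous event
# and "{" starts the next one.
LINE_BREAKER = \}(,[\r\n]+\s*)\{
# Strip the array's opening "[" from the first event and closing "]"
# from the last event (hypothetical class names).
SEDCMD-strip_array_open = s/^\[\s*//
SEDCMD-strip_array_close = s/\s*\]\s*$//
TIME_PREFIX = \"timestamp\":\s*\"
MAX_TIMESTAMP_LOOKAHEAD = 19
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
TRUNCATE = 999999

# --- Search head: props.conf (sketch) ---
# JSON fields are extracted at search time instead of at index time.
[sourcetype_name]
KV_MODE = json
```

Parse-time settings only apply to newly ingested data; events that are already indexed keep the timestamps they were given.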


PickleRick
SplunkTrust

When you use indexed extractions, the events are parsed on the UF and are not touched by subsequent components (with some exceptions which we're not getting into here).

So the props.conf settings on your indexers have no effect on parsing this data.

You're interested in TIMESTAMP_FIELDS (along with TIME_FORMAT, of course) on the UF.
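A minimal sketch of the UF-side settings for keeping indexed extractions, assuming the JSON key holding the event time is named timestamp (as in the sample events later in the thread) and has the same format as in the question. TIMESTAMP_FIELDS names the extracted field to read the time from, and TIME_FORMAT says how to parse it. The search-head part is a common companion setting, included here as an assumption rather than a required fix.

```ini
# --- UF: props.conf (sketch) ---
# With indexed extractions the timestamp is taken from the extracted
# "timestamp" field; TIME_PREFIX on the indexers would be ignored.
[source::/opt/splunk/etc/apps/app_name/result/*.json]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT = %Y-%m-%dT%H:%M:%S

# --- Search head: props.conf (sketch) ---
# Fields are already indexed; disable search-time JSON extraction
# so that fields such as timestamp are not extracted a second time.
[sourcetype_name]
KV_MODE = none
AUTO_KV_JSON = false
```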


tay
Explorer

Hi PickleRick, 

If I understand correctly, I either do all the parsing on the UF, or I remove everything from the UF and move the parsing to the indexer (IDX)?


tay
Explorer

Found the solution: the host works as an HF. Since my data is cooked only once, it takes the parsing configuration of this HF. I need to create a separate HF for this kind of host.

tay
Explorer

Thank you so much for your answer. It's pretty clear 😀
I'm going to change my conf now. 


PaulPanther
Motivator

Please provide the affected event and an event that is parsed correctly.


tay
Explorer

Event without issue (btoolTag = btool_validate_strptime):

[
  {
    "bad_strptime": "%d.%m.%Y %H:%M:%S,%3",
    "conf_file": "props.conf",
    "stanza": "lb:logs",
    "attribute": "TIME_FORMAT",
    "btoolTag": "btool_validate_strptime",
    "timestamp": "2024-08-29T06:00:04",
    "host": "blabla_hostname"
  },
  {
    "bad_strptime": "%y-%m-%d %H:%M:%S%",
    "conf_file": "props.conf",
    "stanza": "iislogs",
    "attribute": "TIME_FORMAT",
    "btoolTag": "btool_validate_strptime",
    "timestamp": "2024-08-29T06:00:04",
    "host": "blabla_hostname"
  }
]

Affected event (btoolTag = btool_validate_regex):

[
  {
    "bad_regex": "(?i)id_618_(?<eventfield_1>\\\\w*).*i_Media=MEDIA_(?<eventfield_2>\\\\w*).*i_Dnbits=(?<eventfield_3\\\\w*).*cs_PERString=(?<eventfield_4>\\\\w*)",
    "conf_file": "props.conf",
    "stanza": "fansfms:aaio",
    "attribute": "EXTRACT-AoIP_message1",
    "reason": "syntax error in subpattern name (missing terminator?)",
    "btoolTag": "btool_validate_regex",
    "timestamp": "2024-08-29T09:47:46",
    "host": "blabla_hostname"
  },
  {
    "bad_regex": "([\\i\\\\fr\\n]+---splunk-admon-end-of-event---\\r\\n[\\r\\n]*)",
    "conf_file": "props.conf",
    "stanza": "source::(....(config|conf|cfg|inii|cfg|emacs|ini|license|lng|plist|presets|properties|props|vim|wsdl))",
    "attribute": "LINE_BREAKER",
    "reason": "unrecognized character follows \\",
    "btoolTag": "btool_validate_regex",
    "timestamp": "2024-08-29T09:47:46",
    "host": "blabla_hostname"
  }
]

dural_yyz24
Explorer

I'm curious about this value.

"reason": "unrecognized character follows \\",

Since the \\ is a literal escape, is it reading the remainder of the message as text until the next naturally occurring " on its own? Can you try changing the "\\" in the text portion of the message to "escape character set"?

tay
Explorer

Hi,
This error is expected: the script catches errors, so all the values are fine. The thing is, when I ingest these logs with TIME_PREFIX set, I get 2 values for timestamp for just one log and not the others, even though they all have the same JSON format...
