Splunk Search

Need help extracting timestamp from unstructured data (JSON)

Contributor

I need help extracting the timestamp and structuring this data:

{"time":"2017-12-12 16:25:27.418 +05:30", "severity":"INFORMATION", "_tag":"", "correlation_id":"null","Message":"[RequestController] RequestController - POST - api/request - [STARTS]", "client_id":"null", "instance_id":"null", "class_name_hierrarchy":"","method":"", "response_time":""} {"time":"2017-12-12 16:25:27.418 +05:30", "severity":"INFORMATION", "_tag":"", "correlation_id":"null","Message":"[EPCAdapter] - CreateRequest - Before Get OAuthToken - ", "client_id":"null", "instance_id":"null", "class_name_hierrarchy":"","method":"", "response_time":""} {"time":"2017-12-12 16:25:27.418 +05:30", "severity":"INFORMATION", "_tag":"", "correlation_id":"null","Message":"[HttpCommunicationHelper] In Execute Request Method", "client_id":"null", "instance_id":"null", "class_name_hierrarchy":"","method":"", "response_time":""} {"time":"2017-12-12 16:25:27.418 +05:30", "severity":"INFORMATION", "_tag":"", "correlation_id":"null","Message":"[HttpCommunicationHelper] Request Type is POST", "client_id":"null", "instance_id":"null", "class_name_hierrarchy":"","method":"", "response_time":""} {"time":"2017-12-12 16:25:27.418 +05:30", "severity":"INFORMATION", "_tag":"", "correlation_id":"null","Message":"[HttpCommunicationHelper] Request Base URL is https://int.api.ellielabs.com/", "client_id":"null", "instance_id":"null", "class_name_hierrarchy":"","method":"", "response_time":""}

1 Solution

Champion

This should work:

[<sourcetype>]
SHOULD_LINEMERGE = FALSE
LINE_BREAKER = ([\s\n\r]+){"time":"
TIME_PREFIX = {"time":"
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3Q %:z

The above assumes that each event starts with {"time", and each instance of {"time" starts an entirely new event.
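If you want to sanity-check that assumption outside Splunk, here is a rough Python sketch of what LINE_BREAKER does: the text matched by the first capture group is thrown away, and whatever follows it becomes the start of the next event. The two-event sample and the `break_events` helper are made up for illustration, and Python's `re` is only an approximation of Splunk's PCRE engine.

```python
import re

# Hypothetical sample: two events run together, separated only by a space.
raw = (
    '{"time":"2017-12-12 16:25:27.418 +05:30", "severity":"INFORMATION", "Message":"first"} '
    '{"time":"2017-12-12 16:25:27.500 +05:30", "severity":"INFORMATION", "Message":"second"}'
)

# Same idea as the LINE_BREAKER above: whitespace, then the start of the next event.
# The brace is escaped here to make the literal match explicit.
line_breaker = re.compile(r'([\s\n\r]+)\{"time":"')


def break_events(text):
    """Split text roughly the way LINE_BREAKER would.

    Capture group 1 (the whitespace) is discarded; the text after it
    starts the next event.
    """
    events, start = [], 0
    for m in line_breaker.finditer(text):
        events.append(text[start:m.start(1)])  # everything before the delimiter
        start = m.end(1)                       # next event begins after the delimiter
    events.append(text[start:])
    return events


events = break_events(raw)
for event in events:
    print(event)
```

If a Message value ever contained whitespace followed by {"time":" this would split in the wrong place, which is the caveat stated above.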


Legend

You should input this file on a test machine somewhere using the Add Data wizard. The wizard will show you a preview of how the data will be parsed, and allow you to experiment with various props.conf settings, such as linebreaking and timestamping.

For JSON inputs, you can use INDEXED_EXTRACTIONS if you want, or simply parse the file as plain text.
INDEXED_EXTRACTIONS will consume more disk space and potentially lower the search performance, but it may be your best choice when the data format is variable.

Option 1: no indexed extractions (recommended if it works) props.conf

[yoursourcetypehere]
TRUNCATE = 0
CHARSET = UTF-8
KV_MODE=JSON
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE={"time":"
TIME_PREFIX={"time":"

Option 2: indexed extractions props.conf

[yoursourcetypehere]
TRUNCATE = 0
CHARSET = UTF-8
KV_MODE = none
INDEXED_EXTRACTIONS=JSON
TIMESTAMP_FIELDS = time

Finally, you may need to make sure that each new event starts on a new line in the log file.
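One way to get each event onto its own line before Splunk reads the file is to pre-process it. The sketch below is only an assumption about how you might do that in Python: it walks through concatenated JSON objects with the standard library's `json.JSONDecoder.raw_decode` and returns one object per entry. The `split_json_objects` name and the sample string are hypothetical.

```python
import json


def split_json_objects(text):
    """Return each top-level JSON object in text as its own string."""
    decoder = json.JSONDecoder()
    pos, objects = 0, []
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        _, end = decoder.raw_decode(text, pos)  # end = index just past the object
        objects.append(text[pos:end])
        pos = end
    return objects


# Hypothetical input: two events on one line, separated by a space.
sample = (
    '{"time":"2017-12-12 16:25:27.418 +05:30", "Message":"[EPCAdapter] - CreateRequest"} '
    '{"time":"2017-12-12 16:25:27.418 +05:30", "Message":"[HttpCommunicationHelper] Request Type is POST"}'
)

lines = split_json_objects(sample)
print("\n".join(lines))
```

Because this parses the JSON rather than pattern-matching on {"time":", it does not care what the Message text contains, but it will raise an error on malformed input.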


Contributor

Sorry for the late reply; I was out for some time. This worked as expected.

Thank you all for the help.


SplunkTrust

On the indexer or heavy forwarder, props.conf (event breaking and timestamp parsing configuration)

[YourSourceType]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\{\"time\")
TIME_PREFIX = ^\{\"time\"\:\"
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N %:z
MAX_TIMESTAMP_LOOKAHEAD = 30
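
The lookahead of 30 matches the sample data: the timestamp is exactly 30 characters long when counted from its first digit. Here is a quick Python check of the length and the format. Python's `%f` and `%z` are only rough stand-ins for Splunk's `%3N` and `%:z`, and `%z` accepts the `+05:30` form only on Python 3.7 or later.

```python
from datetime import datetime, timedelta

# Timestamp copied from the sample event in the question.
stamp = "2017-12-12 16:25:27.418 +05:30"

# Characters Splunk has to read after TIME_PREFIX to see the whole timestamp.
print(len(stamp))  # 30

# Approximate Python equivalent of TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N %:z
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f %z")
print(parsed.isoformat())
```

If the subsecond precision or the timezone style in your logs ever changes, the lookahead value needs to change with it.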

On the search head, props.conf (field extraction configuration)

[YourSourceType]
KV_MODE = json