I am trying to break this into multiple events. Each event starts with clouds. This is JSON format, but Splunk is not recognizing it. Please help me.
{u'list': [{u'clouds': {u'all': 0}, u'name': u'Yafran', u'coord': {u'lat': 32.06329, u'lon': 12.52859}, u'weather': [{u'main': u'Clear', u'id': 800, u'icon':
u'01d', u'description': u'Sky is Clear'}], u'dt': 1446550246, u'main': {u'temp': 25.13, u'grnd_level': 1017.67, u'temp_max': 25.13, u'sea_level': 1037.77,
u'humidity': 58, u'pressure': 1017.67, u'temp_min': 25.13}, u'id': 2208791, u'wind': {u'speed': 2.21, u'deg': 53.5006}}, {u'clouds': {u'all': 0}, u'name':
u'Zuwarah', u'coord': {u'lat': 32.931198, u'lon': 12.08199}, u'weather': [{u'main': u'Clear', u'id': 800, u'icon': u'01d', u'description': u'Sky is Clear'}],
u'dt': 1446550246, u'main': {u'temp': 24, u'grnd_level': 1037.48, u'temp_max': 23.996, u'sea_level': 1038.5, u'humidity': 83, u'pressure': 1037.48, u'temp_min':
23.996}, u'id': 2208425, u'wind': {u'speed': 2.05, u'deg': 68.0012}}, {u'clouds': {u'all': 0}, u'name': u'Sabratah', u'coord': {u'lat': 32.79335, u'lon':
12.48845}, u'weather': [{u'main': u'Clear', u'id': 800, u'icon': u'01d', u'description': u'Sky is Clear'}], u'dt': 1446550247, u'main': {u'temp': 25.13,
u'grnd_level': 1017.67, u'temp_max': 25.13, u'sea_level': 1037.77, u'humidity': 58, u'pressure': 1017.67, u'temp_min': 25.13}, u'id': 2212771, u'wind':
{u'speed': 2.21, u'deg': 53.5006}}, {u'clouds': {u'all': 0}, u'name': u'Gharyan', u'coord': {u'lat': 32.172218, u'lon': 13.02028}, u'weather': [{u'main':
u'Clear', u'id': 800, u'icon': u'01d', u'description': u'Sky is Clear'}], u'dt': 1446550247, u'main': {u'temp': 25.11, u'grnd_level': 1005.43, u'temp_max':
25.105, u'sea_level': 1037.73, u'humidity': 47, u'pressure': 1005.43, u'temp_min': 25.105}, u'id': 2217362, u'wind': {u'speed': 2.21, u'deg': 50.5006}},
The following answer is a great starting point for extracting this kind of single-line JSON data (decoded, in this case).
https://answers.splunk.com/answers/295142/line-breaker-in-single-line-printed-json-doc.html
I used that answer to come up with the following props.conf configuration, which I used to index the example data provided.
[test_json]
DATETIME_CONFIG = CURRENT
category = Custom
disabled = false
pulldown_type = true
NO_BINARY_CHECK = true
# prefix the class names with numbers to ensure the sed commands run in order
SEDCMD-1stripfirsttag = s/^\s*\{u'list':\s\[//
SEDCMD-2encodejson = s/(u'|')/"/g
SEDCMD-3striptrailingvalue = s/(,|]\})\s*$//
LINE_BREAKER = (,)\s\{
SHOULD_LINEMERGE = false
KV_MODE = json
Hope this helps.
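If you want to sanity-check the regexes in the configuration above before touching an indexer, here is a rough Python sketch that mimics them. It is only an approximation of what Splunk does, not its actual pipeline: `emulate_props` and the shortened `RAW` sample are hypothetical names made up for illustration, and the split assumes the LINE_BREAKER capture group (the comma) is what gets discarded between events.

```python
import json
import re

# Shortened sample in the same shape as the question's data (two cities only).
RAW = (
    "{u'list': [{u'clouds': {u'all': 0}, u'name': u'Yafran', u'dt': 1446550246}, "
    "{u'clouds': {u'all': 0}, u'name': u'Zuwarah', u'dt': 1446550246}]}"
)


def emulate_props(raw):
    """Approximate the LINE_BREAKER and SEDCMD settings with plain regexes."""
    # LINE_BREAKER = (,)\s\{  -> break at each comma that is followed by " {"
    chunks = re.split(r",(?=\s\{)", raw)
    events = []
    for chunk in chunks:
        # SEDCMD-1stripfirsttag: drop the leading {u'list': [ wrapper
        chunk = re.sub(r"^\s*\{u'list':\s\[", "", chunk)
        # SEDCMD-2encodejson: turn u' and ' into double quotes
        chunk = re.sub(r"(u'|')", '"', chunk)
        # SEDCMD-3striptrailingvalue: drop a trailing comma or the closing ]}
        chunk = re.sub(r"(,|]\})\s*$", "", chunk)
        events.append(chunk.strip())
    return events


events = emulate_props(RAW)
parsed = [json.loads(e) for e in events]  # raises ValueError if an event is not valid JSON
for p in parsed:
    print(p["name"], p["dt"])
```

Each resulting event should parse with `json.loads`, which is roughly what KV_MODE = json needs at search time. Note that the `u'` replacement would also eat a trailing u in a value such as 'menu', so check your own data for that case.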
If you want to use the epoch timestamp field (dt) as the event time, remove or comment out DATETIME_CONFIG
and add the following lines instead:
#DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD=230
TIME_FORMAT=%s
TIME_PREFIX=dt.:\s
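To see what these timestamp settings are aiming at, here is a small Python sketch that applies the same prefix regex to one event and reads the digits after it as epoch seconds. The `event` string is a shortened, hypothetical sample, and this only illustrates the idea; it is not how Splunk implements timestamp recognition.

```python
import re
from datetime import datetime, timezone

# One event in the shape of the question's data (shortened for illustration).
event = "{u'clouds': {u'all': 0}, u'name': u'Yafran', u'dt': 1446550246, u'id': 2208791}"

# TIME_PREFIX=dt.:\s  -> the "." matches whichever quote character follows dt
match = re.search(r"dt.:\s(\d+)", event)
epoch = int(match.group(1))

# TIME_FORMAT=%s means the value is seconds since the Unix epoch
event_time = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(event_time.isoformat())
```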
This is not legal JSON. First, ' needs to be converted to ". Second, you have extraneous u characters before each key, unless that is a paste issue.
If you have control over how the data gets to JSON, you may wish to read this Splunk answer, written by Martin Mueller, as it discusses JSON extractions at some length. If you do not control the process that produces the JSON, you may want to consider pre-processing the data outside of Splunk. Some things are simpler to do with a script than with Splunk.
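As one possible way to do that pre-processing, here is a minimal Python sketch. It assumes the data really is the printed form of a Python dict (which the u'...' prefixes suggest) and that the input is complete rather than truncated like the pasted sample. The function name `python_repr_to_json_lines` and the `sample` string are hypothetical, for illustration only.

```python
import ast
import json


def python_repr_to_json_lines(text):
    """Parse a printed Python dict and return one JSON string per list entry."""
    # literal_eval only accepts Python literals, so it does not execute code.
    data = ast.literal_eval(text)
    return [json.dumps(item, sort_keys=True) for item in data["list"]]


# Shortened stand-in for the real payload.
sample = (
    "{u'list': [{u'clouds': {u'all': 0}, u'name': u'Yafran', u'dt': 1446550246}, "
    "{u'clouds': {u'all': 0}, u'name': u'Zuwarah', u'dt': 1446550246}]}"
)

lines = python_repr_to_json_lines(sample)
for line in lines:
    print(line)  # one valid JSON object per line, ready to be written to a file
```

With one JSON object per line, the default line breaking should already give one event per city, and KV_MODE = json can extract the fields without any SEDCMD rewriting.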