I'm sure I'm not the first one to run into this issue, but I currently can't find a proper solution. Imagine the following JSON data:
[
"Config": {
"Hostname": "XXHost",
"ExposedPorts": {
"1/tcp": {},
"2/tcp": {},
"3/tcp": {},
"4/tcp": {},
"5/tcp": {}
},
}
]
Splunk lets me query the hostname using myQuery | table Config.Hostname
but my use case requires querying all exposed ports. When I look at the extracted fields on the left side, it seems that Splunk skips every dict that has no content (even when it has children that are themselves empty). Therefore the whole ExposedPorts dict is not available. Furthermore, so far I have no idea how to query the dict key names instead of the values.
Side note: I used INDEXED_EXTRACTIONS = JSON
because the LINE_BREAKER
option wasn't able to split the JSON events correctly (they are logged one per line, separated by \n). Because of the index-time extraction there is no _raw field I can throw a fancy regex at. The data I am talking about is the output of docker inspect XXXX
just in case anyone wants to reproduce it.
Hi @mertox,
In your Python script, change the line of code that writes the event to the following:
data = {"data" : data}
f.write( json.dumps(data, sort_keys=True)+'\n' )
Try the above code. Hope this helps!!!
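For context, here is a minimal sketch of how that change might sit in a script. It assumes the script already holds the parsed docker inspect result in a variable called data and writes to an open file object f. The write_event helper and the sample values are hypothetical and only there for illustration.

```python
import io
import json


def write_event(f, data):
    """Write one inspect result as a single-line JSON object (hypothetical helper)."""
    # Wrap the result so the event's top level is an object with a "data" key
    # rather than whatever docker inspect returned.
    wrapped = {"data": data}
    # One event per line, terminated by a newline.
    f.write(json.dumps(wrapped, sort_keys=True) + "\n")


# Shortened sample shaped like the data in the question (assumed values).
sample = [{"Config": {"Hostname": "XXHost",
                      "ExposedPorts": {"1/tcp": {}, "2/tcp": {}}}}]

buf = io.StringIO()  # stands in for the real output file
write_event(buf, sample)
line = buf.getvalue()
```

With this wrapping every field path gains a data prefix, so existing searches that refer to Config.Hostname would presumably need to be adjusted.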
Hi @mertox,
The easiest solution would be to change the source where the event is generated, if you are able to. Make each event valid JSON and Splunk will handle the rest very easily.
If that is not possible, then you would need to write a regex to extract all the fields.
The data is written by a Python script which dumps the results using
f.write( json.dumps(data, sort_keys=True)+'\n' )
To my understanding this is the best way to do it...
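Since the script is under the poster's control, one possible workaround (a sketch, not a confirmed fix) is to turn the keys of ExposedPorts into a list of strings before dumping, so that the port names become values rather than names of empty dicts. The ports_as_list helper is hypothetical, and whether Splunk then exposes the list as a multivalue field depends on the extraction settings and should be verified.

```python
import json


def ports_as_list(container):
    """Return a copy of one inspect entry with ExposedPorts as a sorted list of keys."""
    config = dict(container.get("Config") or {})
    # ExposedPorts may be missing or null; treat both as "no ports".
    ports = config.get("ExposedPorts") or {}
    # Keep only the key names, e.g. "1/tcp"; the empty dict values carry no data.
    config["ExposedPorts"] = sorted(ports)
    result = dict(container)
    result["Config"] = config
    return result


# Shortened sample shaped like the data in the question (assumed values).
sample = {"Config": {"Hostname": "XXHost",
                     "ExposedPorts": {"1/tcp": {}, "2/tcp": {}, "3/tcp": {}}}}

# Same one-event-per-line format as the original script.
event_line = json.dumps(ports_as_list(sample), sort_keys=True) + "\n"
```

The helper copies the Config dict instead of modifying it in place, so the original inspect data stays untouched if the script needs it elsewhere.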