Split a nested json array with key/value pairs at index time

loeweps · ‎04-06-2019

I am looking for a way to split a JSON array of key/value pairs into separate events at index time.

Raw Data:

{"Source":"192.16.0.1:57913","Telemetry":{"node_id_str":"border_1","subscription_id_str":"101","encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic","collection_id":0,"collection_start_time":0,"msg_timestamp":1554539606730,"collection_end_time":0},"Rows":[{"Timestamp":1554539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":325518464}},{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":6294304}}]}

I would like to separate this into two events, either keeping or discarding the header, though keeping node_id_str would be useful.

Event1:

{
   "Source":"192.16.0.1:57913",
   "node_id_str":"border_1",
   "encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",
   "Timestamp":1554539606730,
   "name":"lsmpi_io",
   "used-memory":6294304
}

Event2:

{
   "Source":"192.16.0.1:57913",
   "node_id_str":"border_1",
   "encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",
   "Timestamp":1554539606730,
   "name":"Processor",
   "used-memory":325518464
}

Alternatively, if there isn't an easy way to do that, just keep the data in Rows{}:

Event1:

{
   "Timestamp":1554539606730,
   "name":"lsmpi_io",
   "used-memory":6294304
}

I am using INDEXED_EXTRACTIONS = json. I have tried working with LINE_BREAKER, MUST_BREAK_AFTER, and a SEDCMD that removes the header, but I haven't had much luck.

woodcock · ‎04-07-2019

The problem is probably less with your settings and more with your methodology. For line breaking and timestamping, props.conf must be deployed to the first full instance of Splunk that handles the data. Many people do not understand that, for any OS, there are two different types of Splunk: the Universal Forwarder (UF) and full Splunk Enterprise (typically designated as either a Heavy Forwarder or an Indexer). If all you are doing is forwarding events, be sure to use the UF for this. If you are forwarding your events with a UF, that is not the place for these settings (because it is not a full instance of Splunk); you need to deploy your props.conf to your HF (which hopefully you are not using, because there really is no good reason to) or your Indexer tier.

Next, you need to restart all Splunk instances there. When I say "all Splunk instances", be aware that sometimes more than one instance of Splunk is installed on a node, so be sure to restart them all.

Next, be sure that you are validating your settings against events that were forwarded in after the restarts. You may have been timestamping events incorrectly and throwing them into the future. If so (and you may not even be aware of it), a simple Last 5 minutes on the Timepicker may be showing you events that were forwarded and indexed (and incorrectly timestamped) long before the restarts. So always use a value of All time in your Timepicker and add _index_earliest=-5m (or similar) to your search SPL. This way, even if you are still timestamping incorrectly (either into the future or into the past), you will see recently indexed events that you can use to validate the other settings.

harsmarvania57 · ‎04-06-2019

Hi,

As far as I know, you can't separate those into two events, but you can try the config below to achieve the last event format that you mentioned in your question.

props.conf

[yoursourcetype]
SEDCMD-test=s/\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}/\1\2\3/g

After this change the data will look as shown in the screenshot below (the screenshot is not reproduced here).
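To make the effect of that SEDCMD concrete, here is a minimal sketch that applies the same pattern and replacement with Python's `re` module to the Rows portion of the sample event from the question. This is only an emulation outside Splunk; it assumes Python's regex engine treats this particular expression the same way, and the names `PATTERN`, `REPLACEMENT`, and `flatten_keys_content` are made up for the illustration.

```python
import json
import re

# The SEDCMD expression from the answer above, split into pattern and replacement.
# Assumption: Python's re module handles this pattern the same way Splunk does.
PATTERN = re.compile(r'\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}')
REPLACEMENT = r'\1\2\3'


def flatten_keys_content(raw):
    """Emulate s/PATTERN/REPLACEMENT/g on a single raw event string."""
    return PATTERN.sub(REPLACEMENT, raw)


# Rows portion of the sample event from the question.
sample = (
    '"Rows":[{"Timestamp":1554539606730,"Keys":{"name":"Processor"},'
    '"Content":{"used-memory":325518464}},'
    '{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},'
    '"Content":{"used-memory":6294304}}]'
)

flattened = flatten_keys_content(sample)

# Wrap the fragment in braces so it can be parsed and inspected as JSON.
rows = json.loads("{" + flattened + "}")["Rows"]
for row in rows:
    print(row)
```

Each row loses its Keys and Content wrappers, so name and used-memory end up at the same level as Timestamp, which is the flattened form asked for in the question.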

loeweps · ‎04-07-2019

Thanks for the reply harsmarvania57. That answers part of the question, but splitting the events themselves should be possible. Here are a couple of links where people were successful. I have tried to replicate what they did but haven't had much luck, as the line breaker appears to be ignored.

https://answers.splunk.com/answers/704459/how-to-split-json-array-into-multiple-events-using.html

https://answers.splunk.com/answers/289520/how-to-split-a-json-array-into-multiple-events-wit.html

This is my props.conf, which tries to use LINE_BREAKER to split the events. Once I get that working, I would then use part of what you provided to clean up the Keys/Content fields. I have tested the expressions in a regex tester and the matches appear to work.



[gpb_kv_test7]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ((?!"),(?!")|[\r\n]+)
NO_BINARY_CHECK = true
SEDCMD-remove_prefix = s/({\"Source\"\S+?\"Rows\":[)//g
SEDCMD-remove_suffix = s/]}//g
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = Rows{}.Timestamp
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
disabled = false
pulldown_type = true

This leaves me with a single event like this:

{"Timestamp":1554629642020,"Keys":{"name":"Processor"},"Content":{"free-memory":1123798432}},{"Timestamp":1554629642020,"Keys":{"name":"lsmpi_io"},"Content":{"free-memory":824}}

loeweps · ‎04-09-2019

Thanks again harsmarvania57. Removing INDEXED_EXTRACTIONS and replacing it with KV_MODE = json split the events correctly when I added them via a JSON file. When I used the HEC to stream events to the Splunk instance, it still ignored the line breaker but applied the rest of the settings. I ended up resolving the issue by splitting the events at the source (Kafka). I also found an option in Kafka Connect to send the events to the raw HEC endpoint instead of the HEC event endpoint, but I didn't get to test it as the issue was already solved.
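For readers wondering what splitting at the source could look like, here is a minimal Python sketch. It is not the Kafka-side change described above (the thread does not show it); it only illustrates the transformation from one telemetry message into the per-row events requested in the question. The function name `split_rows` and the choice of header fields to keep are assumptions taken from the desired output in the original post.

```python
import json


def split_rows(message):
    """Turn one telemetry message into one compact JSON string per row."""
    doc = json.loads(message)
    # Header fields the question wanted to keep on every event.
    header = {
        "Source": doc["Source"],
        "node_id_str": doc["Telemetry"]["node_id_str"],
        "encoding_path": doc["Telemetry"]["encoding_path"],
    }
    events = []
    for row in doc["Rows"]:
        event = dict(header)
        event["Timestamp"] = row["Timestamp"]
        event.update(row["Keys"])      # e.g. {"name": "Processor"}
        event.update(row["Content"])   # e.g. {"used-memory": 325518464}
        events.append(json.dumps(event, separators=(",", ":")))
    return events


# Sample shaped like the raw data in the question (Telemetry trimmed).
message = json.dumps({
    "Source": "192.16.0.1:57913",
    "Telemetry": {
        "node_id_str": "border_1",
        "encoding_path": "Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",
    },
    "Rows": [
        {"Timestamp": 1554539606730, "Keys": {"name": "Processor"},
         "Content": {"used-memory": 325518464}},
        {"Timestamp": 1554539606730, "Keys": {"name": "lsmpi_io"},
         "Content": {"used-memory": 6294304}},
    ],
})

events = split_rows(message)
for event in events:
    print(event)
```

Because every output string is already a single flat event, no LINE_BREAKER or SEDCMD work is needed on the Splunk side when the data is shaped this way before it is sent.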

harsmarvania57 · ‎04-09-2019

It's good that the issue is resolved; you can accept my answer if it really helped. Regarding the HEC endpoints: when you send data to the HEC event endpoint, it skips a few pipelines on the Indexer, so the props.conf configuration above will not work. In that case you need to use the HEC raw endpoint, which you have already figured out.
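As a rough illustration of the difference between the two endpoints, the sketch below builds (but does not send) a request to the HEC raw endpoint using only Python's standard library. The host, token, and channel values are placeholders, and the helper name `build_raw_request` is made up for this example; whether a channel is required depends on the Splunk version and HEC settings, so treat that parameter as an assumption.

```python
import urllib.parse
import urllib.request

# Standard HEC paths: the event endpoint expects a JSON envelope per event,
# while the raw endpoint accepts raw text.
HEC_EVENT_PATH = "/services/collector/event"
HEC_RAW_PATH = "/services/collector/raw"


def build_raw_request(base_url, token, raw_text, sourcetype, channel):
    """Build a POST request for the HEC raw endpoint without sending it."""
    query = urllib.parse.urlencode({"sourcetype": sourcetype, "channel": channel})
    return urllib.request.Request(
        url=f"{base_url}{HEC_RAW_PATH}?{query}",
        data=raw_text.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


# Placeholder values: replace with a real host, token, and channel GUID.
request = build_raw_request(
    base_url="https://splunk.example.com:8088",
    token="00000000-0000-0000-0000-000000000000",
    raw_text='{"Source":"192.16.0.1:57913","Rows":[]}',
    sourcetype="yoursourcetype",
    channel="11111111-1111-1111-1111-111111111111",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out here because it needs a reachable Splunk instance and a valid token.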

harsmarvania57 · ‎04-07-2019

In that case, use the configuration below in props.conf. Do not use INDEXED_EXTRACTIONS = json, because with it Splunk skips certain queues while parsing the data, and as a result the other settings will not work.

[yoursourcetype]
LINE_BREAKER = (\[|\,)\{\"Timestamp
SEDCMD-a = s/\{\"Source\"\:[^\}]*\}\,\"Rows\"\://
SEDCMD-b = s/\]\}//
SEDCMD-c = s/\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}/\1\2\3/g
SHOULD_LINEMERGE = false
TIME_PREFIX = \"Timestamp\"\:
disabled = false

I used the dummy data below to test the event extraction:

{"Source":"192.16.0.1:57913","Telemetry":{"node_id_str":"border_1","subscription_id_str":"101","encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic","collection_id":0,"collection_start_time":0,"msg_timestamp":1554539606730,"collection_end_time":0},"Rows":[{"Timestamp":1554539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":325518464}},{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":6294304}}]}
{"Source":"192.16.0.12:57913","Telemetry":{"node_id_str":"border_1","subscription_id_str":"101","encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic","collection_id":0,"collection_start_time":0,"msg_timestamp":1553539606730,"collection_end_time":0},"Rows":[{"Timestamp":1553539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":64744628}},{"Timestamp":1553539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":53656}}]}
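To see how the LINE_BREAKER and the three SEDCMD entries above interact, here is a small Python simulation run on data shaped like the dummy data. It is a sketch under stated assumptions, not Splunk itself: it assumes that an event boundary is placed wherever LINE_BREAKER matches and that only the text of the first capture group is discarded, it drops events that end up empty purely for readability, and it trims the Telemetry block of the sample to two fields. The helper names (`break_events`, `apply_sedcmds`, `simulate`, `sample_message`) are invented for the example.

```python
import json
import re

LINE_BREAKER = re.compile(r'(\[|\,)\{\"Timestamp')

# (pattern, replacement, count); count=1 mimics sed without /g, count=0 mimics /g.
SEDCMDS = [
    (re.compile(r'\{\"Source\"\:[^\}]*\}\,\"Rows\"\:'), '', 1),   # SEDCMD-a
    (re.compile(r'\]\}'), '', 1),                                 # SEDCMD-b
    (re.compile(r'\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}'),
     r'\1\2\3', 0),                                               # SEDCMD-c
]


def break_events(stream):
    """Split at each LINE_BREAKER match, discarding only capture group 1."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(stream):
        events.append(stream[start:match.start(1)])
        start = match.end(1)
    events.append(stream[start:])
    return events


def apply_sedcmds(event):
    """Apply the three substitutions in order to a single event."""
    for pattern, replacement, count in SEDCMDS:
        event = pattern.sub(replacement, event, count=count)
    return event.strip()


def simulate(stream):
    """Break the stream into events, clean each one, drop empty leftovers."""
    cleaned = (apply_sedcmds(event) for event in break_events(stream))
    return [event for event in cleaned if event]


def sample_message(source, timestamp, processor_bytes, lsmpi_bytes):
    """Build one compact message shaped like the dummy data (Telemetry trimmed)."""
    doc = {
        "Source": source,
        "Telemetry": {"node_id_str": "border_1", "msg_timestamp": timestamp},
        "Rows": [
            {"Timestamp": timestamp, "Keys": {"name": "Processor"},
             "Content": {"used-memory": processor_bytes}},
            {"Timestamp": timestamp, "Keys": {"name": "lsmpi_io"},
             "Content": {"used-memory": lsmpi_bytes}},
        ],
    }
    return json.dumps(doc, separators=(",", ":"))


stream = "\n".join([
    sample_message("192.16.0.1:57913", 1554539606730, 325518464, 6294304),
    sample_message("192.16.0.12:57913", 1553539606730, 64744628, 53656),
])

events = simulate(stream)
for event in events:
    print(event)
```

In this simulation the header of the second message ends up attached to the tail of the preceding event, which is why SEDCMD-a and SEDCMD-b are needed on row events and not only on the first chunk.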

Split a nested json array with key/value pairs at index time
