Split a nested JSON array with key/value pairs at index time

loeweps
Explorer

I am searching for a way to split a JSON array with key/value pairs into separate events at index time.

Raw Data:

{"Source":"192.16.0.1:57913","Telemetry":{"node_id_str":"border_1","subscription_id_str":"101","encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic","collection_id":0,"collection_start_time":0,"msg_timestamp":1554539606730,"collection_end_time":0},"Rows":[{"Timestamp":1554539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":325518464}},{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":6294304}}]}

I would like to separate this into two events, either keeping or discarding the header, though keeping node_id_str would be useful.

Event1:


{
"Source":"192.16.0.1:57913",
"node_id_str":"border_1",
"encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",
"Timestamp":1554539606730,
"name":"lsmpi_io",
"used-memory":6294304
}

Event2:

{
"Source":"192.16.0.1:57913",
"node_id_str":"border_1",
"encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",
"Timestamp":1554539606730,
"name":"Processor",
"used-memory":325518464
}

Alternatively, if there isn't an easy way, just keep the data in Rows{}:

Event1:


{
"Timestamp":1554539606730,
"name":"lsmpi_io",
"used-memory":6294304
}

I am using INDEXED_EXTRACTIONS = json. I have tried working with LINE_BREAKER, MUST_BREAK_AFTER, and a SEDCMD that removes the header, but I haven't had much luck.

0 Karma

woodcock
Esteemed Legend

The problem is probably less with your settings and more with your methodology. For line breaking and timestamping, props.conf must be deployed to the first full instance of Splunk that handles the data. Many people do not realize that, for any OS, there are two different types of Splunk: the Universal Forwarder (UF) and full Splunk Enterprise (typically designated as either a Heavy Forwarder or an Indexer). If all you are doing is forwarding events, be sure to use the UF for this. Assuming you are forwarding your events with a UF, that is not the place for these settings (because it is not a full instance of Splunk); you need to deploy your props.conf to your HF tier (which hopefully you are not using, because there really is no good reason to) or your Indexer tier.

Next, you need to restart all Splunk instances there. When I say "all Splunk instances", be aware that sometimes more than one instance of Splunk is installed on a node, so be sure to restart them all.

Next, be sure that you are validating your settings against events that were forwarded in after the restarts. You may have been timestamping events incorrectly and throwing them into the future. If so (and you may not even be aware of it), a simple Last 5 minutes on the Timepicker may be showing you events that were forwarded and indexed (and incorrectly timestamped) long before the restarts. So always use a value of All time in your Timepicker and add _index_earliest=-5m (or similar) to your search SPL. This way, even if you are still timestamping incorrectly (either into the future or into the past), you will see recently indexed events that you can use to validate the other settings.

0 Karma

harsmarvania57
SplunkTrust

Hi,

As far as I know, you can't separate those into two events, but you can try the config below to get the last event format you mentioned in your question.

props.conf

[yoursourcetype]
SEDCMD-test=s/\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}/\1\2\3/g

The data will then look like the screenshot below:

[Screenshot: event data after the SEDCMD is applied]
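
Since the screenshot is not reproduced here, the effect of the SEDCMD expression above can be checked offline. The following is a minimal Python sketch, not Splunk itself: it assumes Python's re module treats this particular expression the same way Splunk's regex engine does, and the helper name flatten_keys_content is made up for illustration.

```python
import re

# Same expression as the SEDCMD above, split into pattern and replacement.
PATTERN = re.compile(r'\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}')
REPLACEMENT = r'\1\2\3'


def flatten_keys_content(raw):
    """Hypothetical helper: apply the substitution to every match (the /g flag)."""
    return PATTERN.sub(REPLACEMENT, raw)


# One row taken from the raw data in the question.
row = '{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":6294304}}'
print(flatten_keys_content(row))
# prints {"Timestamp":1554539606730,"name":"lsmpi_io","used-memory":6294304}
```

The nested Keys and Content objects are replaced by their contents, which is the flat shape asked for in the question.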

0 Karma

loeweps
Explorer

Thanks for the reply harsmarvania57. That answers part of the question, but the overall splitting of events should be possible. Here are a couple of links where people were successful. I have tried to replicate what they did, but I haven't had much luck, as the LINE_BREAKER setting appears to be ignored.

https://answers.splunk.com/answers/704459/how-to-split-json-array-into-multiple-events-using.html

https://answers.splunk.com/answers/289520/how-to-split-a-json-array-into-multiple-events-wit.html

This is my props.conf, which tries to use LINE_BREAKER to split the events. Once I get that working, I would then use part of what you provided to clean up the Keys/Content fields. I have tested the regexes and the matches appear to work.


[gpb_kv_test7]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ((?!"),(?!")|[\r\n]+)
NO_BINARY_CHECK = true
SEDCMD-remove_prefix = s/({\"Source\"\S+?\"Rows\":[)//g
SEDCMD-remove_suffix = s/]}//g
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = Rows{}.Timestamp
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
disabled = false
pulldown_type = true

This leaves me with a single event like this:


{"Timestamp":1554629642020,"Keys":{"name":"Processor"},"Content":{"free-memory":1123798432}},{"Timestamp":1554629642020,"Keys":{"name":"lsmpi_io"},"Content":{"free-memory":824}}

0 Karma

loeweps
Explorer

Thanks again harsmarvania57. Removing INDEXED_EXTRACTIONS and replacing it with KV_MODE = json worked to split the events when I added them via a JSON file. When I used the HEC to stream events to the Splunk instance, it still ignored the LINE_BREAKER but not the rest of the settings. I ended up resolving the issue by splitting the events at the source (Kafka). I found an option within Kafka Connect to send the events to the HEC raw endpoint instead of the HEC event endpoint, but I didn't get to test that option as the issue was already solved.
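
The Kafka-side change itself is not shown in this thread. As a rough illustration of what splitting at the source means for the sample record from the question, here is a minimal Python sketch. The function name split_rows and the choice of header fields to carry over are assumptions made for this example, not the configuration that was actually used.

```python
import json


def split_rows(raw):
    """Hypothetical helper: turn one telemetry record into one flat dict per row."""
    record = json.loads(raw)
    telemetry = record.get("Telemetry", {})
    # Header fields the question wanted to keep on every event.
    header = {
        "Source": record.get("Source"),
        "node_id_str": telemetry.get("node_id_str"),
        "encoding_path": telemetry.get("encoding_path"),
    }
    events = []
    for row in record.get("Rows", []):
        event = dict(header)
        event["Timestamp"] = row.get("Timestamp")
        event.update(row.get("Keys", {}))     # e.g. {"name": "Processor"}
        event.update(row.get("Content", {}))  # e.g. {"used-memory": 325518464}
        events.append(event)
    return events


# Sample record with the same shape as the raw data in the question.
sample = json.dumps({
    "Source": "192.16.0.1:57913",
    "Telemetry": {
        "node_id_str": "border_1",
        "encoding_path": "Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",
        "msg_timestamp": 1554539606730,
    },
    "Rows": [
        {"Timestamp": 1554539606730, "Keys": {"name": "Processor"},
         "Content": {"used-memory": 325518464}},
        {"Timestamp": 1554539606730, "Keys": {"name": "lsmpi_io"},
         "Content": {"used-memory": 6294304}},
    ],
})

for event in split_rows(sample):
    print(json.dumps(event))
```

Each printed line is a standalone JSON event, so no index-time line breaking is needed once the data reaches Splunk.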

0 Karma

harsmarvania57
SplunkTrust

It's good that the issue is resolved; you can accept my answer if it really helped. Regarding the HEC endpoints: when you send data to the HEC event endpoint, it skips a few pipelines on the indexer, so the props config above will not work. In that case you need to use the HEC raw endpoint, which you have already figured out.
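
To make the difference between the two endpoints concrete, here is a minimal Python sketch that only builds the two kinds of request and does not send anything. The host name, token, and sourcetype are placeholders, and build_hec_request is a hypothetical helper written for this example; the paths /services/collector/event and /services/collector/raw are the standard HEC endpoints.

```python
import json
import urllib.parse
import urllib.request


def build_hec_request(base_url, token, payload, raw=False, sourcetype=None):
    """Hypothetical helper: build (but do not send) a HEC request."""
    headers = {"Authorization": f"Splunk {token}"}
    if raw:
        # Raw endpoint: the body is sent as-is; metadata goes in the query string.
        url = f"{base_url}/services/collector/raw"
        if sourcetype:
            url += "?" + urllib.parse.urlencode({"sourcetype": sourcetype})
        data = payload.encode("utf-8")
    else:
        # Event endpoint: the payload is wrapped in a JSON envelope.
        url = f"{base_url}/services/collector/event"
        body = {"event": payload}
        if sourcetype:
            body["sourcetype"] = sourcetype
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder values, for illustration only.
request = build_hec_request(
    "https://splunk.example.com:8088", "YOUR-HEC-TOKEN",
    '{"Rows":[]}', raw=True, sourcetype="yoursourcetype",
)
print(request.full_url)
```

With raw=True the body reaches Splunk unwrapped, which, per the reply above, is the case where the props.conf line-breaking settings can apply.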

0 Karma

harsmarvania57
SplunkTrust

In that case, use the configuration below in props.conf. Do not use INDEXED_EXTRACTIONS = json: with it, the data skips certain queues during parsing, and because of that the other settings will not take effect.

[yoursourcetype]
LINE_BREAKER = (\[|\,)\{\"Timestamp
SEDCMD-a = s/\{\"Source\"\:[^\}]*\}\,\"Rows\"\://
SEDCMD-b = s/\]\}//
SEDCMD-c = s/\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}/\1\2\3/g
SHOULD_LINEMERGE = false
TIME_PREFIX = \"Timestamp\"\:
disabled = false

I used the dummy data below to test the event extraction:

{"Source":"192.16.0.1:57913","Telemetry":{"node_id_str":"border_1","subscription_id_str":"101","encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic","collection_id":0,"collection_start_time":0,"msg_timestamp":1554539606730,"collection_end_time":0},"Rows":[{"Timestamp":1554539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":325518464}},{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":6294304}}]}
{"Source":"192.16.0.12:57913","Telemetry":{"node_id_str":"border_1","subscription_id_str":"101","encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic","collection_id":0,"collection_start_time":0,"msg_timestamp":1553539606730,"collection_end_time":0},"Rows":[{"Timestamp":1553539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":64744628}},{"Timestamp":1553539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":53656}}]}
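
To see how the LINE_BREAKER and the three SEDCMD rules in the configuration above interact, the steps can be approximated outside Splunk. This is only a sketch under stated assumptions: it assumes the event boundary falls at the first capture group of LINE_BREAKER and that the captured text is discarded, that each SEDCMD is then applied to every event in order, and that events left empty can be ignored. The helper names are made up, and Python's re module stands in for Splunk's regex engine.

```python
import re

LINE_BREAKER = re.compile(r'(\[|\,)\{\"Timestamp')

# (pattern, replacement, count); count=1 mimics a sed rule without /g, 0 means all.
SEDCMDS = [
    (re.compile(r'\{\"Source\"\:[^\}]*\}\,\"Rows\"\:'), '', 1),   # SEDCMD-a
    (re.compile(r'\]\}'), '', 1),                                 # SEDCMD-b
    (re.compile(r'\"Keys"\:\{([^\}]*)\}(\,)\"Content\"\:\{([^\}]*)\}'),
     r'\1\2\3', 0),                                               # SEDCMD-c
]


def break_events(raw):
    """Split at each match; the text in capture group 1 is dropped."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


def apply_sedcmds(event):
    """Apply the three substitutions in order, then trim whitespace."""
    for pattern, replacement, count in SEDCMDS:
        event = pattern.sub(replacement, event, count=count)
    return event.strip()


def simulate(raw):
    """Assumption of this sketch: events emptied by the SEDCMDs are ignored."""
    return [e for e in map(apply_sedcmds, break_events(raw)) if e]


# First record of the dummy data above.
RAW = (
    '{"Source":"192.16.0.1:57913","Telemetry":{"node_id_str":"border_1",'
    '"subscription_id_str":"101",'
    '"encoding_path":"Cisco-IOS-XE-memory-oper:memory-statistics/memory-statistic",'
    '"collection_id":0,"collection_start_time":0,"msg_timestamp":1554539606730,'
    '"collection_end_time":0},"Rows":['
    '{"Timestamp":1554539606730,"Keys":{"name":"Processor"},"Content":{"used-memory":325518464}},'
    '{"Timestamp":1554539606730,"Keys":{"name":"lsmpi_io"},"Content":{"used-memory":6294304}}]}'
)

for event in simulate(RAW):
    print(event)
```

In this approximation the header ends up either alone in the first event or attached to the tail of the previous record's last row, and SEDCMD-a removes it in both cases, so only the flattened rows remain.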