Hi All,
I have this compressed (reduced) version of a large structure, which is a combination of basic text and JSON:
2024-07-10 07:27:28 +02:00 LiveEvent: {"data":{"time_span_seconds":300,
"active":17519,
"total":17519,
"unique":4208,
"total_prepared":16684,
"unique_prepared":3703,
"created":594,
"updated":0,
"deleted":0,"ports":[
{"stock_id":49,
"goods_in":0,
"picks":2,
"inspection_or_adhoc":0,
"waste_time":1,
"wait_bin":214,
"wait_user":66,
"stock_open_seconds":281,
"stock_closed_seconds":19,
"bins_above":0,
"completed":[43757746,43756193],
"content_codes":[],
"category_codes":[{"category_code":4,"count":2}]},
{"stock_id":46,
"goods_in":0,
"picks":1,
"inspection_or_adhoc":0,
"waste_time":0,
"wait_bin":2,
"wait_user":298,
"stock_open_seconds":300,
"stock_closed_seconds":0,
"bins_above":0,
"completed":[43769715],
"content_codes":[],
"category_codes":[{"category_code":4,"count":1}]},
{"stock_id":1,
"goods_in":0,
"picks":3,
"inspection_or_adhoc":0,
"waste_time":0,
"wait_bin":191,
"wait_user":40,
"stock_open_seconds":231,
"stock_closed_seconds":69,
"bins_above":0,
"completed":[43823628,43823659,43823660],
"content_codes":[],
"category_codes":[{"category_code":1,"count":3}]}
]},
"uuid":"8711336c-ddcd-432f-b388-8b3940ce151a",
"session_id":"d14fbee3-0a7a-4026-9fbf-d90eb62d0e73",
"session_sequence_number":5113,
"version":"2.0.0",
"installation_id":"a031v00001Bex7fAAB",
"local_installation_timestamp":"2024-07-10T07:35:00.0000000+02:00",
"date":"2024-07-10",
"app_server_timestamp":"2024-07-10T07:27:28.8839856+02:00",
"event_type":"STOCK_AND_PILE"}
I eventually need each “stock_id” to end up as an individual event, keeping the common information along with it, such as: timestamp, uuid, session_id, session_sequence_number and event_type.
Can someone guide me on how to use props and transforms to achieve this?
PS. I have read through several great posts on how to split JSON arrays into events, but none on how to keep the common fields in each of them.
Many thanks in advance.
Best Regards,
Bjarne
I'm not sure it can be done reliably using props and transforms. I'd use a scripted input to parse the data and re-format it.
Hi @richgalloway,
Thanks for your input.
Do you happen to have any scripting ideas for this?
I have nothing specific to offer. In a previous job, I used a Python script to parse data and then restructure it so it was easier for Splunk to ingest. It wasn't JSON (I think it was XML), but the same approach should still be pretty straightforward here.
And btw, there's this one: How to split JSON array into Multiple events at Index Time?
That one relies on the fact that it was a simple array and could be cut into pieces with regexes. The splitting mechanism would break if the data changed - for example, if another field besides the "local" one were added to the "outer" JSON.
Hi @PickleRick,
The JSON structure is very solid and doesn't change, except that there can be many (1000+) or few (4) “stock_id” entries.
You talked about scripted inputs as well - do you have any suggestions/examples?
Your case is completely different because you want to keep some of the "outer" information shared between separate events (which actually isn't that good an idea, because your license usage will be multiplied across those events).
As for the scripted input - see these resources for the technicalities on the Splunk side (a minimal sketch of the registration follows the links). Of course the internals - splitting the event - are entirely up to you.
https://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ScriptSetup
https://dev.splunk.com/enterprise/docs/developapps/manageknowledge/custominputs
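For a rough idea, registering a scripted input boils down to a stanza like this in inputs.conf - the app name, script path, sourcetype and index here are all hypothetical:

[script://$SPLUNK_HOME/etc/apps/my_app/bin/split_live_events.py]
interval = 60
sourcetype = live_event:port
index = main
disabled = false

The script just writes the already-split events to stdout, and Splunk ingests whatever it prints.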
The thing is, if we don't split them at index time, the indexers will have even more work to do at search time, as the structures can be huge.
PS. I’m aware of the extra license usage here as well.
Hi @PickleRick,
Thanks for your feedback, though I'm surprised by the answer, as I've seen other clear indications and solutions for splitting JSON arrays into individual events, like: How to parse a JSON array delimited by "," into separate events with their unique timestamps?
1. Please, don't post links butchered by some external "protection" service.
2. You've got this wrong 😉 Those articles don't describe splitting JSON events. They describe breaking the input data stream so that it breaks on the "inner" JSON boundaries instead of the "outer" ones. It doesn't have anything to do with manipulating a single event that has already been broken off from the input stream. It's similar to telling Splunk not to break the stream into lines but rather to ingest things delimited by whitespace separately. But your case is completely different, because you want to carry over some common part (some common metadata, I assume) from the outer JSON structure to each part extracted from the inner JSON array. This is way above the simple string-based manipulation that Splunk can do in the ingestion pipeline.
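For illustration, what those articles do boils down to something like this in props.conf (a rough, untested sketch keyed to your data):

[live_event]
SHOULD_LINEMERGE = false
LINE_BREAKER = \}(,)\{"stock_id"

This makes Splunk start a new event at every inner {"stock_id"... object - but notice that the outer fields (uuid, session_id and so on) then end up in one event only, which is exactly the problem you're facing.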
Yes, there is some metadata that needs to stay with each event so I can find them again.
I have some ideas in my head on how to twist this, but right now I'm on vacation and can't test them for the next week or so, so I'm just "warming up" and looking for / listening in on others' crazy ideas about what they have achieved in Splunk 🙂
It's not about "whose is longer". And yes, I've seen many interesting hacks, but the fact remains - Splunk works one event at a time. So you can't "carry over" any info from one event to another using just props and transforms (except for that very, very ugly and unmaintainable trick of actually cloning the event and separately modifying each copy). Also, you cannot split an event (or merge it) after it's been through the line breaking/merging phase.
So you can't turn
{"whatever": ["a","b","c"], "something":"something"}
into
{"whatever": "a", "something":"something"}
{"whatever": "b", "something":"something"}
{"whatever": "c", "something":"something"}
using props and transforms alone. The ingestion pipeline doesn't deal with structured data (with the exception of indexed extractions on a UF, but that's a different story).
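(For contrast - at search time this particular split is trivial, which is why it's usually left there. A minimal example using the snippet above:

| makeresults
| eval _raw="{\"whatever\": [\"a\",\"b\",\"c\"], \"something\":\"something\"}"
| spath
| rename "whatever{}" as whatever
| mvexpand whatever
| table whatever something

But that of course means the work is done at every search, which is what you said you were trying to avoid.)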
Longer than yesterday helps though 🙂
OK - here are some thoughts I had about getting around this, without having had a chance to play with them yet.
SEDCMD looks like a possibility, while knowing it's not going to be a newbie kind of thing. There is support for backreferences, so I thought of copying a core meta field as an addition into each stock_id, and then splitting the structure into events by each stock_id.
You're thinking in the wrong order. That's why I'm saying it's not possible with Splunk alone.
If you don't know it already, this is one of the mainstays of understanding the Splunk indexing process: https://community.splunk.com/t5/Getting-Data-In/Diagrams-of-how-indexing-works-in-the-Splunk-platfor...
As you can see, line breaking is one of the absolute first things that happens to the input stream. You can't "backtrack" your way within the ingestion pipeline to run SEDCMD before line breaking.
And, as I wrote already, it's really a very bad idea to tackle structured data with regexes.
TL;DR - you can't split events within Splunk itself during ingestion.
Longer explanation - each event is processed as a single entity. You could try to make a copy of the event using CLONE_SOURCETYPE and then process each of those instances separately (for example - cut some part from one copy but another part from the other copy), but it's not something that can be reasonably implemented - it's unmaintainable in the long run, and you can't do it dynamically (like splitting a JSON into however many items an array has). Oh, and of course structured data manipulation at ingest time is a relatively big no-no.
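Purely for the record, that cloning trick looks roughly like this (the sourcetype names are made up, and it only ever works for a fixed, known number of copies):

# props.conf
[live_event]
TRANSFORMS-clone = clone_live_event

# transforms.conf
[clone_live_event]
REGEX = .
CLONE_SOURCETYPE = live_event_copy

Each copy can then get its own SEDCMDs under its own sourcetype stanza to cut away different parts - you can see why this doesn't scale to an array of a thousand ports.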
So your best bet would be to pre-process your data with a third-party tool (or at least write a scripted input that does the heavy lifting of splitting the data).
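To make that concrete, here's a minimal, untested sketch in Python of what such a pre-processor could do. The field names are taken from your sample; it reads the original log from stdin just to keep the sketch self-contained (a real scripted input would read or tail the source file itself):

#!/usr/bin/env python3
import json
import sys

# Raw lines look like: "<timestamp> LiveEvent: {json...}"
PREFIX = "LiveEvent: "

def split_event(line):
    ts, sep, payload = line.partition(PREFIX)
    if not sep:
        return  # not a LiveEvent line, skip it
    outer = json.loads(payload)
    # Metadata to carry over into every per-port event
    common = {k: outer[k]
              for k in ("uuid", "session_id",
                        "session_sequence_number", "event_type")
              if k in outer}
    for port in outer.get("data", {}).get("ports", []):
        event = dict(common)
        event.update(port)
        # One self-contained JSON event per stock_id
        print(ts.strip() + " " + PREFIX + json.dumps(event))

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            split_event(line.strip())

Splunk then ingests each printed line as a separate event with the shared metadata already inlined, and the per-event timestamp is preserved in the prefix.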