"The log line is about 9999 characters long with spaces, and not all the log line is ingested - I think I need to create a limits.conf file?"

Absolutely. Good data is the only guarantee that any work on it will be valid.
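One clarification on the truncation itself: the per-event length limit at ingestion is normally controlled by the TRUNCATE setting in props.conf (it defaults to 10000 bytes), not limits.conf. A minimal sketch, assuming a hypothetical sourcetype name your_sourcetype and that you can deploy configuration to your indexers or heavy forwarders (this only affects data ingested after the change):

[your_sourcetype]
# assumption: hypothetical sourcetype; raise the limit comfortably above your longest expected line
TRUNCATE = 20000

You can confirm whether truncation is happening with something like | eval raw_length = len(_raw) and checking for values pinned near the limit.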
This said, Splunk's KV extraction does not look beyond the first occurrence of a key. (And that's a good thing. It is a risky proposition for any language to assume the intention of multiple occurrences of a left-hand side value.)

The main problem is caused by the developers, who take pains to invent a structured data format that is not standard. It seems that they use foo[] to indicate an array (events), then use bar() to indicate an element; inside an element, they use = to separate key and value. Then, on top of this, they use geez() to signal a top-level structure ("VUpdate") with key-value pairs that include the events[] array.
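To make that concrete, this is the shape as I read it from your sample (a sketch, with ... standing in for the remaining key-value pairs):

VUpdate(
    system=..., channelCode=..., type=...,
    events=[
        Event(onAir=..., eventNumber=..., ...),
        Event(...)
    ]
)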
If you have any influence over the developers, you should urge them, beg them, implore them to use a standard structured representation such as JSON. If not, you can use Splunk to try to parse out the structure. But this is going to be messy and will never be robust. Unless your developers swear on their descendants' descendants (and their ancestors' ancestors) not to change the format, your future can be ruined at their whim.

Before I delve into SPL, I also want to clarify this: Splunk already gives you the following fields: channelCode, contentType, duration, eventNumber, eventTitle, events, onAir, system, type, and utcStartDateTime. Is this correct? While you can ignore any second-level fields such as eventTitle and eventNumber, I also want to confirm that events includes the whole thing from [ all the way to ]. Is this correct?

I'll suggest two approaches; both rely on the structure I reverse engineered above. The first one is straight string manipulation, and uses Splunk's split function to isolate individual events.

| fields system channelCode type events
| eval events = split(events, "),")
| mvexpand events
| rename events AS _raw
| rex mode=sed "s/^[\[\s]*Event\(// s/[\)\]]//g"
| kv kvdelim="=" pairdelim=","
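To illustrate what kv receives: after the split, mvexpand, rename, and rex steps, each row's _raw should look something like this (traced by hand from the first event in your sample, so treat it as approximate):

onAir=true, eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog

which is a plain key=value list that kv can split apart using pairdelim and kvdelim.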
The second one tries to "translate" your developers' log structure into JSON using string manipulation.

| rex field=events mode=sed "s/\(/\": {/g s/ *\)/}}/g s/=\s+/=/g s/\s+,/,/g s/(\w+)=([^,}]+)/\"\1\": \"\2\"/g s/\"(true|false)\"/\1/g s/Event/{\"Event/g"
| spath input=events path={}
| fields - events
| mvexpand {}
| spath input={}
| fields - {}
| rename Event.* AS *
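For reference, the rex above aims to rewrite events into a JSON array shaped roughly like this (my hand-traced reconstruction of the sed output for the first event, so approximate):

[{"Event": {"onAir": true, "eventNumber": "725538339", "utcStartDateTime": "2023-11-17T15:42:10.160Z", "duration": "00:00:05.000", "eventTitle": "BooRadley", "contentType": "Prog"}}, {"Event": {...}}, {"Event": {...}}]

The first spath extracts each array element into the multivalue field {}, mvexpand gives each element its own row, and the second spath plus rename flatten Event.* into top-level fields.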
The second approach is not more robust; if anything, it is less. But it better illustrates the perceived structure. Either way, your sample data should give you something like this:

| channelCode | contentType | duration | eventNumber | eventTitle | onAir | system | type | utcStartDateTime |
|---|---|---|---|---|---|---|---|---|
| UH | Prog | 00:00:05.000 | 725538339 | BooRadley | true | GRP1-VIPE | NextEvents | 2023-11-17T15:42:10.160Z |
| UH | Bumper | 00:00:02.000 | 725538313 | REGGAE-2 | false | GRP1-VIPE | NextEvents | 2023-11-17T15:42:15.160Z |
| UH | Commercial | 00:01:30.000 | 725538320 | CHRISITAN MISSION | false | GRP1-VIPE | NextEvents | 2023-11-17T15:42:17.160Z |
This is an emulation you can play with and compare with real data:

| makeresults
| eval _raw = "20231117154211 [18080-exec-9] INFO EventConversionService () - SArts: VUpdate(system=GRP1-VIPE, channelCode=UH, type=NextEvents, events=[Event(onAir=true, eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog ), Event(onAir=false, eventNumber=725538313, utcStartDateTime=2023-11-17T15:42:15.160Z, duration=00:00:02.000, eventTitle= REGGAE-2, contentType=Bumper), Event(onAir=false, eventNumber=725538320, utcStartDateTime=2023-11-17T15:42:17.160Z, duration=00:01:30.000, eventTitle=CHRISITAN MISSION , contentType=Commercial)])"
| extract
``` data emulation above ```

Hope this helps.