Hi,
I have two problems with a log line.
1)
I have a log line that appears occasionally.
It is a schedule, and I wish to extract the data from it. The entry contains repeated values of the form
eventTitle=
However, Splunk is only pulling the first occurrence from the log line and ignoring the rest.
so I get:
eventTitle=BooRadley
in my fields, instead of
eventTitle=BooRadley
eventTitle=REGGAE-2
eventTitle=CHRISTIAN MISSION
I have tried using regex and | kv pairdelim="=", kvdelim=","
I am unsure if a line break would work, as the entries are referenced to SArts - this is a field extracted via regex and it changes.
2)
The log line is about 9999 characters long including spaces, and not all of it is ingested - I think I need to create a limits.conf file?
Below is an abridged extract of the log line
20231117154211 [18080-exec-9] INFO EventConversionService () - SArts: VUpdate(system=GRP1-VIPE, channelCode=UH, type=NextEvents, events=[Event(onAir=true, eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog ), Event(onAir=false, eventNumber=725538313, utcStartDateTime=2023-11-17T15:42:15.160Z, duration=00:00:02.000, eventTitle= REGGAE-2, contentType=Bumper), Event(onAir=false, eventNumber=725538320, utcStartDateTime=2023-11-17T15:42:17.160Z, duration=00:01:30.000, eventTitle=CHRISITAN MISSION , contentType=Commercial), Event…
This is my code so far:
| rex "\-\s+(?<channel_name>.+)\:\sVUpdate" | stats values(eventNumber) by channel_name channelCode utcStartDateTime eventTitle duration
The log line is about 9999 characters long including spaces, and not all of it is ingested - I think I need to create a limits.conf file?
Absolutely. Good data is the only guarantee that any work on it will be valid.
This said, Splunk's KV extraction does not look beyond the first occurrence of a key. (And that's a good thing. It is a risky proposition for any language to assume the intention behind multiple occurrences of the same left-hand side.) The main problem is caused by the developers, who take pains to invent a structured data format that is not standard. It seems that they use foo[] to indicate an array (events), then use bar() to indicate an element; inside each element, they use = to separate key and value. Then, on top of this, they use geez() to signal a top-level structure ("VUpdate") with key-value pairs that include the events[] array. If you have any influence over the developers, you should urge them, beg them, implore them to use a standard structured representation such as JSON.
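For comparison, here is a sketch of what the same event could look like in standard JSON (field names and values are taken from your sample; this is illustrative, not something the current log emits):

{"VUpdate": {"system": "GRP1-VIPE", "channelCode": "UH", "type": "NextEvents",
  "events": [
    {"onAir": true, "eventNumber": 725538339, "utcStartDateTime": "2023-11-17T15:42:10.160Z", "duration": "00:00:05.000", "eventTitle": "BooRadley", "contentType": "Prog"},
    {"onAir": false, "eventNumber": 725538313, "utcStartDateTime": "2023-11-17T15:42:15.160Z", "duration": "00:00:02.000", "eventTitle": "REGGAE-2", "contentType": "Bumper"}
  ]}}

With that, Splunk's spath would hand you every field with no custom parsing at all.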
If not, you can use Splunk to try to parse out the structure. But this is going to be messy and will never be robust. Unless your developers swear on their descendants' descendants (and their ancestors' ancestors) not to change the format, your future can be ruined at their whim.
Before I delve into SPL, I also want to clarify this: Splunk already gives you the following fields: channelCode, contentType, duration, eventNumber, eventTitle, events, onAir, system, type, and utcStartDateTime. Is this correct? While you can ignore any second-level fields such as eventTitle and eventNumber, I also want to confirm that events includes the whole thing from [ all the way to ].
I'll suggest two approaches, both of which rely on the structure I reverse engineered above. The first one is straight string manipulation, and uses Splunk's split function to isolate individual events.
| fields system channelCode type events
| eval events = split(events, "),")
| mvexpand events
| rename events AS _raw
| rex mode=sed "s/^[\[\s]*Event\(// s/[\)\]]//g"
| kv kvdelim="=" pairdelim=","
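Putting the first approach together with your own channel_name extraction and stats, the full search could look like this (a sketch; your base search goes in front, and it assumes the auto-extracted events field spans the whole [ ... ] block):

| rex "\-\s+(?<channel_name>.+)\:\sVUpdate"
| fields channel_name system channelCode type events
| eval events = split(events, "),")
| mvexpand events
| rename events AS _raw
| rex mode=sed "s/^[\[\s]*Event\(// s/[\)\]]//g"
| kv kvdelim="=" pairdelim=","
| stats values(eventNumber) AS eventNumber by channel_name channelCode utcStartDateTime eventTitle duration

Note that channel_name must be extracted before events is renamed to _raw, because that rename overwrites the original event text.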
The second one tries to "translate" your developers' log structure into JSON using string manipulation.
| rex field=events mode=sed "s/\(/\": {/g s/ *\)/}}/g s/=\s+/=/g s/\s+,/,/g s/(\w+)=([^,}]+)/\"\1\": \"\2\"/g s/\"(true|false)\"/\1/g s/Event/{\"Event/g"
| spath input=events path={}
| fields - events
| mvexpand {}
| spath input={}
| fields - {}
| rename Event.* AS *
The second approach is not more robust; if anything, it is less. But it better illustrates the perceived structure.
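If you want to see that structure for yourself, inspect events right after the rex above; on your sample it should hold something like this (sketched, with whitespace added for readability):

[{"Event": {"onAir": true, "eventNumber": "725538339", "utcStartDateTime": "2023-11-17T15:42:10.160Z", "duration": "00:00:05.000", "eventTitle": "BooRadley", "contentType": "Prog"}}, {"Event": {"onAir": false, ...}}, ...]

That is why spath with path={} can walk the array and mvexpand can split it into rows.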
Either way, your sample data should give you something like
channelCode | contentType | duration | eventNumber | eventTitle | onAir | system | type | utcStartDateTime
--- | --- | --- | --- | --- | --- | --- | --- | ---
UH | Prog | 00:00:05.000 | 725538339 | BooRadley | true | GRP1-VIPE | NextEvents | 2023-11-17T15:42:10.160Z
UH | Bumper | 00:00:02.000 | 725538313 | REGGAE-2 | false | GRP1-VIPE | NextEvents | 2023-11-17T15:42:15.160Z
UH | Commercial | 00:01:30.000 | 725538320 | CHRISITAN MISSION | false | GRP1-VIPE | NextEvents | 2023-11-17T15:42:17.160Z
This is an emulation you can play with and compare with real data:
| makeresults
| eval _raw = "20231117154211 [18080-exec-9] INFO EventConversionService () - SArts: VUpdate(system=GRP1-VIPE, channelCode=UH, type=NextEvents, events=[Event(onAir=true, eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog ), Event(onAir=false, eventNumber=725538313, utcStartDateTime=2023-11-17T15:42:15.160Z, duration=00:00:02.000, eventTitle= REGGAE-2, contentType=Bumper), Event(onAir=false, eventNumber=725538320, utcStartDateTime=2023-11-17T15:42:17.160Z, duration=00:01:30.000, eventTitle=CHRISITAN MISSION , contentType=Commercial)])"
| extract
``` data emulation above ```
Hope this helps.
Hi Yuanlui,
I don't think the devs will change the code!!!
Thank you, option one seems to do the trick.
It's taken me a bit of time to work through the answer and try to understand it, and I am still struggling with the sed magic, but I will persevere.
Thank you again.
1) Please show the SPL you've tried and tell us how it failed you. It would help to see an actual (sanitized) event, too.
The options to the extract command are swapped. kvdelim is the character that separates key from value, usually "="; pairdelim is the character that separates kv pairs, usually comma or space.
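With the options the right way around, the command for this data would be (kv is simply an alias of extract):

| kv kvdelim="=" pairdelim=","

As noted in the other answer, though, KV extraction still only keeps the first occurrence of each key in the event, which is why that answer splits the events apart before extracting.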
2) The props.conf file has a TRUNCATE setting that defaults to 10000. Perhaps your system has a lower value.
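If it does, you can raise the limit in props.conf on the indexers (or heavy forwarders); the stanza name below is a placeholder for your actual sourcetype:

[your:sourcetype]
``` default is 10000; 0 disables truncation entirely ```
TRUNCATE = 20000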