
Not all fields in log line extracted

ssaenger
Communicator

Hi, 

I have two problems with a log line.

1)

I have a log line that is occasionally inserted.
It is a schedule, and I wish to extract the data from it. The entry contains repeated keys such as
eventTitle=

However, Splunk is only pulling the first occurrence from the log line and ignoring the rest.

So I get:
eventTitle=BooRadley

in my fields, instead of

eventTitle=BooRadley

eventTitle=REGGAE-2

eventTitle=CHRISTIAN MISSION

 

I have tried using regex and | kv pairdelim="=", kvdelim=","

I am unsure if splitting on a line break would work, as the events are all referenced to SArts - this is a field extracted via regex, and it changes.

2)

The log line is about 9999 characters long including spaces, and not all of the log line is ingested - I think I need to create a limits.conf file?

Below is an abridged extract of the log line:

 

20231117154211 [18080-exec-9] INFO EventConversionService () - SArts: VUpdate(system=GRP1-VIPE, channelCode=UH, type=NextEvents, events=[Event(onAir=true,  eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog ), Event(onAir=false, eventNumber=725538313, utcStartDateTime=2023-11-17T15:42:15.160Z,  duration=00:00:02.000, eventTitle= REGGAE-2, contentType=Bumper), Event(onAir=false, eventNumber=725538320, utcStartDateTime=2023-11-17T15:42:17.160Z,  duration=00:01:30.000,  eventTitle=CHRISITAN MISSION , contentType=Commercial), Event…

 

This is my code so far:

 

| rex "\-\s+(?<channel_name>.+)\:\sVUpdate"  | stats  values(eventNumber) by channel_name channelCode utcStartDateTime eventTitle duration  

 

1 Solution

yuanliu
SplunkTrust

The log line is about 9999 characters long including spaces, and not all of the log line is ingested - I think I need to create a limits.conf file?

Absolutely.  Good data is the only guarantee that any work on it will be valid.

This said, Splunk's KV extraction does not look beyond the first occurrence of a key. (And that's a good thing. It is a risky proposition for any language to assume the intention behind multiple occurrences of the same left-hand side.) The main problem is caused by the developers, who take pains to invent a structured data format that is not standard. It seems that they use foo[] to indicate an array (events), then use bar() to indicate an element; inside an element, they use = to separate key and value. Then, on top of this, they use geez() to signal a top-level structure ("VUpdate") with key-value pairs that include the events[] array. If you have any influence over the developers, you should urge them, beg them, implore them to use a standard structured representation such as JSON.
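
For illustration only (my sketch, not something your developers emit), the same VUpdate could be written in JSON along these lines, abridged to one event:

{"system": "GRP1-VIPE", "channelCode": "UH", "type": "NextEvents", "events": [{"onAir": true, "eventNumber": 725538339, "utcStartDateTime": "2023-11-17T15:42:10.160Z", "duration": "00:00:05.000", "eventTitle": "BooRadley", "contentType": "Prog"}]}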

If not, you can use Splunk to try to parse out the structure. But this is going to be messy and will never be robust. Unless your developers swear on their descendants' descendants (and their ancestors' ancestors) not to change the format, your future can be ruined at their whim.

Before I delve into SPL, I also want to clarify this: Splunk already gives you the following fields: channelCode, contentType, duration, eventNumber, eventTitle, events, onAir, system, type, and utcStartDateTime. Is this correct? While you can ignore any second-level fields such as eventTitle and eventNumber, I also want to confirm that events includes the whole thing from [ all the way to ]. Is this correct?

I'll suggest two approaches, both of which rely on the structure I reverse-engineered above. The first one is straight string manipulation, and uses Splunk's split function to isolate individual events.

 

| fields system channelCode type events
| eval events = split(events, "),")
| mvexpand events
| rename events AS _raw
| rex mode=sed "s/^[\[\s]*Event\(// s/[\)\]]//g"
| kv kvdelim="=" pairdelim=","
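
To see what each step does: for the sample event, the first value after the split (and the rename to _raw) is roughly

[Event(onAir=true,  eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog

and the sed then strips the leading "[Event(" and any leftover ")" or "]", leaving plain key=value pairs for kv to read.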

 

The second one tries to "translate" your developers' log structure into JSON using string manipulation.

 

| rex field=events mode=sed "s/\(/\": {/g s/ *\)/}}/g s/=\s+/=/g s/\s+,/,/g s/(\w+)=([^,}]+)/\"\1\": \"\2\"/g s/\"(true|false)\"/\1/g s/Event/{\"Event/g"
| spath input=events path={}
| fields - events
| mvexpand {}
| spath input={}
| fields - {}
| rename Event.* AS *
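
If the sed rewrite behaves as intended on your sample, events ends up as a JSON array whose first element looks roughly like

{"Event": {"onAir": true, "eventNumber": "725538339", "utcStartDateTime": "2023-11-17T15:42:10.160Z", "duration": "00:00:05.000", "eventTitle": "BooRadley", "contentType": "Prog"}}

which is what the spath/mvexpand/spath sequence then unpacks.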

 

The second approach is not more robust; if anything, it is less.  But it better illustrates the perceived structure. 

Either way, your sample data should give you something like

channelCode | contentType | duration     | eventNumber | eventTitle        | onAir | system    | type       | utcStartDateTime
UH          | Prog        | 00:00:05.000 | 725538339   | BooRadley         | true  | GRP1-VIPE | NextEvents | 2023-11-17T15:42:10.160Z
UH          | Bumper      | 00:00:02.000 | 725538313   | REGGAE-2          | false | GRP1-VIPE | NextEvents | 2023-11-17T15:42:15.160Z
UH          | Commercial  | 00:01:30.000 | 725538320   | CHRISITAN MISSION | false | GRP1-VIPE | NextEvents | 2023-11-17T15:42:17.160Z

This is an emulation you can play with and compare with real data:

 

| makeresults
| eval _raw = "20231117154211 [18080-exec-9] INFO EventConversionService () - SArts: VUpdate(system=GRP1-VIPE, channelCode=UH, type=NextEvents, events=[Event(onAir=true,  eventNumber=725538339, utcStartDateTime=2023-11-17T15:42:10.160Z, duration=00:00:05.000, eventTitle=BooRadley, contentType=Prog ), Event(onAir=false, eventNumber=725538313, utcStartDateTime=2023-11-17T15:42:15.160Z,  duration=00:00:02.000, eventTitle= REGGAE-2, contentType=Bumper), Event(onAir=false, eventNumber=725538320, utcStartDateTime=2023-11-17T15:42:17.160Z,  duration=00:01:30.000,  eventTitle=CHRISITAN MISSION , contentType=Commercial)])"
| extract
``` data emulation above ```
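
To test the first approach end to end, append its commands right after | extract in the emulation (this assumes extract captures events as the whole bracketed list, per my question above):

| fields system channelCode type events
| eval events = split(events, "),")
| mvexpand events
| rename events AS _raw
| rex mode=sed "s/^[\[\s]*Event\(// s/[\)\]]//g"
| kv kvdelim="=" pairdelim=","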

 

Hope this helps.


ssaenger
Communicator

Hi yuanliu,

I don't think the devs will change the code!

Thank you, option one seems to do the trick.
It's taken me a bit of time to work through the answer and understand it, and I am still struggling with the sed magic, but I will persevere.
Thank you again.


richgalloway
SplunkTrust

1) Please show the SPL you've tried and tell us how it failed you.  It would help to see an actual (sanitized) event, too.

The options to the extract command are swapped.  kvdelim is the character that separates key from value, usually "="; pairdelim is the character that separates kv pairs, usually comma or space.
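
Applied to your search, the corrected order would be something like this (a sketch; note also that kv options are space-separated, so the comma between them should go):

| kv pairdelim="," kvdelim="="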

2) The props.conf file has a TRUNCATE setting that defaults to 10000.  Perhaps your system has a lower value.
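
If you do need to raise it, a minimal sketch, assuming a hypothetical sourcetype name (replace with your own):

# props.conf, on the indexer or heavy forwarder
# "my_sourcetype" is a placeholder; use your actual sourcetype
[my_sourcetype]
TRUNCATE = 20000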

---
If this reply helps you, Karma would be appreciated.