Getting Data In

Parsing libsvm events

msivill_splunk
Splunk Employee
Splunk Employee

I'm trying to parse a number of different libsvm files https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q03:_Data_preparation within Splunk but unable to extract all the repeating key:value pairs. I can extract the first value called label correctly however I can only get the first value within the points (repeating key:value pairs). So a field call p_1 is created but p_2, p_3, p_4, p_5, p_6, p_7, and p_8 are missing. I've attached an example of the file to be processed and current props.conf and transforms.conf files.

libsvm file

3.0 1:69.0 2:91.0 3:57.0 4:0.08035548849583785 5:0.01453634825435611 6:25.0 7:19.0 8:-1.3157894736796152 9:0.0
5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 5:-0.06108637426162032 6:23.0 7:19.0 8:2.5 9:0.0
8.0 1:48.0 2:58.0 3:30.0 4:0.3969340706645018 5:0.032656706189642705 6:24.0 7:17.0 8:1.0476190476038028 9:0.0 
10.0 1:27.0 2:69.0 3:20.0 4:0.6876996766857346 5:-0.08096469266049523 6:25.0 7:18.0 8:-4.71999999997206 9:0.0

props.conf file

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
SOURCE_KEY = points
REGEX = \s?([^\s\:]+):([^\s]+)\s?
FORMAT = p$1::$2
REPEAT_MATCH = true

Any pointers on the above would be great.

Tags (1)
0 Karma
1 Solution

msivill_splunk
Splunk Employee
Splunk Employee

Preprocessing the libsvm file with the SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. So the key does not begin with a numeric character i.e. p1, p2, p3, p4, p5, etc.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points

View solution in original post

0 Karma

msivill_splunk
Splunk Employee
Splunk Employee

Preprocessing the libsvm file with the SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. So the key does not begin with a numeric character i.e. p1, p2, p3, p4, p5, etc.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points
0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...