Getting Data In

Parsing libsvm events

msivill_splunk
Splunk Employee
Splunk Employee

I'm trying to parse a number of different libsvm files https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q03:_Data_preparation within Splunk but unable to extract all the repeating key:value pairs. I can extract the first value called label correctly however I can only get the first value within the points (repeating key:value pairs). So a field call p_1 is created but p_2, p_3, p_4, p_5, p_6, p_7, and p_8 are missing. I've attached an example of the file to be processed and current props.conf and transforms.conf files.

libsvm file

3.0 1:69.0 2:91.0 3:57.0 4:0.08035548849583785 5:0.01453634825435611 6:25.0 7:19.0 8:-1.3157894736796152 9:0.0
5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 5:-0.06108637426162032 6:23.0 7:19.0 8:2.5 9:0.0
8.0 1:48.0 2:58.0 3:30.0 4:0.3969340706645018 5:0.032656706189642705 6:24.0 7:17.0 8:1.0476190476038028 9:0.0 
10.0 1:27.0 2:69.0 3:20.0 4:0.6876996766857346 5:-0.08096469266049523 6:25.0 7:18.0 8:-4.71999999997206 9:0.0

props.conf file

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
SOURCE_KEY = points
REGEX = \s?([^\s\:]+):([^\s]+)\s?
FORMAT = p$1::$2
REPEAT_MATCH = true

Any pointers on the above would be great.

Tags (1)
0 Karma
1 Solution

msivill_splunk
Splunk Employee
Splunk Employee

Preprocessing the libsvm file with the SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. So the key does not begin with a numeric character i.e. p1, p2, p3, p4, p5, etc.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points

View solution in original post

0 Karma

msivill_splunk
Splunk Employee
Splunk Employee

Preprocessing the libsvm file with the SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. So the key does not begin with a numeric character i.e. p1, p2, p3, p4, p5, etc.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Mile High Learning with Splunk University, Denver, Colorado

If Denver is known for its mile-high elevation, Splunk University is about to raise the bar on technical ...

IT Service Intelligence 5.0 Series: Your Guide to the June Launch

We are excited to announce the June release of Splunk IT Service Intelligence (ITSI) 5.0. This update ...

Agent Mode Engaged! Enchaining Agentic Operations with Splunk AI Assistant 2.0

    Are you ready to transform how your team handles complex data requests? We invite you to our upcoming ...