
Parsing libsvm events

msivill_splunk
Splunk Employee

I'm trying to parse a number of different libsvm files (https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q03:_Data_preparation) within Splunk but am unable to extract all the repeating key:value pairs. The first value, called label, is extracted correctly, but from the points (the repeating key:value pairs) I only get the first one. So a field called p_1 is created, but p_2, p_3, p_4, p_5, p_6, p_7, and p_8 are missing. Below are an example of the file to be processed and the current props.conf and transforms.conf files.

libsvm file

3.0 1:69.0 2:91.0 3:57.0 4:0.08035548849583785 5:0.01453634825435611 6:25.0 7:19.0 8:-1.3157894736796152 9:0.0
5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 5:-0.06108637426162032 6:23.0 7:19.0 8:2.5 9:0.0
8.0 1:48.0 2:58.0 3:30.0 4:0.3969340706645018 5:0.032656706189642705 6:24.0 7:17.0 8:1.0476190476038028 9:0.0 
10.0 1:27.0 2:69.0 3:20.0 4:0.6876996766857346 5:-0.08096469266049523 6:25.0 7:18.0 8:-4.71999999997206 9:0.0

props.conf file

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
SOURCE_KEY = points
REGEX = \s?([^\s\:]+):([^\s]+)\s?
FORMAT = p$1::$2
REPEAT_MATCH = true

Any pointers on the above would be great.

1 Solution

msivill_splunk
Splunk Employee

Preprocessing the libsvm file with SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. Each key is prefixed with p (p1, p2, p3, p4, p5, etc.) so that it no longer begins with a numeric character.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points
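
To sanity-check the substitution and the two regexes outside Splunk, here is a minimal Python sketch. It only approximates the behaviour: Python's re module is not Splunk's regex engine, the findall loop is an assumed stand-in for Splunk applying the libsvmPoints regex repeatedly, and the function names prefix_keys and extract_fields are hypothetical, not anything Splunk provides.

```python
import re

# Rough equivalent of SEDCMD-p = s/(\d+:)/p\1/g (assumption: same effect on this data).
def prefix_keys(raw: str) -> str:
    """Prefix every numeric key such as '1:' with 'p', giving 'p1:'."""
    return re.sub(r"(\d+:)", r"p\1", raw)

# Same patterns as the transforms.conf stanzas, using Python's named-group syntax.
LABELLED_POINTS = re.compile(r"(?P<label>[^\s]+) (?P<points>.*)")
POINTS = re.compile(r"\s?([^\s\:]+):([^\s]+)\s?")

def extract_fields(raw: str) -> dict:
    """Return the label plus one field per key:value pair in a libsvm line."""
    event = prefix_keys(raw)
    match = LABELLED_POINTS.search(event)
    if match is None:
        return {}
    fields = {"label": match.group("label")}
    # findall stands in for repeated matching against the 'points' field.
    for key, value in POINTS.findall(match.group("points")):
        fields[key] = value
    return fields

line = ("5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 "
        "5:-0.06108637426162032 6:23.0 7:19.0 8:2.5 9:0.0")
fields = extract_fields(line)
for name, value in fields.items():
    print(name, "=", value)
```

For the sample line this yields label plus p1 through p9, which is a quick way to confirm the patterns cover every pair before deploying the props.conf and transforms.conf changes.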

