
Parsing libsvm events

msivill_splunk
Splunk Employee

I'm trying to parse a number of different libsvm files within Splunk (see https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q03:_Data_preparation for the format), but I am unable to extract all the repeating key:value pairs. I can extract the first value, called label, correctly; however, I only get the first of the points (the repeating key:value pairs). So a field called p_1 is created, but p_2, p_3, p_4, p_5, p_6, p_7, and p_8 are missing. I've attached an example of the file to be processed and the current props.conf and transforms.conf files.

libsvm file

3.0 1:69.0 2:91.0 3:57.0 4:0.08035548849583785 5:0.01453634825435611 6:25.0 7:19.0 8:-1.3157894736796152 9:0.0
5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 5:-0.06108637426162032 6:23.0 7:19.0 8:2.5 9:0.0
8.0 1:48.0 2:58.0 3:30.0 4:0.3969340706645018 5:0.032656706189642705 6:24.0 7:17.0 8:1.0476190476038028 9:0.0 
10.0 1:27.0 2:69.0 3:20.0 4:0.6876996766857346 5:-0.08096469266049523 6:25.0 7:18.0 8:-4.71999999997206 9:0.0
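To make the layout concrete: each line is a label followed by space-separated index:value pairs, as described in the LIBSVM FAQ linked above. The Python sketch below is not part of the original configuration; parse_libsvm_line is a hypothetical helper that splits one line into the pieces the Splunk extraction is meant to produce.

```python
def parse_libsvm_line(line: str) -> tuple[float, dict[int, float]]:
    """Hypothetical helper (not a Splunk or LIBSVM API): split a LIBSVM line
    into its label and an {index: value} dict."""
    tokens = line.split()          # whitespace split also drops trailing spaces
    label = float(tokens[0])       # first token is the label
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)   # "4:0.0803..." -> ("4", "0.0803...")
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line(
    "5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 5:-0.06108637426162032 "
    "6:23.0 7:19.0 8:2.5 9:0.0"
)
print(label, len(features), features[8])  # 5.0 9 2.5
```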

props.conf file

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
SOURCE_KEY = points
REGEX = \s?([^\s\:]+):([^\s]+)\s?
FORMAT = p$1::$2
REPEAT_MATCH = true
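One way to check whether the regexes themselves are at fault is to run them outside Splunk. The Python sketch below is a rough emulation, assuming REPORT-a applies libsvmLabelledPoints first and then libsvmPoints to every pair in points; emulate_report is a hypothetical name, and Python needs (?P<name>...) where Splunk's PCRE accepts (?<name>...). It does not reproduce Splunk's field-naming rules, but it does capture all nine pairs, which suggests the pattern is not what drops p_2 onward.

```python
import re

# libsvmLabelledPoints REGEX; Splunk's PCRE (?<name>...) is written (?P<name>...) in Python
LABELLED = re.compile(r"(?P<label>[^\s]+) (?P<points>.*)")
# libsvmPoints REGEX, copied unchanged
POINT = re.compile(r"\s?([^\s\:]+):([^\s]+)\s?")


def emulate_report(event: str) -> tuple[str, dict[str, str]]:
    """Hypothetical emulation of REPORT-a = libsvmLabelledPoints, libsvmPoints."""
    m = LABELLED.search(event)
    if m is None:
        return "", {}
    points = {}
    # Apply the second regex to every pair in 'points', mimicking FORMAT = p$1::$2
    for key, value in POINT.findall(m.group("points")):
        points["p" + key] = value
    return m.group("label"), points


label, points = emulate_report(
    "8.0 1:48.0 2:58.0 3:30.0 4:0.3969340706645018 "
    "5:0.032656706189642705 6:24.0 7:17.0 8:1.0476190476038028 9:0.0 "
)
print(label, list(points))
# 8.0 ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9']
```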

Any pointers on the above would be great.

msivill_splunk
Splunk Employee

Preprocessing the libsvm file with a SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this, so that no key begins with a numeric character (i.e. p1, p2, p3, p4, p5, etc.).

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points
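SEDCMD applies a sed-style substitution to the raw event at index time, so it only affects data indexed after the setting is in place. The Python sketch below applies the same pattern and replacement with re.sub to show what the rewritten events look like; apply_sedcmd is a hypothetical name used only for illustration. Because the labels and values in this data contain no colons, only the feature indices gain the p prefix.

```python
import re

# Same pattern and replacement as SEDCMD-p = s/(\d+:)/p\1/g; re.sub replaces all matches,
# which corresponds to sed's g flag
SED_PATTERN = re.compile(r"(\d+:)")


def apply_sedcmd(raw: str) -> str:
    """Hypothetical helper: prefix every numeric 'index:' with 'p', e.g. '1:69.0' -> 'p1:69.0'."""
    return SED_PATTERN.sub(r"p\1", raw)


print(apply_sedcmd("3.0 1:69.0 2:91.0 3:57.0 9:0.0"))
# 3.0 p1:69.0 p2:91.0 p3:57.0 p9:0.0
```

After the rewrite, group 1 of the libsvmPoints REGEX already captures p1, p2, and so on, which is consistent with the solution's FORMAT changing from p$1::$2 to $1::$2.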
