Getting Data In

Parsing libsvm events

msivill_splunk
Splunk Employee
Splunk Employee

I'm trying to parse a number of different libsvm files https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q03:_Data_preparation within Splunk but unable to extract all the repeating key:value pairs. I can extract the first value called label correctly however I can only get the first value within the points (repeating key:value pairs). So a field call p_1 is created but p_2, p_3, p_4, p_5, p_6, p_7, and p_8 are missing. I've attached an example of the file to be processed and current props.conf and transforms.conf files.

libsvm file

3.0 1:69.0 2:91.0 3:57.0 4:0.08035548849583785 5:0.01453634825435611 6:25.0 7:19.0 8:-1.3157894736796152 9:0.0
5.0 1:56.0 2:97.0 3:56.0 4:0.8167951447877143 5:-0.06108637426162032 6:23.0 7:19.0 8:2.5 9:0.0
8.0 1:48.0 2:58.0 3:30.0 4:0.3969340706645018 5:0.032656706189642705 6:24.0 7:17.0 8:1.0476190476038028 9:0.0 
10.0 1:27.0 2:69.0 3:20.0 4:0.6876996766857346 5:-0.08096469266049523 6:25.0 7:18.0 8:-4.71999999997206 9:0.0

props.conf file

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
SOURCE_KEY = points
REGEX = \s?([^\s\:]+):([^\s]+)\s?
FORMAT = p$1::$2
REPEAT_MATCH = true

Any pointers on the above would be great.

Tags (1)
0 Karma
1 Solution

msivill_splunk
Splunk Employee
Splunk Employee

Preprocessing the libsvm file with the SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. So the key does not begin with a numeric character i.e. p1, p2, p3, p4, p5, etc.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points

View solution in original post

0 Karma

msivill_splunk
Splunk Employee
Splunk Employee

Preprocessing the libsvm file with the SEDCMD in props.conf to convert the numeric keys into alphanumeric keys solved this. So the key does not begin with a numeric character i.e. p1, p2, p3, p4, p5, etc.

props.conf

[libsvm]
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
SEDCMD-p = s/(\d+:)/p\1/g
REPORT-a = libsvmLabelledPoints, libsvmPoints

transforms.conf

[libsvmLabelledPoints]
REGEX = (?<label>[^\s]+) (?<points>.*)

[libsvmPoints]
FORMAT = $1::$2
REGEX = \s?([^\s\:]+):([^\s]+)\s?
SOURCE_KEY = points
0 Karma
Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Unmerging HTML Tables

[Puzzles] Solve, Learn, Repeat: Unmerging HTML TablesFor a previous puzzle, I needed some sample data, and ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...

As of today, Enterprise Security (ES) Essentials 8.3 is now generally available, helping SOC teams simplify ...

AI for AppInspect

We’re excited to announce two new updates to AppInspect designed to save you time and make the app approval ...