Field extraction from source plus custom sourcetype

New Member

Hello,

I am using the free version. I would like to use field extraction (at search time or run time, it does not matter) to extract fields from source and put them in other fields. The source is a tar.gz, but I am also assigning custom sourcetypes to its contents, and I have not settled on the final design. For example, a source looks like /tmp/data/foo-v6.1-diff.tar.gz:./cgi-bin/whatever/foo/my.php. The archive is loaded via ./splunk add oneshot /tmp/data/foo-v6.1-diff.tar.gz, so there are no monitors.

I have read at least 5-7 posts on Splunk Answers and cannot get it to work, even though it seems every one of these approaches should work. In each case I start completely from scratch, so there is no precedence issue from earlier attempts. The sourcetype is always set correctly; I just can't use the fields in any searches.

Attempt 1, using TRANSFORMS

props.conf

[source::....php$(.\d+)?]
TRANSFORM-setrnt = trans1, trans2

Example of a second source stanza (there would be several like this):

[source::....phph$(.\d+)?]
sourcetype = phph

transforms.conf

[trans1]
SOURCE_KEY = MetaData:Source
REGEX = ^/([a-zA-Z0-9\-\/]*)\/(?<projsite>[a-zA-Z0-9]*)\-v(?<projver>[0-9].[0-9])\-
[trans2]
DEST_KEY = MetaData:Sourcetype
REGEX = (.)
FORMAT = sourcetype::php

fields.conf

[projsite]
INDEXED = true
[projver]
INDEXED = true

I have also tried variations of trans1 such as:

WRITE_META=true
REGEX = ^/([a-zA-Z0-9\-\/]*)\/([a-zA-Z0-9]*)\-v([0-9].[0-9])\-
FORMAT = projsite::$2 projver::$3

Attempt 2, using EXTRACT

props.conf

[source::....php$(.\d+)?]
sourcetype = source-php
EXTRACT-sourcefields = ^/([a-zA-Z0-9\-\/]*)\/(?<projsite>[a-zA-Z0-9]*)\-v(?<projver>[0-9].[0-9])\-

Attempt 3, using REPORTS

props.conf

[source::....php$(.\d+)?]
sourcetype = source-php
[source-php]
REPORTS-filename = extract-filename

transforms.conf

[extract-filename]
SOURCE_KEY = MetaData:Source
REGEX = "^/([a-zA-Z0-9\-\/]*)\/(?<projsite>[a-zA-Z0-9]*)\-v(?<projver>[0-9].[0-9])\-"

As a side note, I have verified the regex syntax independently with pcregextest and by typing it directly into the search box.


Builder

I would recommend doing this at search time via:

## props.conf
[source::....php$(.\d+)?]
REPORT-proj_extract = proj_extract

## transforms.conf
[proj_extract]
SOURCE_KEY = source
REGEX = ^/([a-zA-Z0-9\-\/]*)\/([a-zA-Z0-9]*)\-v([0-9].[0-9])\-
FORMAT = projsite::$2 projver::$3

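In summary, it looks like your attempts above had a combination of errors:

  1. The index-time attribute in props.conf is TRANSFORMS-<class>, not TRANSFORM-<class>.
  2. The search-time attribute is REPORT-<class>, not REPORTS-<class>.
  3. EXTRACT-<class> operates on _raw by default, so a regex written for the source path never matches.
  4. For a REPORT- transform, SOURCE_KEY in transforms.conf takes the plain field name (source) and does not need the "MetaData:" prefix.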
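
As a quick sanity check outside Splunk, the REGEX and FORMAT lines of [proj_extract] can be approximated in Python. This is only a sketch: Python's re module stands in for Splunk's PCRE engine, and the extract helper is a made-up name that mimics FORMAT = projsite::$2 projver::$3. The sample path is the one from the question.

```python
import re

# Sample source value from the question.
source = "/tmp/data/foo-v6.1-diff.tar.gz:./cgi-bin/whatever/foo/my.php"

# Same pattern as the REGEX line in [proj_extract].
pattern = re.compile(r"^/([a-zA-Z0-9\-\/]*)\/([a-zA-Z0-9]*)\-v([0-9].[0-9])\-")

def extract(path):
    """Hypothetical helper: mimic FORMAT = projsite::$2 projver::$3 for one path."""
    m = pattern.search(path)
    if m is None:
        return {}
    return {"projsite": m.group(2), "projver": m.group(3)}

fields = extract(source)
print(fields)

# The dot between the digits is unescaped, so it matches any character:
# a made-up path containing "v6x1" is accepted as well.
loose = extract("/tmp/data/foo-v6x1-diff.tar.gz")
print(loose)
```

If only versions such as 6.1 should match, writing the dot as \. in the REGEX line would tighten the pattern.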

Hope this helps!


New Member

My lack of attention to detail is stunning. Thanks, it worked!

I had to run Splunk in --debug mode. With the TRANSFORMS approach, information was printed under the categories PropertiesMapConfig and regexExtractionProcessor, but I still did not get indexed fields. With your fixes, the REPORT approach worked. Logging should be increased in this area: nothing in the log shows the regex running for REPORT, and with TRANSFORMS the regex succeeded but the indexed fields did not show up. Also, escaping for Windows paths is inconsistent: macros.conf and search-time rex need an extra level of escaping, while pcregextest and transforms.conf do not.
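
The escaping difference can be illustrated with a small Python sketch. It assumes, as a simplification of what I observed, that the quoted contexts (macros.conf, search-time rex) strip one level of backslash escaping before the regex engine sees the pattern. The unquote_once function is a hypothetical stand-in for that layer, not Splunk's actual parser, and the Windows path is made up.

```python
import re

# Hypothetical Windows-style source path, not taken from real data.
windows_source = r"C:\data\foo-v6.1-diff.tar.gz"

# The regex as the engine must receive it: \\ matches one literal backslash.
engine_regex = r"^C:\\data\\(?P<projsite>[a-zA-Z0-9]*)-v(?P<projver>[0-9]\.[0-9])-"

def unquote_once(text):
    """Assumed quoting layer: each pair of backslashes becomes one backslash."""
    return text.replace("\\\\", "\\")

def try_match(regex, value):
    """Return the named groups, or None if the regex is invalid or does not match."""
    try:
        m = re.search(regex, value)
    except re.error:
        return None
    return m.groupdict() if m else None

# Passed through the layer as-is, the backslash pairs collapse and the regex breaks.
broken = try_match(unquote_once(engine_regex), windows_source)

# Doubling every backslash first lets the intended regex survive the layer.
typed = engine_regex.replace("\\", "\\\\")
survived = try_match(unquote_once(typed), windows_source)

print(broken)
print(survived)
```

Under that assumption, a pattern that works unchanged in pcregextest or transforms.conf needs its backslashes doubled before it is pasted into a quoted search string.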
