Field extraction from source plus custom sourcetype

New Member


I am using the free version of Splunk. I would like to use field extraction (at search time or run time, it does not matter) to extract fields from the source and put them in other fields. The source is a tar.gz, and I am also assigning custom sourcetypes to its contents, although I have not settled on the final design. An example source is /tmp/data/foo-v6.1-diff.tar.gz:./cgi-bin/whatever/foo/my.php. The archive is loaded via ./splunk add oneshot /tmp/data/foo-v6.1-diff.tar.gz, so there are no monitors.

I have read at least 5-7 posts on Splunk Answers and cannot get this to work, even though it seems every one of these approaches should. For each attempt I start completely from scratch, so precedence between configurations is not the issue. The sourcetype is always set correctly; I just cannot use the extracted fields in any search.

Attempt 1 using TRANSFORMS props.conf

TRANSFORM-setrnt = trans1, trans2

example of where there would be multiple sources here

sourcetype = phph


SOURCE_KEY = MetaData:Source
REGEX = ^/([a-zA-Z0-9\-\/]*)\/(?<projsite>[a-zA-Z0-9]*)\-v(?<projver>[0-9].[0-9])\-
DEST_KEY = MetaData:Sourcetype
REGEX = (.)
FORMAT = sourcetype::php


INDEXED = true
INDEXED = true

I have also used variations here of:

REGEX = ^/([a-zA-Z0-9\-\/]*)\/(?[a-zA-Z0-9]*)\-v([0-9].[0-9])\-
FORMAT = projsite::$2 projver::$3

Attempt 2 using EXTRACT props.conf

sourcetype = source-php
EXTRACT-sourcefields = ^/([a-zA-Z0-9\-\/]*)\/(?<projsite>[a-zA-Z0-9]*)\-v(?<projver>[0-9].[0-9])\-

Attempt 3 using REPORTS props.conf

sourcetype = source-php
REPORTS-filename = extract-filename


SOURCE_KEY = MetaData:Source
REGEX = "^/([a-zA-Z0-9\-\/]*)\/(?<projsite>[a-zA-Z0-9]*)\-v(?<projver>[0-9].[0-9])\-"

As a side note, I have verified the REGEX syntax independently with pcregextest and by typing it directly in the searchbox.



I would recommend doing this at search time via:

## props.conf
REPORT-proj_extract = proj_extract

## transforms.conf
[proj_extract]
SOURCE_KEY = source
REGEX = ^/([a-zA-Z0-9\-\/]*)\/([a-zA-Z0-9]*)\-v([0-9].[0-9])\-
FORMAT = projsite::$2 projver::$3
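
As a quick sanity check outside Splunk, the regex and FORMAT mapping above can be exercised against the sample source from the question. This is only a minimal Python sketch: `extract_fields`, `FORMAT`, and `SAMPLE_SOURCE` are names made up for illustration, and Python's `re` module stands in for Splunk's PCRE engine, so it shows what the pattern captures, not how Splunk applies it.

```python
import re

# Regex copied from the transforms.conf example above.
REGEX = r"^/([a-zA-Z0-9\-\/]*)\/([a-zA-Z0-9]*)\-v([0-9].[0-9])\-"

# Mirrors FORMAT = projsite::$2 projver::$3 (field name -> capture group number).
FORMAT = {"projsite": 2, "projver": 3}


def extract_fields(source):
    """Return the fields FORMAT would name, or {} when the regex does not match."""
    match = re.search(REGEX, source)
    if match is None:
        return {}
    return {name: match.group(index) for name, index in FORMAT.items()}


# Sample source value taken from the question.
SAMPLE_SOURCE = "/tmp/data/foo-v6.1-diff.tar.gz:./cgi-bin/whatever/foo/my.php"
fields = extract_fields(SAMPLE_SOURCE)
print(fields)
```

Note that the `.` in `[0-9].[0-9]` is unescaped, so it matches any character between the two digits; `\.` would be stricter if only dotted versions are expected.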

In summary it looks like you had a combination of errors in your attempts above:

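  3. EXTRACT operates on _raw by default, so a regex written against the source path is never applied to the source value
  4. For a REPORT- extraction, SOURCE_KEY in transforms.conf takes the plain field name (source) and does not need the "MetaData:" prefix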
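
Point 3 can be illustrated outside Splunk. The sketch below is a Python approximation, not Splunk's implementation: the `event` dictionary and `apply_extraction` helper are hypothetical, and the `_raw` text is invented for the example. It shows why a path-anchored regex finds nothing when it is run against the event text instead of the source value.

```python
import re

# Python spells named groups (?P<name>...); the Splunk config uses (?<name>...).
PATTERN = re.compile(
    r"^/([a-zA-Z0-9\-\/]*)\/(?P<projsite>[a-zA-Z0-9]*)\-v(?P<projver>[0-9].[0-9])\-"
)

# Hypothetical event: 'source' comes from the question, '_raw' is invented.
event = {
    "source": "/tmp/data/foo-v6.1-diff.tar.gz:./cgi-bin/whatever/foo/my.php",
    "_raw": "<?php echo 'hello'; ?>",
}


def apply_extraction(event, field):
    """Run the pattern against one field of the event and return named groups."""
    match = PATTERN.search(event[field])
    return match.groupdict() if match else {}


from_raw = apply_extraction(event, "_raw")       # what a default EXTRACT would see
from_source = apply_extraction(event, "source")  # what SOURCE_KEY = source sees
print(from_raw, from_source)
```

The props.conf spec also documents an `EXTRACT-<class> = <regex> in <src_field>` form for pointing an EXTRACT at a field other than _raw; that form was not tried in this thread.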

Hope this helps!


New Member

My lack of attention to detail is stunning. Thanks, it worked!

I had to put Splunk in --debug mode. With the TRANSFORMS approach, debug information was printed under the categories PropertiesMapConfig and regexExtractionProcessor, but I still did not get indexed fields. With those fixes, the REPORT approach worked. Logging should be increased in this area: nothing in the logs shows the regex running for REPORT, and with TRANSFORMS the regex succeeded but the indexed fields did not show up. Also, escaping for Windows paths is inconsistent: macros.conf and the search-time rex command need an extra level of escaping, while pcregextest and transforms.conf do not.
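
The escaping difference is easier to see with a small model. The Python sketch below is an assumption-laden illustration, not a description of Splunk's parser: `strip_one_escape_level` is a hypothetical stand-in for any quoted-string layer that consumes one level of backslashes before the regex engine sees the pattern, and the Windows path is made up.

```python
import re

# The literal text C:\data\foo (hypothetical path).
WINDOWS_PATH = "C:\\data\\foo"

# Pattern as the regex engine itself wants it: two backslashes match one.
DIRECT_PATTERN = r"C:\\data\\foo"

# Same pattern written for a layer that unescapes first: every backslash doubled.
QUOTED_PATTERN = r"C:\\\\data\\\\foo"


def strip_one_escape_level(text):
    # Model of a quoted-string layer: each backslash-plus-character pair
    # collapses to just the character.
    return re.sub(r"\\(.)", r"\1", text)


# Handed straight to the engine, the direct pattern matches.
direct_ok = re.search(DIRECT_PATTERN, WINDOWS_PATH) is not None

# Passed through the quoting layer, the doubled pattern matches.
quoted_ok = re.search(strip_one_escape_level(QUOTED_PATTERN), WINDOWS_PATH) is not None

# The direct pattern loses its escapes in the quoting layer and stops matching.
under_escaped_ok = re.search(strip_one_escape_level(DIRECT_PATTERN), WINDOWS_PATH) is not None

print(direct_ok, quoted_ok, under_escaped_ok)
```

Under this model, a pattern that works in a tool that feeds the regex engine directly would need its backslashes doubled wherever it is first parsed as a quoted string.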
