Getting Data In

Filtering Events, Creating Custom Fields and Discarding Raw Data at index time

Engager

I'm attempting to minimize the amount of data Splunk indexes, but I'm dealing with very large log files. At the moment I can filter the events in these logs based on a regex search to only return the events that I need; however, I'd like to shrink the indexed data even further, capturing only one field from each event. Here are some sample events from a typical log file:

00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:

00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....

00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:

The second event above is the specific M type I'd like to capture (i.e. my regex search is 'ReadMultiple vwTMNewAutoListing'); however, the only information I'm interested in is the duration (i.e. D:), and I want to discard all the rest of the data. The S: field is a stack trace and can be quite large.

This is my current config:

props.conf

[hostfile]
pulldown_type = true
SHOULD_LINEMERGE = False
CHECK_FOR_HEADER = false
TRANSFORMS-set = setnull,newautolisting
REPORT-extract = durationMS

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[newautolisting]
REGEX = ReadMultiple vwTMNewAutoListing
FORMAT = indexQueue
DEST_KEY = queue

[durationMS]
REGEX = ReadMultiple vwTMNewAutoListing,D:(?<DurationMS>\d+)ms
FORMAT = DurationMS::$1
DEST_KEY = queue

As you can see, I'm sending all the events that do not match the regex to the nullQueue; however, the end result is that I capture the complete raw event that matches 'ReadMultiple vwTMNewAutoListing', plus I create the DurationMS field at search time. Is it possible to create the DurationMS field at index time and discard the rest of the raw event?
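For anyone wanting to verify the extraction pattern before touching the config: the regex from the [durationMS] stanza can be sanity-checked outside Splunk with an equivalent Python pattern. This only exercises the regex against the sample events from the question; it does not simulate Splunk's transform pipeline.

```python
import re

# Extraction regex from the [durationMS] stanza, applied to the
# sample events from the question.
pattern = re.compile(r"ReadMultiple vwTMNewAutoListing,D:(?P<DurationMS>\d+)ms")

events = [
    "00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:",
    "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....",
    "00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:",
]

# Only the second event matches, and only the duration is captured.
for e in events:
    m = pattern.search(e)
    if m:
        print(m.group("DurationMS"))  # prints: 7427
```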


Re: Filtering Events, Creating Custom Fields and Discarding Raw Data at index time

Ultra Champion

Hi, you should normally not create index-time field extractions; Splunk advises against it. What you could do (possibly) is remove everything after ",S:" via the SEDCMD setting in props.conf, like so:

props.conf

[hostfile]
SEDCMD-shorten_events = s/,S:.*//g
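As a quick sanity check outside Splunk, the effect of that sed expression can be simulated with Python's `re` module (the pattern below mirrors `s/,S:.*//g`; this is only an illustration of the substitution, not of Splunk's indexing pipeline):

```python
import re

# Sample event from the question.
event = "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete...."

# Equivalent of SEDCMD-shorten_events = s/,S:.*//g:
# drop everything from ",S:" to the end of the event.
trimmed = re.sub(r",S:.*", "", event)
print(trimmed)  # prints: 00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0
```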

Please do test this first, since it will throw away a large part of each event's content. Read more on the subject here:

http://docs.splunk.com/Documentation/Splunk/6.0/Data/Anonymizedatausingconfigurationfiles

Also, for the REPORT transform stanza, I don't think you'll need the 'DEST_KEY'.

/K



Re: Filtering Events, Creating Custom Fields and Discarding Raw Data at index time

Engager

Excellent answer! I was attempting to use non-capturing groups and messing around with the syntax of the format.

This works perfectly for me on the end of my props.conf entry for hostfile:

SEDCMD-trim_start = s/* - M://g
SEDCMD-trim_end = s/ms,C:.*//g

Combined with my regex, I've brought my theoretical daily indexing volume on these particular logs down from 1.8 GB (raw) to 4 MB!


Re: Filtering Events, Creating Custom Fields and Discarding Raw Data at index time

Ultra Champion

Are you removing the timestamps from the events? I'd suggest you keep them (or has _time already been assigned by the time SEDCMD kicks in?)

I think your first SEDCMD should read:

SEDCMD-trim_start = s/ - M://g
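With the corrected first expression, chaining both substitutions on the sample event gives the following (again simulated with Python's `re` as an illustration; note that the timestamp ends up glued to the method name once " - M:" is removed, which is worth keeping in mind if you rely on _raw for readability):

```python
import re

# Sample event from the question.
event = "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete...."

step1 = re.sub(r" - M:", "", event)    # corrected SEDCMD-trim_start
step2 = re.sub(r"ms,C:.*", "", step1)  # SEDCMD-trim_end
print(step2)  # prints: 00:22:32.119ReadMultiple vwTMNewAutoListing,D:7427
```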

/K
