I'm attempting to minimize the amount of data Splunk indexes, but I'm dealing with very large log files. At the moment I can filter the events in these logs with a regex so that only the events I need are returned. However, I'd like to shrink the indexed data even further by capturing only one field from each event. Here is some content from a typical log file:
00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:
00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....
00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:
The second event is the specific M type I'd like to capture (i.e. my regex search is 'ReadMultiple vwTMNewAutoListing'). However, the only information I'm interested in is the duration (the D: field); I'd like to discard the rest of the data, since the S: field is a stack trace and can be quite large.
This is my current config:
props.conf
[hostfile]
pulldown_type = true
SHOULD_LINEMERGE = False
CHECK_FOR_HEADER = false
TRANSFORMS-set = setnull,newautolisting
REPORT-extract = durationMS
transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[newautolisting]
REGEX = ReadMultiple vwTMNewAutoListing
FORMAT = indexQueue
DEST_KEY = queue
[durationMS]
REGEX = ReadMultiple vwTMNewAutoListing,D:(?<DurationMS>\d+)ms
FORMAT = DurationMS::$1
DEST_KEY = queue
As you can see, I'm sending all the events that do not match the regex to the null queue. However, the end result is that I capture the complete raw event that matches 'ReadMultiple vwTMNewAutoListing', and the DurationMS field is only created at search time. Is it possible to create the DurationMS field at index time and discard the rest of the raw event?
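To make the routing described above concrete, here is a rough Python simulation of what the two queue transforms and the DurationMS regex do to the sample events. It is only a sketch: the `route` function and the `transforms` list are illustrative names, Python's `re` module stands in for Splunk's regex engine, and the named group uses Python's `(?P<name>...)` spelling instead of `(?<name>...)`.

```python
import re

# Sample events copied from the log excerpt in the question.
events = [
    "00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:",
    "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....",
    "00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:",
]

# Mirrors TRANSFORMS-set = setnull,newautolisting: applied in order,
# so a later matching transform overrides an earlier one.
transforms = [
    (r".", "nullQueue"),                                  # [setnull]
    (r"ReadMultiple vwTMNewAutoListing", "indexQueue"),   # [newautolisting]
]

def route(event: str) -> str:
    """Return the queue an event ends up in after all transforms run."""
    queue = "indexQueue"  # default destination when nothing matches
    for pattern, destination in transforms:
        if re.search(pattern, event):
            queue = destination
    return queue

kept = [e for e in events if route(e) == "indexQueue"]

# Mirrors the [durationMS] regex (Python named-group syntax).
match = re.search(r"ReadMultiple vwTMNewAutoListing,D:(?P<DurationMS>\d+)ms", kept[0])
duration_ms = match.group("DurationMS")
print(len(kept), duration_ms)
```

Only the vwTMNewAutoListing event survives the routing, but it survives whole, stack trace included, which is the problem the question is about.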
Hi, you should normally not make index-time field extractions; Splunk advises against it. What you could (possibly) do instead is remove everything from ",S:" onwards via the SEDCMD parameter in props.conf, like so:
props.conf
[hostfile]
SEDCMD-shorten_events = s/,S:.*//g
Please do test this first, since it will throw away a large part of each event. Read more on the subject here:
http://docs.splunk.com/Documentation/Splunk/6.0/Data/Anonymizedatausingconfigurationfiles
Also, for the REPORT transform stanza, I don't think you'll need the 'DEST_KEY'.
/K
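One way to follow the "test this first" advice without touching an indexer is to try the same substitution on a few sample lines. The sketch below uses Python's `re.sub` as a stand-in for the sed expression, on the assumption that a pattern this simple behaves the same in both engines; `shorten` is just an illustrative helper name.

```python
import re

# Sample events copied from the log excerpt in the question.
events = [
    "00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:",
    "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....",
    "00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:",
]

def shorten(event: str) -> str:
    """Approximate s/,S:.*//g: drop ',S:' and everything after it on the line."""
    return re.sub(r",S:.*", "", event)

shortened = [shorten(e) for e in events]
for line in shortened:
    print(line)
```

The timestamp, the M: and D: values, and the C: value are all kept; only the stack trace goes.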
Excellent answer! I was attempting to use non-capturing groups and messing around with the syntax of the format.
This works perfectly for me on the end of my props.conf entry for hostfile:
SEDCMD-trim_start = s/* - M://g
SEDCMD-trim_end = s/ms,C:.*//g
Are you removing the timestamps from the events? I'd suggest you keep them (or has _time already been assigned by the time SEDCMD kicks in?)
I think your first SEDCMD should read:
SEDCMD-trim_start = s/ - M://g
/K
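To see what difference the leading part of that pattern makes, here is a small Python approximation of the two trim rules applied to the sample event. The `.* - M:` variant is not from the thread; it is a hypothetical greedy pattern included only to show what losing the timestamp would look like. `apply_sed` is an illustrative helper, and Python's `re` stands in for Splunk's sed support.

```python
import re

# The matching event from the log excerpt in the question.
event = "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete...."

def apply_sed(text: str, rules) -> str:
    """Apply (pattern, replacement) pairs in sequence, like a series of s/// commands."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Pattern suggested in the reply: removes only the literal ' - M:' separator.
keep_timestamp = apply_sed(event, [(r" - M:", ""), (r"ms,C:.*", "")])

# Hypothetical greedy variant (an assumption, not from the thread):
# removes everything up to and including ' - M:', timestamp included.
drop_timestamp = apply_sed(event, [(r".* - M:", ""), (r"ms,C:.*", "")])

print(keep_timestamp)
print(drop_timestamp)
```

With the suggested pattern the timestamp text stays in the event, although it ends up directly against the method name because the separator is removed along with the "M:" prefix.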
Combined with my regex filter, this has brought my theoretical daily indexing for these particular logs down from 1.8 GB (raw) to 4 MB!