
Filtering Events, Creating Custom Fields and Discarding Raw Data at index time

tradevine
Engager

I'm attempting to minimize the amount of data Splunk indexes, but I'm dealing with very large log files. At the moment I can filter the events in these logs with a regex so that only the events I need are indexed; however, I'd like to shrink the indexed data even further by capturing only one field from each event. Here is some content from a typical log file:

00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:

00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....

00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:

The second event is the specific M: type I'd like to capture (i.e. my regex search is 'ReadMultiple vwTMNewAutoListing'). However, the only information I'm interested in is the duration (the D: value), and I'd like to discard all the rest of the data, since the S: field is a stack trace and can be quite large.
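To make the line layout concrete, here is a minimal Python sketch that splits one of the sample lines into its timestamp, M:, D:, C: and S: parts. The pattern and the `parse_line` helper are hypothetical, inferred only from the three sample lines above; they are not something Splunk itself uses.

```python
import re
from typing import Optional

# Layout inferred from the three sample lines (an assumption):
# "<time> - M:<value>,D:<n>ms,C:<n>,S:<stack trace, possibly empty>"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) - "
    r"M:(?P<m>[^,]*),D:(?P<duration_ms>\d+)ms,C:(?P<c>\d+),S:(?P<s>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into named parts, or return None if it does not fit."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


fields = parse_line(
    "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete...."
)
print(fields["m"], fields["duration_ms"])  # ReadMultiple vwTMNewAutoListing 7427
```

Only the `duration_ms` value is the part worth keeping; everything captured in `s` is the stack trace.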

This is my current config:

props.conf

[hostfile]
pulldown_type = true
SHOULD_LINEMERGE = False
CHECK_FOR_HEADER = false
TRANSFORMS-set = setnull,newautolisting
REPORT-extract = durationMS

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[newautolisting]
REGEX = ReadMultiple vwTMNewAutoListing
FORMAT = indexQueue
DEST_KEY = queue

[durationMS]
REGEX = ReadMultiple vwTMNewAutoListing,D:(?<DurationMS>\d+)ms
FORMAT = DurationMS::$1
DEST_KEY = queue

As you can see, I'm sending all the events that do not match the regex to the null queue. However, the end result is that I still index the complete raw event that matches 'ReadMultiple vwTMNewAutoListing', and the DurationMS field is only created at search time. Is it possible to create the DurationMS field at index time and discard the rest of the raw event?
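The routing above can be pictured with the simplified Python model below: every event is first sent to the null queue by [setnull], and only events matching the [newautolisting] regex are sent back to the index queue, because the transforms in `TRANSFORMS-set` are applied in the order listed and a later match overwrites the queue set by an earlier one. This is a sketch under stated assumptions (only these two transforms, none of the rest of Splunk's parsing pipeline); `route` and `extract_duration` are hypothetical helpers, and Python needs the `(?P<name>...)` spelling for the named group.

```python
import re
from typing import Optional

# Simplified model of the two index-time transforms, applied in the order
# given by TRANSFORMS-set; a later match overwrites the queue set earlier.
TRANSFORMS = [
    (re.compile(r"."), "nullQueue"),                                 # [setnull]
    (re.compile(r"ReadMultiple vwTMNewAutoListing"), "indexQueue"),  # [newautolisting]
]

# Search-time extraction modelled on the [durationMS] REGEX.
DURATION_RE = re.compile(r"ReadMultiple vwTMNewAutoListing,D:(?P<DurationMS>\d+)ms")


def route(event: str) -> str:
    """Return the queue an event ends up in under this simplified model."""
    queue = "indexQueue"  # assumed default when no transform matches
    for regex, target in TRANSFORMS:
        if regex.search(event):
            queue = target
    return queue


def extract_duration(event: str) -> Optional[int]:
    """Return DurationMS as an int, or None if the event has no match."""
    match = DURATION_RE.search(event)
    return int(match.group("DurationMS")) if match else None


events = [
    "00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:",
    "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....",
    "00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:",
]
kept = [event for event in events if route(event) == "indexQueue"]
print(len(kept), extract_duration(kept[0]))  # 1 7427
```

The model makes the problem visible: filtering decides which events survive, but the surviving event is still kept whole, stack trace included.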

kristian_kolb
Ultra Champion

Hi, you should normally not make index-time field extractions; Splunk advises against it. What you could (possibly) do is remove everything after ",S:" with the SEDCMD setting in props.conf, like so:

props.conf

[hostfile]
SEDCMD-shorten_events = s/,S:.*//g
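As a rough illustration of what this sed-style expression does, the same substitution can be tried in Python with `re.sub` on a sample line before touching the real configuration. This assumes the SEDCMD behaves like an ordinary regex substitution on each event's raw text; `shorten` is a hypothetical helper.

```python
import re


def shorten(raw: str) -> str:
    """Approximate SEDCMD-shorten_events = s/,S:.*//g on one event's raw text."""
    # re.sub replaces every match, like the trailing g flag. In Python, "."
    # does not match newlines by default, so ".*" stops at the end of a line.
    return re.sub(r",S:.*", "", raw)


event = "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete...."
print(shorten(event))  # 00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0
```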

Please do test this first, since it will throw away a large part of each event. Read more on the subject in:

http://docs.splunk.com/Documentation/Splunk/6.0/Data/Anonymizedatausingconfigurationfiles

Also, for the REPORT transform stanza, I don't think you'll need the 'DEST_KEY'.

/K

tradevine
Engager

Excellent answer! I was attempting to use non-capturing groups and messing around with the syntax of the format.

This works perfectly for me at the end of my props.conf stanza for hostfile:

SEDCMD-trim_start = s/* - M://g
SEDCMD-trim_end = s/ms,C:.*//g

kristian_kolb
Ultra Champion

Are you removing the timestamps from the events? I'd suggest you keep them (or has _time already been assigned by the time SEDCMD kicks in?)

I think your first SEDCMD should read:

SEDCMD-trim_start = s/ - M://g
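A quick way to see what this trim_start and the trim_end expression do together is to chain the equivalent substitutions in Python. This is a sketch assuming each SEDCMD acts like a plain regex substitution applied in the order listed; `trim` is a hypothetical helper. Note that Python's `re` rejects the earlier `* - M:` pattern with a "nothing to repeat" error, since a leading `*` has nothing to quantify. With the version below the timestamp is kept, but because the pattern also removes the space before the hyphen, it ends up directly adjacent to the M: value.

```python
import re

# Each SEDCMD modelled as a plain regex substitution applied in the order
# listed (an assumption); re.sub replaces every match, like the g flag.
SEDCMDS = [
    (r" - M:", ""),    # SEDCMD-trim_start = s/ - M://g (suggested version)
    (r"ms,C:.*", ""),  # SEDCMD-trim_end = s/ms,C:.*//g
]


def trim(raw: str) -> str:
    """Apply the substitutions to one event's raw text."""
    for pattern, replacement in SEDCMDS:
        raw = re.sub(pattern, replacement, raw)
    return raw


event = "00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete...."
print(trim(event))  # 00:22:32.119ReadMultiple vwTMNewAutoListing,D:7427
```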

/K


tradevine
Engager

Combining the SEDCMDs with my regex filter, I've brought down my theoretical daily indexing on these particular logs from 1.8 GB (raw) to 4 MB!
