Getting Data In

How to filter or extract fields before indexing time

crazyeva
Contributor

I have very large original data, with events strictly formatted as "f1,f2,f3,..."
Most of the fields are meaningless: "0,f2,0,0,0,0,f7,0,f9,..." and I only want f2, f7, f9.
Can I filter fields before indexing, to drop unnecessary data and avoid reaching the license limit?

0 Karma
2 Solutions

Ayn
Legend

Yes, as long as you can formulate a regular expression that defines what Splunk should include or exclude. You can either do nullQueue routing (= drop events altogether) or use SEDCMD (= rewrite the raw text of each event, e.g. to strip out parts of it).

Docs on how to use each:
nullQueue routing - http://docs.splunk.com/Documentation/Splunk/6.0/Forwarding/Routeandfilterdatad
SEDCMD - http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf - see the specification of SEDCMD in the middle of the page.
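
As a rough illustration of the two options, here is a minimal sketch. The stanza name your_sourcetype, the class names null and keep279, the transform name setnull, and both regexes are placeholders made up for this example, so adjust them to your data. Option 1 drops whole events that consist of nothing but zero fields; option 2 rewrites each event so that only fields 2, 7 and 9 of the comma-separated line remain. You would normally pick one of the two, or combine them if you want both effects.

```ini
# props.conf
[your_sourcetype]
# Option 1: send matching events to the nullQueue (the event is discarded)
TRANSFORMS-null = setnull
# Option 2: sed-style substitution on the raw event, keeping fields 2, 7 and 9
# (assumes at least nine comma-separated fields; events that do not match are left unchanged)
SEDCMD-keep279 = s/^[^,]*,([^,]*),[^,]*,[^,]*,[^,]*,[^,]*,([^,]*),[^,]*,([^,]*).*$/\1,\2,\3/
```

```ini
# transforms.conf
[setnull]
# hypothetical pattern: events made up of nothing but zero fields, e.g. 0,0,0,0
REGEX = ^(0,)+0$
DEST_KEY = queue
FORMAT = nullQueue
```

Both settings are applied when the data is parsed, so as far as I know they need to live on the instance that does the parsing (an indexer or heavy forwarder), not on a universal forwarder.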


kristian_kolb
Ultra Champion

source file

2013-11-01 11:11:11 f1 f2 f3 f4 f5 f6
2013-11-02 13:15:11 d1 d2 d3 d4 d5 d6
2013-11-02 14:23:22 e1 e2 e3 e4 e5 e6
2013-11-03 12:23:21 g1 g2 g3 g4 g5 g6

props.conf

[your_sourcetype]
TRANSFORMS-blah = keep235

transforms.conf

[keep235]
DEST_KEY = _raw
REGEX = ^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)
FORMAT = $1 $2 $4 $5 $7

Result:

03/11/2013 12:23:21.000  2013-11-03 12:23:21 g2 g3 g5
02/11/2013 14:23:22.000  2013-11-02 14:23:22 e2 e3 e5

^Splunk parsed timestamp  ^event timestamp     ^fewer columns/fields

Hope this helps,

/K
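
The same approach could be adapted to the comma-separated format from the question. This is only a sketch: the transform name keep279 is made up, and it assumes every event has at least nine comma-separated fields and no commas embedded inside a field.

```ini
# props.conf
[your_sourcetype]
TRANSFORMS-blah = keep279
```

```ini
# transforms.conf
[keep279]
DEST_KEY = _raw
# capture fields 2, 7 and 9; everything that is not captured is dropped from _raw
REGEX = ^[^,]*,([^,]*),[^,]*,[^,]*,[^,]*,[^,]*,([^,]*),[^,]*,([^,]*)
FORMAT = $1,$2,$3
```

With an input line such as 0,f2,0,0,0,0,f7,0,f9,0 the rewritten event should come out as f2,f7,f9.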


crazyeva
Contributor

Thank you, that's sweet!

0 Karma

crazyeva
Contributor

THANK YOU!
I have read some of the related answers, and thought SEDCMD was only able to drop the content after a tag, and that nullQueue worked "at event level".
Then I have to dig into the "sed" tool. Is there an AWKCMD?

0 Karma