We have a Universal Forwarder that is sending a huge amount of data. We need to index only the events that contain any of these words: "EnvisionResponse", "EnvisionRequest", or "TransactionStatusDetail".
The "EnvisionRequest" event spans multiple lines, so I need all of its lines. Here is an example:
2017-02-23 12:00:02,982 INFO (http-139.61.194.230-8380-24) EnvisionRequest version="1"
referenceNbr 869dc644e461b01
messageType P
Our Splunk indexer is version 6.1.
Can this be done in props.conf and transforms.conf on the indexer without adding to the daily license volume?
ChrisG is correct. You're probably interested in the "Keep specific events and discard the rest" section of this doc: http://docs.splunk.com/Documentation/Splunk/6.5.2/Forwarding/Routeandfilterdatad#Keep_specific_event...
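For reference, here is a minimal sketch of that keep-and-discard pattern for this case. It assumes the data arrives under a sourcetype called envision_app (a hypothetical name; substitute your own sourcetype or a source:: stanza), and the transform names setnull and keep_envision are also just illustrative. The first transform sends every event to nullQueue, and the second sends events containing any of the three keywords back to indexQueue, so the order in the TRANSFORMS- line matters. The regex uses the same capitalization as the sample event because the match is case-sensitive.

```ini
# ---- props.conf (on the indexer, or on a heavy forwarder if one is in the path) ----
# "envision_app" is a hypothetical sourcetype name.
[envision_app]
# Transforms run left to right: discard everything, then keep the matches.
TRANSFORMS-filter = setnull, keep_envision

# ---- transforms.conf ----
# Matches any non-empty event and routes it to the null queue.
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# Routes events containing any of the keywords back to the index queue.
[keep_envision]
REGEX = EnvisionRequest|EnvisionResponse|TransactionStatusDetail
DEST_KEY = queue
FORMAT = indexQueue
```

Data from a Universal Forwarder is parsed on the indexer, so that is where these files would normally go; check the transforms.conf and props.conf specs for your version before relying on it.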
But it sounds like you're also interested in the mechanics of the indexing pipelines. There's an overview at http://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Howindexingworks and a more detailed diagram that I really like at https://wiki.splunk.com/Community:HowIndexingWorks
When data comes from the forwarder, it's first broken into individual lines (parsing pipeline, linebreaker processor). Then those lines are merged into events (merging pipeline, aggregator processor). Then we apply regexes to each event (typing pipeline, regexreplacement processor). So if your regex matches any part of your multi-line event, you'll get all of the lines of that event, because by that point the lines have already been combined into a single event. If you're missing particular lines, you'll need to adjust your props.conf settings to change how Splunk breaks your stream of data into events.
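As a sketch of what adjusting the props settings could look like for the sample event, assume a hypothetical sourcetype named envision_app. The defaults (SHOULD_LINEMERGE = true with BREAK_ONLY_BEFORE_DATE = true) may already keep the continuation lines together, since lines such as "referenceNbr 869dc644e461b01" carry no timestamp. If they don't, one option is to break events explicitly with LINE_BREAKER so that a new event starts only at a line beginning with a timestamp like "2017-02-23 12:00:02,982". Note that this swaps the line-merging step described above for direct line breaking; verify each setting against the props.conf spec for 6.1.

```ini
# props.conf on the indexer; "envision_app" is a hypothetical sourcetype name.
[envision_app]
# Disable line merging and let LINE_BREAKER define event boundaries.
SHOULD_LINEMERGE = false
# Break only on newlines followed by a "YYYY-MM-DD HH:MM:SS,mmm" timestamp;
# lines without a leading timestamp stay in the current event.
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})
# The timestamp sits at the start of the event and is 23 characters long.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 23
```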
Does that make sense?
Yes, and that's the answer I was looking for.
But is ChrisG correct when he says "You need to use a heavy forwarder to filter data before indexing it"? Or can it be done on the indexer itself? If it can, then I should be able to achieve volume reduction, right?
Yes, it can be done on the indexer itself. (But if you're using a local heavy forwarder or an intermediate heavy forwarder, it has to be done there.) It cannot be done on a Universal Forwarder, which means you're sending data over your network only to throw it out; depending on the distance and the volume you're discarding, that may or may not be a concern. The exception is a sourcetype that uses INDEXED_EXTRACTIONS, in which case you have to do it on the UF.
(And yes, you will achieve volume reduction both in license usage and in event count: the discarded events are never written to disk and therefore aren't counted. In the same HowIndexingWorks detailed diagram, you can see that LicenseVolumeCalculation happens with the index processor in the indexer pipeline, right as data is written. If data isn't written, it isn't counted.)
TIL, thanks, acharlieh!
I have it working on the indexer. My concern was the daily license volume.
You need to use a heavy forwarder to filter data before indexing it. See Route and filter data in the Forwarding Data manual. You would configure props.conf and transforms.conf on the forwarder.
That document says:
Although similar to forwarder-based routing, queue routing can be performed by an indexer, as well as a heavy forwarder. It does not use the outputs.conf file, only props.conf and transforms.conf.
You can eliminate unwanted data by routing it to nullQueue, the Splunk equivalent of the Unix /dev/null device. When you filter out data in this way, the data is not forwarded and doesn't count toward your indexing volume.
I'm just not sure what the regex in transforms.conf needs to be to keep the events I want, particularly the multi-line event.
If I use this regex:
EnVisionRequest\sversion|EnVisionResponse\sversion|TransactionStatusDetail
will I get all the lines in the multi-line event?
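A sketch of a transform built from the regex in the question, using the hypothetical stanza name keep_envision. Splunk regexes are case-sensitive by default, so "EnVisionRequest" would not match the "EnvisionRequest" in the sample event; the (?i) prefix makes the whole pattern case-insensitive (alternatively, match the exact case). Because the transform runs on the already-merged event, a match on its first line keeps every line of it, provided the event breaking is correct. This assumes EnvisionResponse events also have "version" right after the keyword, as the regex implies, and that the stanza is listed after a catch-all nullQueue transform in the TRANSFORMS- line of props.conf.

```ini
# transforms.conf
# "keep_envision" is a hypothetical stanza name.
# (?i) at the start of the pattern applies to every alternative,
# so "EnVision" also matches "Envision" in the sample event.
[keep_envision]
REGEX = (?i)EnVisionRequest\sversion|EnVisionResponse\sversion|TransactionStatusDetail
DEST_KEY = queue
FORMAT = indexQueue
```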