Getting Data In

Can data being sent from a Universal Forwarder be filtered at the indexer level for only certain events?

riotto
Path Finder

We have a Universal Forwarder that is sending a huge amount of data. We need to index only events that contain any of these words: "EnvisionResponse", "EnvisionRequest", or "TransactionStatusDetail".

The "EnvisionRequest" event spans multiple lines, so I need all the lines of the event. Here is an example:

2017-02-23 12:00:02,982 INFO   (http-139.61.194.230-8380-24)  EnvisionRequest version="1"
referenceNbr 869dc644e461b01
messageType P

Our Splunk indexer is version 6.1.
Can this be done in props.conf and transforms.conf on the indexer without adding to the daily license volume?

acharlieh
Influencer

ChrisG is correct... you're probably interested in the "Keep specific events and discard the rest" section of this doc: http://docs.splunk.com/Documentation/Splunk/6.5.2/Forwarding/Routeandfilterdatad#Keep_specific_event...

But it sounds like you're also interested in the mechanics of the indexing pipelines. There is an overview at http://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Howindexingworks but I really like the detailed diagram at https://wiki.splunk.com/Community:HowIndexingWorks

When data comes from the forwarder, it's first broken into individual lines (parsing pipeline, linebreaking processor), then those lines are merged into events (merging pipeline, aggregator processor), and then regexes are applied to each event (typing pipeline, regexreplacement processor). So if your regex matches any part of your multi-line event, you'll keep all of the lines of that event, because by that point all of its lines form a single event. If you are not getting particular lines, you'll need to adjust the props.conf settings that control how Splunk breaks your stream of data into events.
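
As a rough illustration of the props settings involved, the sketch below would start a new event at each line that begins with a timestamp. The sourcetype name envision_app is hypothetical, and the timestamp pattern is inferred only from the single sample event in the question, so treat it as a starting point rather than a tested configuration.

```ini
# props.conf on the indexer (or on a heavy forwarder, if one sits in the path)
# [envision_app] is a hypothetical sourcetype name; substitute your own.
[envision_app]
# Merge individual lines into multi-line events.
SHOULD_LINEMERGE = true
# Start a new event only at a line that begins with a timestamp
# such as "2017-02-23 12:00:02,982".
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
# Timestamp layout assumed from the sample event.
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 23
```

With settings along these lines, the referenceNbr and messageType lines stay attached to the EnvisionRequest line, so a regex that matches the first line keeps the whole event.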

Does that make sense?

aaraneta_splunk
Splunk Employee

@riotto - Did one of the answers below help provide a solution to your question? If yes, please click “Accept” below the best answer to resolve this post and upvote anything that was helpful. If no, please leave a comment with more feedback. Thanks.

riotto
Path Finder

Yes, and that's the answer I was looking for.

But is ChrisG correct when he says "You need to use a heavy forwarder to filter data before indexing it"? Or can it be done on the indexer itself? If it can, then I should be able to achieve volume reduction, right?

acharlieh
Influencer

Yes, it can be done on the indexer itself. (But if the data passes through a heavy forwarder first, whether local or intermediate, it has to be done there.) It cannot be done on a Universal Forwarder, which means you're sending data over your network only to throw it out; depending on the distance and the volume you're discarding, that may or may not be a concern. The exception is a sourcetype that uses INDEXED_EXTRACTIONS, in which case you have to do it on the UF.

(And yes, you will achieve volume reduction in terms of both license and events: the discarded events are never written to disk and therefore are not counted. In the same HowIndexingWorks detailed diagram, LicenseVolumeCalculation happens with the index processor in the indexer pipeline, right as data is written. If data isn't written, it isn't counted.)
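
For reference, here is a minimal sketch of an indexer-side filter, following the keep-specific-events pattern from the Route and filter data documentation. The sourcetype name envision_app and the transform names setnull and keep_envision are placeholders, and the keywords are spelled as in the sample event; this is not the poster's actual configuration.

```ini
# props.conf
# [envision_app] is a hypothetical sourcetype name.
[envision_app]
# Order matters: discard everything first, then rescue the matches.
TRANSFORMS-filter = setnull, keep_envision

# transforms.conf
[setnull]
# Matches every event and routes it to the nullQueue (discarded).
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keep_envision]
# Matches anywhere in the event text, so a hit on the first line of a
# multi-line event sends the whole event back to the indexQueue.
REGEX = EnvisionRequest|EnvisionResponse|TransactionStatusDetail
DEST_KEY = queue
FORMAT = indexQueue
```

Parsing-time changes like these generally need a restart of the indexer and only affect data that arrives afterwards.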

ChrisG
Splunk Employee

TIL, thanks, acharlieh!

riotto
Path Finder

I have it working at the indexer....my concern was the daily license volume

ChrisG
Splunk Employee

You need to use a heavy forwarder to filter data before indexing it. See Route and filter data in the Forwarding Data manual. You would configure props.conf and transforms.conf on the forwarder.

riotto
Path Finder

That document says:
Although similar to forwarder-based routing, queue routing can be performed by an indexer, as well as a heavy forwarder. It does not use the outputs.conf file, only props.conf and transforms.conf.
You can eliminate unwanted data by routing it to nullQueue, the Splunk equivalent of the Unix /dev/null device. When you filter out data in this way, the data is not forwarded and doesn't count toward your indexing volume.

I'm just not sure what the regex needs to be in transforms.conf to include the events I want, particularly the multi-line event.

If I use the regex EnVisionRequest\sversion|EnVisionResponse\sversion|TransactionStatusDetail, will I get all the lines in the multi-line event?
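
One thing to watch: transforms.conf regexes are case-sensitive by default, and the regex above spells EnVision with a capital V while the sample event has EnvisionRequest. A hedged sketch of a case-insensitive keep stanza follows. The stanza name keep_envision is hypothetical, and it would still need to be listed in a TRANSFORMS- setting in props.conf, after a catch-all transform that routes everything to nullQueue.

```ini
# transforms.conf
# [keep_envision] is a hypothetical stanza name.
[keep_envision]
# (?i) makes the match case-insensitive, covering both Envision and EnVision.
REGEX = (?i)Envision(?:Request|Response)\sversion|TransactionStatusDetail
DEST_KEY = queue
FORMAT = indexQueue
```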
