I have a large data source and I want to filter out a large percentage of the data on the UF so it isn't sent to the Splunk indexers. How can this be done?
Yes, this is possible by setting force_local_processing = true in props.conf on the UF. From the spec file:
force_local_processing = <boolean>
* Forces a universal forwarder to process all data tagged with this sourcetype
locally before forwarding it to the indexers.
* Data with this sourcetype is processed by the linebreaker,
aggerator, and the regexreplacement processors in addition to the existing
utf8 processor.
* Note that switching this property potentially increases the cpu
and memory consumption of the forwarder.
* Applicable only on a universal forwarder.
* Default: false
You should carefully consider if this option is right for you before deploying it. Read and understand the warning in the spec file (above). By parsing on a UF you are creating a "special snowflake" in your environment where data is parsed somewhere unusual.
props.conf
[my_sourcetype]
# Use with caution. In most cases it's best to let the parsing occur on a Splunk Enterprise server
force_local_processing = true
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = ...
TIME_FORMAT = ...
TIME_PREFIX = ^
TRANSFORMS = my_sourcetype_dump_extra_events
transforms.conf
[my_sourcetype_dump_extra_events]
REGEX = discard_events_that_match_this_regex
DEST_KEY = queue
FORMAT = nullQueue
Note that if you want to nullQueue/discard all events EXCEPT those that match a regular expression, the usual documented method did not work on the UF in my testing: https://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Routeandfilterdatad#Filter_event_data...
Instead, use a negative lookahead REGEX, which matches (and so discards) every event that does not contain the pattern you want to keep:
[my_sourcetype_dump_extra_events]
REGEX = ^((?!keep_events_that_match_this_regex).)*$
DEST_KEY = queue
FORMAT = nullQueue
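If you want to sanity-check a negative lookahead before deploying it, here is a minimal Python sketch. It assumes single-line events and that Python's `re` behaves like Splunk's PCRE for this simple pattern; `KEEP_PATTERN`, `is_discarded` and the sample events are made up for illustration and are not part of any Splunk API.

```python
import re

# Hypothetical "keep" pattern; substitute the regex for events you want to forward.
KEEP_PATTERN = r"ERROR|WARN"

# Same shape as the transforms.conf REGEX above: at every character position,
# the lookahead asserts that the keep pattern does not start there.
DISCARD_REGEX = re.compile(r"^((?!" + KEEP_PATTERN + r").)*$")


def is_discarded(event: str) -> bool:
    """Return True if the event would match the discard REGEX (sent to nullQueue)."""
    return DISCARD_REGEX.search(event) is not None


# Made-up sample events for illustration only.
events = [
    "2024-01-01 INFO heartbeat ok",
    "2024-01-01 ERROR disk full",
    "2024-01-01 DEBUG cache miss",
    "2024-01-01 WARN latency high",
]

# Events that do NOT match the discard regex are the ones that get forwarded.
kept = [e for e in events if not is_discarded(e)]

for e in kept:
    print(e)
```

Only the events containing the keep pattern survive; everything else matches the regex and would be routed to nullQueue. This only checks the regex logic, so still test on a real UF before rolling it out.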
In my testing, discarding events on UFs using force_local_processing and a negative lookahead caused no measurable increase in CPU, memory or disk I/O on the forwarder. I used the query below to check how much data was being sent from the UF to the indexers, and it showed a huge reduction:
| mstats sum(spl.mlog.tcpin_connections.kb) as kb where index=_metrics group="tcpin_connections" fwdType="uf" hostname=UF_NAME span=5m | timechart span=5m sum(kb)