Getting Data In

Can the Splunk ingestion pipeline drop or route older events?

hrawat
Splunk Employee

Is there an option to drop older events from the pipeline? Older events can cause frequent bucket rolling and are most likely not useful.

hrawat
Splunk Employee

The latest Splunk release adds a capability that allows older events to be routed or dropped without configuring a regex transform or an expression such as INGEST_EVAL.

With this new capability, filtering happens during date/time extraction, at a very early stage of the pipeline, in AggregatorProcessor. That makes it the least expensive way to drop or route older events.

The new setting ROUTE_EVENTS_OLDER_THAN in props.conf can be added to any stanza along with the timestamp extraction settings. By default, setting ROUTE_EVENTS_OLDER_THAN routes matching events to nullQueue right after date/time extraction.

ROUTE_EVENTS_OLDER_THAN = <non-negative integer>[s|m|h|d]
* If set, AggregatorProcessor routes events older than 'ROUTE_EVENTS_OLDER_THAN'
  to nullQueue after timestamp extraction.
* Default: no default

Example: to drop (route to nullQueue) data that is older than 30 days, set ROUTE_EVENTS_OLDER_THAN = 30d.

[source::/Applications/splunk/var/spool/splunk]
TIME_PREFIX = \d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2} \w+\s 
MAX_TIMESTAMP_LOOKAHEAD = 21
ROUTE_EVENTS_OLDER_THAN = 30d

[host::foo]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
TIME_FORMAT = %b %d %H:%M:%S %Y
ROUTE_EVENTS_OLDER_THAN = 30d
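
To make the threshold semantics concrete, here is a minimal Python sketch of the age check that a value such as 30d implies. It is only an illustration, not Splunk's implementation: the function names parse_age and is_older_than are hypothetical, the unit suffix is assumed to be required, and the comparison is assumed to be strictly "older than".

```python
import re
import time

# Seconds per unit for the documented suffixes s|m|h|d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_age(value):
    """Convert a value such as '30d' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported age value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def is_older_than(event_time, threshold, now=None):
    """Return True when event_time (epoch seconds) is older than the threshold.

    Assumption: the comparison is strict; the exact boundary behaviour
    in Splunk is not stated in this thread.
    """
    if now is None:
        now = time.time()
    return (now - event_time) > parse_age(threshold)
```

With ROUTE_EVENTS_OLDER_THAN = 30d, an event whose extracted timestamp is 45 days in the past would satisfy this check and be routed; an event from yesterday would not.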

Routing data based on a destination routing key (index / queue / _TCP_ROUTING / _SYSLOG_ROUTING)
Route data 30 days and older straight to the index queue and avoid regex extraction.

[host::foo]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
TIME_FORMAT = %b %d %H:%M:%S %Y
ROUTE_EVENTS_OLDER_THAN = 30d
queue = indexqueue

Route data 30 days and older to another index

[host::foo]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
TIME_FORMAT = %b %d %H:%M:%S %Y
ROUTE_EVENTS_OLDER_THAN = 30d
index = <30-days-and-older-index>

Index-and-forward or heavy forwarder (HWF) specific settings
Route data 30 days and older to another tcpout group

[host::foo]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
TIME_FORMAT = %b %d %H:%M:%S %Y
ROUTE_EVENTS_OLDER_THAN = 30d
_TCP_ROUTING = <send to another cluster - OR - another tcpoutgroup>

Route data 30 days and older to another syslog output group

[host::foo]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
TIME_FORMAT = %b %d %H:%M:%S %Y
ROUTE_EVENTS_OLDER_THAN = 30d
_SYSLOG_ROUTING = <another syslog output group>



Note: The configs above do not apply to the HEC event endpoint, because those events don't go through date/time extraction. To apply the routing rules above to HEC events, one of the following options is required.

1. Use the services/collector/raw endpoint from the source so that events undergo parsing and timestamp extraction.
2. Alternatively, extract the timestamp on the /event endpoint by adding the query parameter auto_extract_timestamp=true:

          /services/collector/event?auto_extract_timestamp=true

Example: http://localhost:8088/services/collector/event?auto_extract_timestamp=true
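
For senders written in code, a sketch along the following lines shows how the second option could look from the client side. It only builds the request and does not send it. The helper names hec_event_url and build_request are hypothetical, the token is a placeholder, and the host and port are taken from the example URL above.

```python
import json
import urllib.request
from urllib.parse import urlencode, urlsplit, urlunsplit


def hec_event_url(base_url):
    """Return the /event endpoint URL with auto_extract_timestamp=true."""
    parts = urlsplit(base_url)
    query = urlencode({"auto_extract_timestamp": "true"})
    return urlunsplit(
        (parts.scheme, parts.netloc, "/services/collector/event", query, "")
    )


def build_request(base_url, token, event):
    """Build (but do not send) a POST request for one HEC event."""
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        hec_event_url(base_url),
        data=body,
        # HEC authenticates with an "Authorization: Splunk <token>" header.
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


# Placeholder token and event text, for illustration only.
request = build_request(
    "http://localhost:8088",
    "hypothetical-token",
    "2020/01/01 00:00:00 INFO sample event",
)
```

Passing the request to urllib.request.urlopen would send it; whether the event is then routed depends on the props.conf stanza that matches it.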

 

isoutamo
SplunkTrust
This is a nice feature and easier to use than using transforms to drop those events. Thanks!