Getting Data In

How would you determine if data is pre-parsed?

rmavery
Explorer

We have three (Windows 2008 R2) domain controllers sending events to a single Splunk collector.
We need to reduce our indexing volume, so some of the events are being directed to the nullQueue.

Here are my configs (excerpts)...

props.conf

[source::WinEventLog:Security]
TRANSFORMS-null = null_excluded_events, null_seqid, null_user

transforms.conf

[null_excluded_events]
REGEX = (?mi)EventCode=(5145|4770|4634|4768|4769|5136)
DEST_KEY = queue
FORMAT = nullQueue

In Search, I test using...

  • | regex "(?mi)EventCode=(5145|4770|4634|4768|4769|5136)"
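A variant that breaks the matches down by host makes it easier to see which DC is still sending the excluded events (the OR form is used here so it works on older Splunk versions; restrict to a specific index if you know where the Security events land):

```
sourcetype="WinEventLog:Security" (EventCode=5145 OR EventCode=4770 OR EventCode=4634 OR EventCode=4768 OR EventCode=4769 OR EventCode=5136)
| stats count by host, EventCode
```

Any host that still shows counts here is not being filtered.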

We discovered that Splunk treats data coming from a heavy forwarder as pre-parsed, and therefore bypasses the config files that would otherwise have redirected the data to the nullQueue.

I went back to all of the domain controllers and turned them into light forwarders:

Manager >> Forwarding and Receiving >> Enable Light Forwarding.

All of the splunkd services have been restarted, and I confirmed that Splunk Web is no longer running on these machines.

I have noticed that events from one of the domain controllers are being successfully redirected, but not from the other two. Is there any mechanism for identifying which data Splunk sees as coming from a heavy forwarder vs. a light forwarder, or some alternative method of tracking this issue down?

1 Solution

lguinn2
Legend

First, I would suggest using the Universal Forwarder (UF). It's smaller and faster. More importantly, the UF cannot parse, so there is no possibility of the data being parsed before it is forwarded! The configuration files will be identical for inputs.conf and outputs.conf, so you can just copy them over from the heavy or light forwarder. You should also use props.conf on the UF if you need to override automatic sourcetyping.
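For example, an input-time sourcetype override in props.conf on the UF might look like this (the monitored path and sourcetype name are placeholders, not from the original post):

```
# props.conf on the Universal Forwarder
[source::C:\\Logs\\myapp\\*.log]
sourcetype = myapp_log
```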

Especially on a DC, I would always go with a UF due to the lighter footprint. But I recommend that you use the UF throughout your environment, except in the unusual case that you need to parse before forwarding.
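If you do keep a heavy forwarder for that unusual case, remember that parsing then happens on the forwarder, so the nullQueue filtering has to live there too, not just on the indexer. A sketch using the stanzas from the question, placed in the forwarder's $SPLUNK_HOME/etc/system/local/:

```
# props.conf on the heavy forwarder
[source::WinEventLog:Security]
TRANSFORMS-null = null_excluded_events

# transforms.conf on the heavy forwarder
[null_excluded_events]
REGEX = (?mi)EventCode=(5145|4770|4634|4768|4769|5136)
DEST_KEY = queue
FORMAT = nullQueue
```

Restart splunkd on the forwarder after deploying the files.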

You will find the download for the Universal Forwarder on a separate tab in the Splunk Download window, as it is a completely separate binary.

You can see how the data is arriving with the following search:

index=_internal source=*metrics.log group=tcpin_connections
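The fwdType field in those events distinguishes the forwarder flavor ("uf", "lwf", or "full" for a heavy forwarder), so a quick per-host breakdown would be:

```
index=_internal source=*metrics.log group=tcpin_connections
| stats latest(fwdType) as fwdType latest(version) as version by hostname
```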

This search summarizes the data by hour:

index=_internal source=*metrics.log group=tcpin_connections 
| eval sourceHost=if(isnull(hostname), sourceHost,hostname) 
| rename connectionType as connectType
| eval connectType=case(fwdType=="uf","univ fwder", fwdType=="lwf", "lightwt fwder",fwdType=="full", "heavy fwder", connectType=="cooked" or connectType=="cookedSSL","Splunk fwder", connectType=="raw" or connectType=="rawSSL","legacy fwder")
| eval version=if(isnull(version),"pre 4.2",version)
| rename version as Ver 
| fields connectType sourceIp sourceHost destPort kb tcp_eps tcp_Kprocessed tcp_KBps splunk_server Ver
| eval Indexer= splunk_server
| eval Hour=relative_time(_time,"@h")
| stats avg(tcp_KBps) sum(tcp_eps) sum(tcp_Kprocessed) sum(kb) by Hour connectType sourceIp sourceHost destPort Indexer Ver
| fieldformat Hour=strftime(Hour,"%x %H")


lguinn2
Legend

Thanks for the points!


rmavery
Explorer

Wow lguinn, this was a great answer, and the queries worked perfectly. I wish I had more Karma points. I would love to have awarded more points to this answer. Thank you. The information was invaluable.
