Getting Data In

How to filter a large amount of index data being generated by the head index server?

ntripp_element
Explorer

I've noticed the head index server is generating an absurd amount of index data, and I want to filter it out.

I have a stanza in props.conf:

[host::<hostname>]
TRANSFORMS-<hostname> = host_setnull

and in transforms.conf:

[host_setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

Is there something else I'm missing? I'm still seeing the events increment.

ntripp_element
Explorer

I'll take it that this should be working, then?

somesoni2
Revered Legend

Looks good to me. So this head index server you're referring to, is it your Splunk indexer OR a server which is feeding data to your indexer?

ntripp_element
Explorer

It's my Splunk indexer.

somesoni2
Revered Legend

So when you say it's generating an absurd amount of indexed data, where is that data coming from? Is it being monitored or generated on the indexer server itself? What sourcetype(s) does the unwanted data have?

ntripp_element
Explorer

I thought it must be data from the indexer itself? We're on a trial and hit 100GB today. I'm just trying to sort this by the largest-volume events that I don't care about and trim the usage down to something useful.

somesoni2
Revered Legend

I would suggest finding out which sourcetype or sourcetypes are eating most of your license and then applying your nullQueue routing to them. Try running this:

index=_internal sourcetype=splunkd component=LicenseUsage type=Usage | stats sum(b) as usage by st | sort 5 -usage | eval usage=round(usage/1024/1024/1024,2)

This will give the top 5 sourcetypes by license usage, in GB, for the selected time range. For any sourcetypes in this list that you don't want ingested, you can either turn off the monitoring for them (it must be in inputs.conf somewhere) or apply TRANSFORMS to those sourcetypes.
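
As a rough sketch of the TRANSFORMS option, the same nullQueue routing can be keyed on a sourcetype instead of a host. The sourcetype name my_noisy_sourcetype and the class name TRANSFORMS-null are placeholders for illustration, not values from this thread; the host_setnull transform is the one already posted above:

```
# props.conf (on the indexer or heavy forwarder that parses the data)
# "my_noisy_sourcetype" is a hypothetical name; substitute a sourcetype
# returned by the license usage search
[my_noisy_sourcetype]
TRANSFORMS-null = host_setnull

# transforms.conf
# REGEX = . matches any event containing at least one character, so every
# event of that sourcetype is routed to the nullQueue and is not indexed
[host_setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
```

Events dropped this way are discarded before indexing, so they should stop counting against the license once the change is in place and Splunk has been restarted.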

somesoni2
Revered Legend

Where did you apply these changes? They should be made on your heavy/intermediate forwarder or on the indexer, whichever comes first in the data flow. A Splunk restart is also required after making the change.

ntripp_element
Explorer

Applied on the indexer (that's all there is). The service was restarted.

somesoni2
Revered Legend

Does the <host_name> you put in props.conf correctly match the host field in the events? Is the head index server a server with a forwarder installed on it? What does your environment look like, topology-wise?
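
One way to check the host values, as a sketch: the license usage log also records a host field (h), so a variant of the earlier search can list the hosts that contribute the most volume. The field names b and h come from license_usage.log; h may be empty if Splunk has squashed the host/source detail, so treat the output as a hint rather than a definitive answer:

```
index=_internal source=*license_usage.log* type=Usage
| stats sum(b) AS bytes BY h
| eval GB=round(bytes/1024/1024/1024,2)
| sort - GB
```

The value in the h column for the noisy host is what the [host::<host_name>] stanza needs to match.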

ntripp_element
Explorer

<hostname> is just a placeholder for the actual hostname that I put in the conf file. We have one Splunk instance that we are feeding everything to, and a DCN node for the VMware stuff. So topology-wise it couldn't be much simpler.
