Field Extraction for different types of data

PramodhKumar · ‎03-08-2020

Hi Splunkers,

Splunk suggests extracting fields at the forwarders for structured data. Why? And what if I have field names in the log, or no field names in the log?

I'm confused about whether my license usage is affected by structured field extraction at index time / at the forwarders.
I understand that the Splunk license counts against what you index, so if I do indexed field extractions, will those field-value pairs be added to _raw and count toward license usage? Is that correct?

For unstructured data, Splunk suggests doing extraction at search time?

I'm clear on these sometimes, but not always.

Any advice will be appreciated.

Pramodh B

valishaik · ‎03-08-2020

What is the Splunk DB connection?

manjunathmeti · ‎03-08-2020

Field extraction settings for structured data must be configured on the forwarder.
If the structured data has field names, those are extracted automatically. If not, the FIELD_NAMES attribute can be configured in props.conf to set the field names.
For structured data, all the fields in the data are extracted at index time only.
For unstructured data, it's better to extract fields at search time, as the Splunk doc says:

Index-time custom field extraction can degrade performance at both index time and search time. When you add to the number of fields extracted during indexing, the indexing process slows. Later, searches on the index are also slower, because the index has been enlarged by the additional fields, and a search on a larger index takes longer. You can avoid such performance issues by instead relying on search-time field extraction.
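To picture what assigning field names to header-less structured data amounts to, here is a minimal Python sketch. It is only a toy model, not Splunk's implementation: the list `FIELD_NAMES`, the helper `extract_fields`, and the sample line are all hypothetical, standing in for a FIELD_NAMES setting in props.conf and one line of a CSV file without a header.

```python
import csv
import io

# Hypothetical field names, standing in for a FIELD_NAMES list in props.conf.
FIELD_NAMES = ["host", "status", "bytes"]


def extract_fields(raw_line, field_names=FIELD_NAMES):
    """Pair each comma-separated value in one raw line with a field name."""
    # csv.reader handles quoting; we read exactly one row from the line.
    values = next(csv.reader(io.StringIO(raw_line)))
    return dict(zip(field_names, values))


# Hypothetical header-less CSV line.
event = extract_fields("web01,200,5120")
print(event)
```

The raw line itself is left untouched; the names are only attached to the values found in it.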

PramodhKumar · ‎03-08-2020

Thank you,

I appreciate your efforts here.

Why does Splunk suggest extracting fields at search time? Is this the same for structured and unstructured data?
What if I do extraction at index time for unstructured data? Does that affect license usage?
I didn't understand this in comparison to your 1st point.
This is clear.

Pramodh

manjunathmeti · ‎03-08-2020

Refer to point 4. This is the same for both structured and unstructured data.
License usage is measured based on the amount of raw data that the indexer ingests into its indexing pipeline. Basically, it is counted against _raw data.
Actually, for structured data, parsing, merging and typing happen on the forwarder only, and indexing happens on the indexer.

woodcock · ‎03-08-2020

It has nothing to do with license, because you are metered by the length of _raw in bytes.

First, that guidance is overly simplistic to the point of being fallacious; please post a follow-up comment here with the URL where you read that so that I can submit some feedback.

The MAIN reason that this advice is wrong is that it will lead people to the very bad and generally WRONG decision to use Heavy Forwarders (which can do every kind of index-time field extraction) instead of Universal Forwarders: https://www.splunk.com/en_us/blog/tips-and-tricks/universal-or-heavy-that-is-the-question.html

Another reason it is wrong is because index-time field extractions consume a significant amount of disk space, often for no actual benefit (nobody is tstatsing them).

Also, the only sensible way to do index-time field extractions on a Universal Forwarder is with INDEXED_EXTRACTIONS which should generally be avoided because it is "all or none".

The only shred of this advice that is true is the universal distributed-architecture rule that, all other considerations being equal (note my previously voiced inequalities above), as much as possible should be done at the leaves of the tree.

PramodhKumar · ‎03-08-2020

Thank you,

All your suggestions are good, I really appreciate your effort.

My question is: what does INDEXED_EXTRACTIONS do at the UF? Let's say I have a CSV file with 20 lines and no field names.
I did INDEXED_EXTRACTIONS at the UF; now what exactly does my forwarder send to the indexer? That's all...

My doubt is: if the UF forwards _raw + the new field values, then while passing through the index pipeline, does it count all of it?

There are 4 pipelines (Parsing, Merging, Typing, Index), with the license metered at the last pipeline when data is being written to disk. Is that correct?

Can you please elaborate on the precedence of props attributes vs. the license meter?

Pramodh

woodcock · ‎03-08-2020

There is no impact on license, only disk and CPU.
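As a rough illustration of the point made in this thread (metering follows the length of _raw in bytes, not the extracted fields), here is a small Python sketch. It is a toy model under stated assumptions, not how Splunk actually computes license usage: `metered_bytes`, the sample events, and the field names are all hypothetical.

```python
def metered_bytes(raw_events):
    """Toy license meter: total UTF-8 byte length of the raw events only."""
    return sum(len(raw.encode("utf-8")) for raw in raw_events)


# Hypothetical header-less CSV lines, as they would appear in _raw.
raw_events = ["web01,200,5120", "web02,404,312"]

# Hypothetical indexed fields for the same events. In this toy model they
# cost disk and CPU, but they are never passed to the meter.
indexed_fields = [
    {"host": "web01", "status": "200", "bytes": "5120"},
    {"host": "web02", "status": "404", "bytes": "312"},
]

usage = metered_bytes(raw_events)
print(usage)  # 14 + 13 = 27 bytes
```

Adding or removing entries in `indexed_fields` leaves `usage` unchanged, which mirrors the answer above: no impact on license, only disk and CPU.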
