Here is my scenario:
Note: Connections are all good and I have got the files through the chain into my indexers perfectly.
So it appears that I would only need a REPORT to specify which fields are in the data, but that would be done at search time.
- What really are the implications of using INDEXED_EXTRACTIONS?
- What are the disadvantages of not using INDEXED_EXTRACTIONS in the sourcetype and using REPORT instead?
- If my sourcetype is mainly only about that REPORT, should I just have it defined in the SH? Why would I need it to be somewhere else?
Thanks in advance
Using INDEXED_EXTRACTIONS increases disk usage because it adds the key-value pairs to the index. It also increases the overall overhead of the indexing pipeline.
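For reference, an index-time structured extraction for a CSV file might look like the sketch below. The sourcetype name `my_csv` is a placeholder I made up, and the header setting assumes the first line of the file holds the column names. As far as I know, when the file is read by a universal forwarder this stanza has to live on the forwarder itself, because structured data is parsed there.

```ini
# props.conf (hypothetical sourcetype name, adjust to your own)
[my_csv]
# Parse the file as CSV and write the extracted fields into the index
INDEXED_EXTRACTIONS = csv
# Assumption: the column names are on the first line of the file
HEADER_FIELD_LINE_NUMBER = 1
```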
Using REPORT could allow users to extract fields as they wish (assuming each user has a private REPORT). Using REPORT at search time puts the load on the search pipeline instead of the indexing pipeline. Most folks would rather slow down search than indexing.
REPORT only needs to be specified on search heads since it is a search time props setting.
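A minimal sketch of what that search-time setup could look like for comma-separated data is below. The sourcetype, transform name, and field names are all placeholders, and it assumes every event has the same three columns in the same order.

```ini
# props.conf on the search head (hypothetical sourcetype name)
[my_csv]
# REPORT-<class> points at a stanza in transforms.conf
REPORT-csv_fields = my_csv_fields

# transforms.conf on the search head
[my_csv_fields]
# Split each event on commas
DELIMS = ","
# Assign the split values to these (made-up) field names, in order
FIELDS = user_id, action, status
```

Nothing here touches the indexers, so you can change the field list later without re-indexing anything.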
I really don't understand the TCP portion of your equation. If the data is in CSV format on UF1, then you should be able to pull it in with a monitor stanza. When a TCP input is mentioned in relation to Splunk, it means a TCP stanza in inputs.conf, i.e. [tcp://9514] ...
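To make the difference concrete, here is roughly what the two kinds of stanza look like in inputs.conf. The file path, sourcetype names, and index are placeholders, not something taken from your setup.

```ini
# inputs.conf on the universal forwarder
# Monitor input: tails a file on disk (hypothetical path)
[monitor:///var/log/myapp/report.csv]
sourcetype = my_csv
index = main

# TCP input: listens on a network port for raw data,
# which is a different thing from monitoring a file
[tcp://9514]
sourcetype = my_tcp_data
```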
And finally, I'll leave you with what's in props.conf.spec:
**Field extraction configuration: index time versus search time**

Use the TRANSFORMS field extraction type to create index-time field extractions. Use the REPORT or EXTRACT field extraction types to create search-time field extractions.

NOTE: Index-time field extractions have performance implications. Creating additions to Splunk's default set of indexed fields is ONLY recommended in specific circumstances. Whenever possible, extract fields only at search time.

**Search-time field extractions: Why use REPORT if EXTRACT will do?**

It's a good question. And much of the time, EXTRACT is all you need for search-time field extraction. But when you build search-time field extractions, there are specific cases that require the use of REPORT and the field transform that it references. Use REPORT if you want to:

* Reuse the same field-extracting regular expression across multiple sources, source types, or hosts. If you find yourself using the same regex to extract fields across several different sources, source types, and hosts, set it up as a transform, and then reference it in REPORT extractions in those stanzas. If you need to update the regex you only have to do it in one place. Handy!
* Apply more than one field-extracting regular expression to the same source, source type, or host. This can be necessary in cases where the field or fields that you want to extract from a particular source, source type, or host appear in two or more very different event patterns.
* Set up delimiter-based field extractions. Useful if your event data presents field-value pairs (or just field values) separated by delimiters such as commas, spaces, bars, and so on.
* Configure extractions for multivalued fields. You can have Splunk append additional values to a field as it finds them in the event data.
* Extract fields with names beginning with numbers or underscores. Ordinarily, Splunk's key cleaning functionality removes leading numeric characters and underscores from field names. If you need to keep them, configure your field transform to turn key cleaning off.
* Manage formatting of extracted fields, in cases where you are extracting multiple fields, or are extracting both the field name and field value.
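
A rough sketch of a few of those REPORT cases in one place: a single transform reused by two source types, with multivalue appending turned on and key cleaning turned off. The stanza names and the regex are made up for illustration, and the regex assumes simple `name=value` pairs with no spaces in the values.

```ini
# transforms.conf (hypothetical transform name)
[my_kv_pairs]
# Capture a field name and its value from name=value pairs
REGEX = (\w+)=(\S+)
# Use the first capture as the field name, the second as the value
FORMAT = $1::$2
# Append repeated names as extra values instead of keeping only the first
MV_ADD = true
# Keep leading digits and underscores in extracted field names
CLEAN_KEYS = false

# props.conf: the same transform referenced from two (made-up) source types
[my_sourcetype_a]
REPORT-kv = my_kv_pairs

[my_sourcetype_b]
REPORT-kv = my_kv_pairs
```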