Hi,
Here is my scenario:
Note: Connections are all good and I have got the files through the chain into my indexers perfectly.
So it appears that I would only need a REPORT to specify which fields are in the data, but that would be done at search time.
QUESTIONS:
- What really are the implications of using INDEXED_EXTRACTIONS?
- What are the disadvantages of not using INDEXED_EXTRACTIONS in the sourcetype and using REPORT instead?
- If my sourcetype is mainly just that REPORT, should I define it only on the SH? Why would I need it anywhere else?
Thanks in advance
Using INDEXED_EXTRACTIONS increases disk usage because it writes the extracted key-value pairs into the index. It also adds overhead to the indexing pipeline.
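For reference, an index-time setup for CSV might look roughly like the sketch below. The sourcetype name my_csv is made up for illustration, and the header settings assume the first line of each file holds the column names. As far as I know, when the file is read by a universal forwarder, this stanza has to live in props.conf on the forwarder, because structured data is parsed there.

```
# props.conf (hypothetical sourcetype name: my_csv)
[my_csv]
# Parse the file as CSV and write every column as an indexed field
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,
# Assumes the column names are on line 1 of the file
HEADER_FIELD_LINE_NUMBER = 1
```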
Using REPORT lets users extract fields as they wish (assuming each user has a private REPORT). Because REPORT runs at search time, it puts the load on the search pipeline instead of the indexing pipeline. Most folks would rather slow down search than indexing.
REPORT only needs to be specified on search heads since it is a search time props setting.
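A minimal search-time sketch for comma-separated data, assuming a sourcetype called my_csv and three columns. The stanza names and field names here are placeholders, not anything taken from your data. Both files would go on the search head.

```
# props.conf on the search head (hypothetical sourcetype name: my_csv)
[my_csv]
REPORT-csv_fields = my_csv_fields

# transforms.conf on the search head
[my_csv_fields]
# Split each event on commas and name the values in order
DELIMS = ","
FIELDS = "field1", "field2", "field3"
```

The trade-off is that the field list is maintained by hand: if the column order in the file changes, FIELDS has to be updated to match.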
I really don't understand the TCP portion of your equation. If the data is in CSV format on UF1, then you should be able to pull it in with a monitor stanza. When a TCP input is mentioned in relation to Splunk, it means a TCP stanza in inputs.conf, e.g. [tcp://9514] ...
And finally, I leave you with what's in props.conf.spec:
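To make the difference concrete, here is a rough sketch of the two kinds of inputs.conf stanza. The file path, index, and sourcetype are placeholders I made up; only the port number comes from the example above.

```
# inputs.conf on UF1: read a CSV file from disk (hypothetical path)
[monitor:///var/data/export/report.csv]
sourcetype = my_csv
index = main

# inputs.conf: a raw TCP listener, which is what "TCP input" usually refers to
[tcp://9514]
sourcetype = my_csv
```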
**Field extraction configuration: index time versus search time**
Use the TRANSFORMS field extraction type to create index-time field
extractions. Use the REPORT or EXTRACT field extraction types to create
search-time field extractions.
NOTE: Index-time field extractions have performance implications. Creating
additions to Splunk's default set of indexed fields is ONLY
recommended in specific circumstances. Whenever possible, extract
fields only at search time.
**Search-time field extractions: Why use REPORT if EXTRACT will do?**
It's a good question. And much of the time, EXTRACT is all you need for
search-time field extraction. But when you build search-time field
extractions, there are specific cases that require the use of REPORT and the
field transform that it references. Use REPORT if you want to:
* Reuse the same field-extracting regular expression across multiple
sources, source types, or hosts. If you find yourself using the same regex
to extract fields across several different sources, source types, and
hosts, set it up as a transform, and then reference it in REPORT
extractions in those stanzas. If you need to update the regex you only
have to do it in one place. Handy!
* Apply more than one field-extracting regular expression to the same
source, source type, or host. This can be necessary in cases where the
field or fields that you want to extract from a particular source, source
type, or host appear in two or more very different event patterns.
* Set up delimiter-based field extractions. Useful if your event data
presents field-value pairs (or just field values) separated by delimiters
such as commas, spaces, bars, and so on.
* Configure extractions for multivalued fields. You can have Splunk append
additional values to a field as it finds them in the event data.
* Extract fields with names beginning with numbers or underscores.
Ordinarily, Splunk's key cleaning functionality removes leading numeric
characters and underscores from field names. If you need to keep them,
configure your field transform to turn key cleaning off.
* Manage formatting of extracted fields, in cases where you are extracting
multiple fields, or are extracting both the field name and field value.
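As an illustration of two of the cases in that list, here is a transforms.conf sketch. The stanza names, regexes, and field names are invented for the example; MV_ADD and CLEAN_KEYS are the settings the spec is referring to for multivalued fields and key cleaning.

```
# transforms.conf (stanza names and regexes are hypothetical)

# Multivalued field: every "to=<value>" match is appended to recipient
[multi_recipient]
REGEX = to=(\S+)
FORMAT = recipient::$1
MV_ADD = true

# Keep field names that start with an underscore by turning key cleaning off
[keep_leading_underscore]
REGEX = (_\w+)=(\S+)
FORMAT = $1::$2
CLEAN_KEYS = false
```

Each stanza would then be referenced from props.conf with a REPORT-<class> line under the relevant sourcetype.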