Getting Data In

CSV Indexed Extractions in Distributed Environment



Here is my scenario:

  • UF1 -> UF2 -> HF -> IDX1; IDX2; IDX3 -> SH1

Note: Connections are all good and I have got the files through the chain into my indexers perfectly.

  • The data I'm monitoring is in UF1, where I am monitoring a folder.
  • The files in there are CSV with fixed fields, so I want the fields to be extracted automatically. I read in the documentation that INDEXED_EXTRACTIONS does not work with TCP inputs, which is exactly what I am doing: passing data over TCP from Splunk instance to Splunk instance.

So it appears that I would only need a REPORT to specify which fields are in the data, but that would be done at search time.

- What really are the implications of using INDEXED_EXTRACTIONS?
- What are the disadvantages of not using INDEXED_EXTRACTIONS in the sourcetype and using REPORT instead?
- If my sourcetype mainly consists of that REPORT, should I just define it on the SH? Why would I need it anywhere else?

Thanks in advance



Using INDEXED_EXTRACTIONS increases disk usage because it adds the key-value pairs to the index as indexed fields, and it increases the overall overhead of the indexing pipeline.

Using REPORT lets each user extract fields as they wish (assuming each user has a private REPORT). Applying REPORT at search time puts the load on the search pipeline instead of the indexing pipeline, and most folks would rather slow down searching than indexing.

REPORT only needs to be deployed on the search heads, since it is a search-time props setting.
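For example, a search-time CSV extraction on the SH could look something like this (the sourcetype, stanza, and field names below are placeholders for your own):

```ini
# transforms.conf (on the search head)
[my_csv_fields]
DELIMS = ","
FIELDS = "timestamp","host_name","status","bytes"

# props.conf (on the search head)
[my_csv_sourcetype]
REPORT-csvfields = my_csv_fields
```

FIELDS maps the delimited columns to names in order, so it only works because your files have fixed fields, as you described.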

I really don't understand the TCP portion of your equation. If the data is in CSV format on UF1, then you should be able to pull it in with a monitor stanza. When a TCP input is mentioned in relation to Splunk, it means a tcp stanza in inputs.conf, i.e. [tcp://9514] ...
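To illustrate the difference (the path, port, and sourcetype below are made-up examples):

```ini
# inputs.conf on UF1 -- a monitor stanza is a file input, not a TCP input
[monitor:///var/log/myapp/csv_drop]
sourcetype = my_csv_sourcetype
disabled = false

# by contrast, this is what the docs mean by a "TCP input":
[tcp://9514]
sourcetype = syslog
```

Forwarding between Splunk instances uses splunktcp on the receiving side, which is separate from both of these.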

And finally I leave you with what's in props.conf.spec:

**Field extraction configuration: index time versus search time**

Use the TRANSFORMS field extraction type to create index-time field
extractions. Use the REPORT or EXTRACT field extraction types to create
search-time field extractions.

NOTE: Index-time field extractions have performance implications. Creating
      additions to Splunk's default set of indexed fields is ONLY
      recommended in specific circumstances.  Whenever possible, extract
      fields only at search time.

**Search-time field extractions: Why use REPORT if EXTRACT will do?**

It's a good question. And much of the time, EXTRACT is all you need for
search-time field extraction. But when you build search-time field
extractions, there are specific cases that require the use of REPORT and the
field transform that it references. Use REPORT if you want to:

* Reuse the same field-extracting regular expression across multiple
  sources, source types, or hosts. If you find yourself using the same regex
  to extract fields across several different sources, source types, and
  hosts, set it up as a transform, and then reference it in REPORT
  extractions in those stanzas. If you need to update the regex you only
  have to do it in one place. Handy!
* Apply more than one field-extracting regular expression to the same
  source, source type, or host. This can be necessary in cases where the
  field or fields that you want to extract from a particular source, source
  type, or host appear in two or more very different event patterns.
* Set up delimiter-based field extractions. Useful if your event data
  presents field-value pairs (or just field values) separated by delimiters
  such as commas, spaces, bars, and so on.
* Configure extractions for multivalued fields. You can have Splunk append
  additional values to a field as it finds them in the event data.
* Extract fields with names beginning with numbers or underscores.
  Ordinarily, Splunk's key cleaning functionality removes leading numeric
  characters and underscores from field names. If you need to keep them,
  configure your field transform to turn key cleaning off.
* Manage formatting of extracted fields, in cases where you are extracting
  multiple fields, or are extracting both the field name and field value.
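A minimal transform touching a few of the cases above (delimiters, key cleaning, multivalued fields) might look like this; the stanza and field names are invented for illustration:

```ini
# transforms.conf -- illustrative sketch only
[extract_report_fields]
DELIMS = ","
FIELDS = "_serial","2xx_count","status"
# keep the leading underscore and digit that key cleaning would normally strip
CLEAN_KEYS = false
# append repeated occurrences as additional values instead of overwriting
MV_ADD = true
```

None of this is possible with a bare EXTRACT regex, which is the point the spec is making.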