Hi,
I've recently noticed the recommendations to move to search-time rather than index-time field extractions. I'm trying to get an idea of exactly how much of our existing configuration doesn't follow this paradigm. In particular, we have a lot of DELIMS/FIELDS-based field extractions, and I'm not clear on where we stand with these, especially since there's no obvious way to configure them in the GUI.
I'm assuming that when an extraction says 'uses transform' rather than 'inline' in the GUI, it is an index-time field extraction? Is this the case, or am I oversimplifying the distinction?
I've looked over the documentation on search-time field extraction, and http://www.splunk.com/base/Documentation/latest/Knowledge/Addfieldsatsearchtime says:
You can also create and maintain field extractions by making edits directly to props.conf and transforms.conf. If this sounds like your kind of thing--and it may be, especially if you are an old-timey Splunk user, or just prefer working at the configuration file level of things, you can find all the details in "Create and maintain search-time extractions through configuration files," in this manual.
That said, other documentation at http://www.splunk.com/base/Splexicon:Transform says:
Transforms are always involved in the setup of custom index-time field extractions.
Can somebody please help us clear this up? Thanks!
-Frank
In general, we recommend search-time extractions rather than index-time extractions. There are relatively few cases where index-time extractions are better, and they come at the cost of brittleness of configuration and an increase in index size (which in turn makes searches slower).
The distinction in the UI between "uses transform" and "inline" doesn't have anything to do with search-time vs. index-time. It refers to where the regex itself is stored: in an `EXTRACT-` line in props.conf (for inline), as opposed to a `REPORT-` line that refers to a stanza in transforms.conf (for uses transform).
Index-time extractions are also set in props.conf and transforms.conf, by means of the `TRANSFORMS-` line. Again, they should rarely be used. They are appropriate when the heuristic of searching for the value of the field fails (either because the value is ubiquitous outside of events where the field equals that value, or because the value isn't an indexed token), or when you commonly search for `field!=value` without other terms to constrain the search.
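To make that heuristic concrete, here is a toy Python model (not how Splunk is implemented internally). It assumes a search-time `status=1` search first narrows events to those containing the token `1` and only then extracts `status` from each candidate, whereas an indexed field can be looked up as a field/value pair directly. The field name `status`, the sample events, and the crude tokenizer are all made up for illustration.

```python
from __future__ import annotations

import re

# Hypothetical sample events: the value "1" appears in almost every event,
# but the (hypothetical) field "status" equals "1" in only one of them.
EVENTS = [
    "retries=1 status=1",
    "retries=1 status=2",
    "retries=1 status=2",
    "retries=3 status=2",
]


def tokens(raw: str) -> set[str]:
    # Crude stand-in for Splunk's segmentation: split on non-word characters.
    return set(re.findall(r"\w+", raw))


def extract_status(raw: str) -> str | None:
    # Hypothetical search-time extraction for the "status" field.
    m = re.search(r"status=(\w+)", raw)
    return m.group(1) if m else None


def search_time_eq(events: list[str], value: str) -> tuple[list[str], int]:
    """status=value at search time: token filter first, then extract and compare."""
    # Simulates the index lookup: only events containing the token are candidates.
    candidates = [e for e in events if value in tokens(e)]
    matches = [e for e in candidates if extract_status(e) == value]
    return matches, len(candidates)


def search_time_neq(events: list[str], value: str) -> tuple[list[str], int]:
    """status!=value: a missing token proves nothing, so every event is examined."""
    matches = [e for e in events if extract_status(e) not in (None, value)]
    return matches, len(events)


def build_index(events: list[str]) -> dict[str, list[str]]:
    """Simulates writing status at index time as a field/value pair."""
    index: dict[str, list[str]] = {}
    for e in events:
        v = extract_status(e)
        if v is not None:
            index.setdefault(v, []).append(e)
    return index


def indexed_eq(index: dict[str, list[str]], value: str) -> tuple[list[str], int]:
    """With an indexed field, the field/value pair is looked up directly."""
    matches = index.get(value, [])
    return matches, len(matches)


if __name__ == "__main__":
    print(search_time_eq(EVENTS, "1"))           # 1 match, 3 candidates examined
    print(search_time_neq(EVENTS, "1"))          # 3 matches, all 4 events examined
    print(indexed_eq(build_index(EVENTS), "1"))  # 1 match, 1 examined
```

In this model the search-time path still returns the right answer; the cost of a ubiquitous value, or of `!=`, shows up as extra extraction work on events that turn out not to match.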
There are two different "transform" things.
One is transforms.conf, which contains transform definitions; the word "transform" only occurs in the file name, not in the contents of the file. That is one thing.
Then there is the `TRANSFORMS-` definition inside props.conf, which is part of the `REPORT-`, `EXTRACT-`, and `TRANSFORMS-` triad. The first two are search-time things that are really the same thing (except that `REPORT-` definitions reference transforms defined in transforms.conf, whereas `EXTRACT-` definitions are inlined completely in props.conf). The last, `TRANSFORMS-`, is how index-time extractions are configured.
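For a rough inventory of which kind of extraction a configuration uses (the original question), a small script can walk props.conf and sort each setting into that triad. This is a sketch under stated assumptions: it treats props.conf as simple `key = value` lines under `[stanza]` headers, skips `#` comments, and does not handle backslash line continuations or Splunk's layered configuration precedence across apps. DELIMS/FIELDS extractions show up as `REPORT-` entries, since their stanzas live in transforms.conf. Not every `TRANSFORMS-` entry is a field extraction (some rewrite metadata such as host or index), so those still deserve a manual look. The sample stanza and class names are hypothetical.

```python
from __future__ import annotations

import re
import sys
from collections import Counter

# Setting-name prefix in props.conf -> when the extraction runs.
# The labels in parentheses mirror the field-extraction UI wording.
KINDS = {
    "EXTRACT-": "search-time (inline)",
    "REPORT-": "search-time (uses transform)",
    "TRANSFORMS-": "index-time",
}

STANZA = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
# Key is e.g. "EXTRACT-<class>"; the value is everything after the first "=".
SETTING = re.compile(r"^\s*(?P<key>[A-Za-z]+-[^=]*?)\s*=\s*(?P<value>.*)$")


def classify(lines):
    """Yield (stanza, key, kind, value) for each EXTRACT-/REPORT-/TRANSFORMS- line."""
    stanza = "default"  # settings that appear before any [stanza] header
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # comment line
        m = STANZA.match(line)
        if m:
            stanza = m.group("name")
            continue
        m = SETTING.match(line)
        if not m:
            continue
        key = m.group("key")
        for prefix, kind in KINDS.items():
            if key.startswith(prefix):
                yield stanza, key, kind, m.group("value").strip()
                break  # other prefixed keys (e.g. SEDCMD-) are skipped


def summarize(lines) -> Counter:
    """Count extraction settings per kind."""
    return Counter(kind for _, _, kind, _ in classify(lines))


# Hypothetical props.conf content.
SAMPLE = r"""
[my_sourcetype]
EXTRACT-user = user=(?<user>\w+)
REPORT-pipe = pipe_fields
TRANSFORMS-sess = index_session_id
""".splitlines()

if __name__ == "__main__":
    # Pass a props.conf path as the first argument, or fall back to SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            lines = f.read().splitlines()
    else:
        lines = SAMPLE
    for stanza, key, kind, _ in classify(lines):
        print(f"[{stanza}] {key}: {kind}")
    print(dict(summarize(lines)))
```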
I agree that this is a bit confusing.
I will clarify here that DELIMS/FIELDS extractions are search-time extractions, and thus already of the preferred type.
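For readers more used to regex extractions, here is a rough Python approximation of what a DELIMS/FIELDS stanza does, based on the transforms.conf description of DELIMS: each character in the delimiter string acts as a delimiter; with one set of delimiters, FIELDS supplies the names in order; with two sets, the first splits the event into field/value pairs and the second separates each name from its value. It is not Splunk's implementation and ignores edge cases such as quoting and consecutive delimiters. The function names and sample stanzas are hypothetical.

```python
from __future__ import annotations

import re


def _split_any(text: str, chars: str) -> list[str]:
    # Each character in `chars` acts as a delimiter, as with DELIMS.
    return re.split("[" + re.escape(chars) + "]", text)


def delims_fields(raw: str, delims: str, fields: list[str]) -> dict[str, str]:
    """One-delimiter form: DELIMS splits values, FIELDS names them in order."""
    # zip() stops at the shorter list, so surplus values or names are dropped.
    return dict(zip(fields, _split_any(raw, delims)))


def delims_kv(raw: str, pair_delims: str, kv_delims: str) -> dict[str, str]:
    """Two-delimiter form: split into pairs, then split each pair once."""
    result: dict[str, str] = {}
    for pair in _split_any(raw, pair_delims):
        parts = re.split("[" + re.escape(kv_delims) + "]", pair, maxsplit=1)
        if len(parts) == 2 and parts[0]:
            result[parts[0]] = parts[1]
    return result


if __name__ == "__main__":
    # Hypothetical stanza: DELIMS = "|"  FIELDS = "user", "action", "status"
    print(delims_fields("alice|login|ok", "|", ["user", "action", "status"]))
    # prints {'user': 'alice', 'action': 'login', 'status': 'ok'}
    # Hypothetical stanza: DELIMS = "|", "="
    print(delims_kv("user=alice|action=login", "|", "="))
    # prints {'user': 'alice', 'action': 'login'}
```

In props.conf such a stanza would be referenced from a `REPORT-` line, which is what makes it a search-time extraction.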
Here is a related discussion (which highlights some additional use cases for indexed fields)
Does this recommendation still stand nearly 8 years later?
@wmyersas I think it's much more recommended now that Splunk is moving to billing customers on compute rather than on daily ingest volume. Search time extractions will defo use more compute to load into RAM rather than displaying fields that have already been burned onto the disk.
@morethanyell >>>Search time extractions will defo use more compute to load into RAM rather than displaying fields that have already been burned onto the disk<<<
So, do you suggest index-time field extraction over search-time field extraction?
I don't. I think good architectural design means being able to determine which contexts should implement indexed fields and which should stick with the default search-time field extractions.
For example, TAs that make logs CIM-compliant, such as TAs for network devices (e.g. Aruba or Checkpoint), extract fields at search time and are best left that way because of the sheer size of network device logs. Also, those logs are most likely going to be accelerated anyway in something like a "Network Traffic" `datamodel`.
Another example is specific contexts that serve use cases unique to your organisation. I used to work on a project where we had to implement index-time field extractions so that we could pull the fields quickly with `tstats` and display the data quickly on a dashboard viewed by a high-profile boss every Monday. He gets frustrated if the dashboard loads slowly, and we don't want to upset the boss. We could have used an accelerated data model, but we decided to do it at the indexing phase instead.
Thanks, that's exactly what I was hoping to hear. Now, if we could just get an easy way to configure DELIMS/FIELDS in the UI, I'd be even happier...
Yup, still waiting on the DELIMS/FIELDS UI thing in 2016. And now with Splunk Cloud that's become an even bigger pain because of the lack of access to the .conf files. ;-(