Index time: default fields (timestamp, punct, host, source, and sourcetype) + whatever you configure to be extracted at index time
Search time: whatever you configure via the GUI + props/transforms configured for search-time extraction
Take a look at this too: http://docs.splunk.com/Documentation/Splunk/6.3.2/Indexer/Indextimeversussearchtime
Unfortunately, tracking down exactly where a field comes from can be quite difficult.
When you look at the list of fields on the left-hand bar, an individual field could come from any of these:
Before structured data extractions you could generally assume that all of the (non-default) fields came from a search-time field extraction (or one of the other search-time methods listed above), but that's not always the case anymore.
So all that to say there's no "easy" answer. I think the best approach is to ask the question one field at a time. You can do that with `tstats`, because it searches the index directly and therefore completely ignores search-time extracted fields.
Let's say you suspect that `foo` is an indexed field, and that `foo` shows up with the value `bar`. Let's set up a baseline search that shows how many times "foo" equals "bar" for whatever index and time range you're testing:
foo=bar | stats count
Now run the tstats version and see if you get the same results:
| tstats count where foo=bar
Or, another option (if `tstats` scares you -- I had forgotten that this still works):
foo::bar | stats count
If the `tstats` (or `foo::bar`) search returns "0", then the field isn't indexed; it must be auto-extracted or something. The point is that it's happening at search time, not at index time. If it returns the same count as the baseline search, then you know that "foo" is always an indexed field. (If the numbers are slightly off, it either means that the field is only sometimes indexed or, more likely, that data moved between the time you ran the two searches. Try using a historic time range that doesn't go up until "now".)
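As a rough sketch of that comparison, the two searches below (run separately) pin both counts to the same closed window, so newly arriving data can't skew the result. The index name `youridx` and the window (all of the day before yesterday) are placeholders I picked for illustration, not anything from your environment:

```spl
index=youridx earliest=-2d@d latest=-1d@d foo=bar | stats count

| tstats count where index=youridx earliest=-2d@d latest=-1d@d foo=bar
```

If the second count is 0 while the first is not, "foo" is most likely a search-time field for that data.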
Here are a few other things you can look at when trying to determine if a field is indexed or not:
- `fields.conf`: look for stanzas where `INDEXED` is true. (But this isn't a guarantee.)
- Use `walklex` to probe individual `*.tsidx` files in your buckets. (This is very low-level and very tedious unless you're a Splunk Ninja, but it's the ultimate source of truth.)
- Search all `.conf` files at once for the field name in question. This normally returns something relevant, unless your field name is also a commonly occurring term.
Thanks for this complete answer. Using the walklex tool, I could confirm that I still have a bunch of fields that should not be extracted at index time. I just need to find out why now...
Hey, just found out that Splunk revived their `field::value` search syntax, which I think was disabled back in Splunk 5 but I guess made its way back into Splunk 6.
Run your search in `fast` mode (not `smart`). Whatever fields are present were added at index time (more or less).
Unfortunately, that's not correct. Most of the fields that show up are in fact indexed fields (except for splunk_server and linecount), but there are way more indexed fields than what is being shown (for example, date_*, timestartpos, punct, ...).
I just double checked on a local 6.2 instance running on my laptop. I indexed some IIS data using the structured data extraction, and confirmed that "cip" didn't show up on the left, but if I searched for `cip::22.214.171.124` it did in fact return matching records. (Tested with tstats too, same result)
Fast mode is "fast" because it doesn't bother extracting fields you haven't requested. But Splunk still has to look at both the raw data and the index data to fetch every event. Compare that with something like `tstats`, which does a pure index-only search and never needs to pull in the raw data (so search-time extractions are impossible to perform). Both approaches are faster than the other search modes, but they are very different under the covers.
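To make the index-only point concrete, here is a minimal sketch that lists the values of a field using nothing but index data. It assumes the IIS data lives in a placeholder index called `youridx` and that "cip" really is indexed for that data; if it were a search-time field, I'd expect this to return no rows:

```spl
| tstats count where index=youridx by cip
```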
Here's an idea, based on Lowell's answer above. Create a list of fields from events (`| stats values(*) as *`) and feed it to `map` to test whether `field::value` works, implying it's at least a pseudo-indexed field.
index=youridx | dedup 25 sourcetype
get some events, assuming 25 per sourcetype is enough to get all field names with an example
| head 100
assume there are only 4 sourcetypes in the index
| stats first(*) as *
make a list of all fields with their examples (one event, lots of fields)
| fields - date_* tag::*
get rid of some fields we don't care about testing
| transpose
turn those fields into events
| rename "row 1" as row
get rid of that space
| map maxsearches=20 search="search index=youridx $column$::$row$ | head 1 | eval indexed=\"$column$\" | table indexed"
run up to 20 searches (against the first 20 fields) that go back and search that index for `name::value`. If found, output an event and populate the field named "indexed" with the field tested. If `name::value` fails, 0 rows are output, so you end up with a list of likely indexed fields.
From my random dataset, it found `timestartpos`, but for some reason did not find `source`, even though the values contained no spaces. Perhaps that's due to slashes or other special characters that `name::value` doesn't like. `name::"value"` doesn't seem to work at all. So, YMMV.
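For copy-and-paste purposes, the steps above assemble into roughly the following single search. This is a sketch: `youridx` is a placeholder, and I'm assuming the "turn those fields into events" step is a plain `transpose`, since that is what produces the `column` and `row 1` fields used afterwards:

```spl
index=youridx
| dedup 25 sourcetype
| head 100
| stats first(*) as *
| fields - date_* tag::*
| transpose
| rename "row 1" as row
| map maxsearches=20 search="search index=youridx $column$::$row$ | head 1 | eval indexed=\"$column$\" | table indexed"
```

Raise `maxsearches` if you have more than 20 fields to test, keeping in mind that each field costs one extra search.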