I've been battling this same thing off and on for the last couple of years, and I've learned a few things that might help. First, you have to decide whether you're indexing all extracted fields (not recommended) or doing search-time field extractions. What happens to me is that I always test my extractions on a standalone box, where they work like a champ, and then everything breaks down in our distributed prod/UAT environment. Regardless, this might help:
For search-time extractions, most of the relevant props.conf entries will be on the search head (SH). The indexer will only have settings associated with index-time things (timestamp, line merging, line breaker, host, sourcetype, etc.; all the lightweight schema stuff). On the SH, though, you can use a combination of these settings to do the extractions from the header:
CHECK_FOR_HEADER = TRUE
HEADER_FIELD_LINE_NUMBER = <NUMBER> (this one is cranky and unreliable, but sometimes works)
KV_MODE = <json, xml, auto, etc> (this one is also cranky and unreliable, especially with xml; as far as I know there is no csv value, so delimited data needs the explicit REPORT route)
And if you want to be explicit (recommended in a lot of cases), you can use REPORT:
REPORT-name-of-report = name_of_transforms.conf_stanza
Then transforms.conf on the search head will look something like this:
[name_of_transforms.conf_stanza]
DELIMS = ","
FIELDS = field1, field2, field3, etc... (these names must match the names in the header exactly)
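Putting the two search head files together, a minimal sketch might look like the following. The sourcetype name (my_csv_sourcetype), the stanza name (my_csv_fields), and the four field names are all made up for illustration, and KV_MODE = none is my own optional addition to keep automatic key-value extraction from competing with the explicit REPORT. Comments are on their own lines because Splunk .conf files don't treat trailing text on a setting line as a comment.

```ini
# props.conf (search head)
# my_csv_sourcetype is a placeholder; use your real sourcetype
[my_csv_sourcetype]
# optional: turn off automatic key-value extraction for this sourcetype
KV_MODE = none
# the value must be the name of a stanza in transforms.conf
REPORT-csv_fields = my_csv_fields

# transforms.conf (search head)
[my_csv_fields]
DELIMS = ","
# hypothetical column names, listed in the same order as the header row
FIELDS = timestamp, host_name, status, bytes
```

The part after "REPORT-" is just a label; what matters is that the value on the right matches the transforms.conf stanza name character for character.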
So while this is the generally recommended approach for a large-scale distributed environment, it often doesn't work well because of how line breaking and timestamp extraction on the indexers interact with the .conf files on the search head. Essentially, you set it all up, you think it should work, and then it doesn't (even though it did on a standalone box)... and then troubleshooting sucks.
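For reference, the indexer side of that relationship is usually a stanza along these lines. This is only a sketch: the sourcetype name is a placeholder, and the timestamp settings assume each event starts with something like 2024-01-31 12:00:00, which may not match your data at all.

```ini
# props.conf (indexer, or wherever parsing happens)
# my_csv_sourcetype is a placeholder; it must be the same sourcetype
# name the search head stanza uses
[my_csv_sourcetype]
# one event per line, no multi-line merging
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# assumed timestamp layout: 19 characters at the very start of the event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```

If the events come out broken or timestamped wrong here, the search-time extractions on the SH are working against bad events, so it's worth ruling this stanza out before touching the SH config.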
For index-time extractions, you can use a combination of the following settings:
INDEXED_EXTRACTIONS = <blah>
PREAMBLE_REGEX = <match some pattern in the header> (this actually ignores the first line, but uses it for the field names.)
So if you use PREAMBLE_REGEX but want search-time extractions, you can't, because that line is already gone by the time the search head sees it.
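A minimal index-time sketch, with a made-up sourcetype name, a made-up preamble pattern, and a made-up timestamp column, could look like this. One thing to double-check for your setup: as far as I know, INDEXED_EXTRACTIONS settings need to live on the forwarder that reads the file rather than on the indexer.

```ini
# props.conf (on the instance that reads the file)
# my_csv_sourcetype is a placeholder sourcetype name
[my_csv_sourcetype]
INDEXED_EXTRACTIONS = csv
# hypothetical pattern: lines starting with # are handled as preamble
PREAMBLE_REGEX = ^#
# hypothetical column name that holds the event time
TIMESTAMP_FIELDS = timestamp
```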
Another method of troubleshooting, even if you don't plan on indexed extractions, is to turn on
INDEXED_EXTRACTIONS = csv
to see if it's your extractions on the SH or something else that's causing the problem.
And then there's the fishbucket :)... but that's another story... the hits just keep on coming...