I'm double posting; the original issue is posted here: http://www.splunk.com/support/forum:SplunkGeneral/4378
When I use double-quotes in my index-time field extractions, the meta-data is not searchable. I've seen this problem on 4.0.11 and 4.1.3.
Transforms.conf without double-quotes:
REGEX = ^results=(.*?),(.*?),(.*?),(.+)$
FORMAT = key1::$1 key2::$2 key3::$3 key4::$4
WRITE_META = true
Transforms.conf with double-quotes:
REGEX = ^results=(.*?),(.*?),(.*?),(.+)$
FORMAT = key1::"$1" key2::"$2" key3::"$3" key4::"$4"
WRITE_META = true
If you use the first transforms.conf without the double-quotes, there are two problems:
The value for key3 (with a space) is not captured correctly. This is in the documentation which says to use double-quotes.
The fields extracted on 4.1.3 are incorrect for key4. Instead of having a field "key4" it has "CC key4". I don't recall seeing this behavior in 4.0.x.
However, if you use the second transforms.conf with the double-quotes, the fields are extracted but are not searchable.
Here are my conf files so you can replicate this issue. I also have a screenshot below.
inputs.conf:
[monitor:///var/log/test]
disabled = 0
sourcetype = mytest

props.conf:
[mytest]
TRANSFORMS-test = extract-fields

fields.conf:
[key1]
INDEXED = true
[key2]
INDEXED = true
[key3]
INDEXED = true
[key4]
INDEXED = true

transforms.conf:
[extract-fields]
REGEX = ^results=(.*?),(.*?),(.*?),(.+)$
FORMAT = key1::"$1" key2::"$2" key3::"$3" key4::"$4"
WRITE_META = true
In this screenshot, notice that the values are indeed extracted and show up in the search result. However, searching for "key1=AA" (or any other key=value) returns no results.
Curious: do you have these keys defined in fields.conf? You shouldn't need the quotes in transforms.conf. I'm unsure what they are supposed to achieve, but I assume they worked for you in earlier versions?
What does your props.conf look like?
Yes, all the fields are defined in fields.conf
You need the double-quotes in transforms.conf when the regular expression backreference captures a value with a space in it.
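To illustrate why the quotes matter, here is a minimal Python sketch. It is not how Splunk parses FORMAT internally; it only shows that a space inside a captured value makes an unquoted key::value string ambiguous. The sample event is hypothetical (the real one is only visible in the screenshot), and shlex is used purely as a stand-in for a quote-aware tokenizer.

```python
import re
import shlex

# REGEX copied from the transforms.conf stanza in this thread.
REGEX = re.compile(r"^results=(.*?),(.*?),(.*?),(.+)$")

# Hypothetical sample event: the third value contains a space.
EVENT = "results=AA,BB,CC DD,EE"


def render(fmt: str, event: str) -> str:
    """Substitute $1..$4 in a FORMAT-style template with the regex captures."""
    match = REGEX.match(event)
    if match is None:
        raise ValueError("event does not match REGEX")
    return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), fmt)


unquoted = render("key1::$1 key2::$2 key3::$3 key4::$4", EVENT)
quoted = render('key1::"$1" key2::"$2" key3::"$3" key4::"$4"', EVENT)

# Whitespace tokenization: the space in the value splits key3 into two tokens.
print(unquoted.split())

# Quote-aware tokenization keeps "CC DD" attached to key3.
print(shlex.split(quoted))
```

With the unquoted template, "DD" ends up as a stray token between key3 and key4, which is the kind of ambiguity the quoted form avoids.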
As with a lot of Splunk quirks, I don't see this documented (http://www.splunk.com/base/Documentation/latest/Admin/Transformsconf), so I'm not certain you need those quotes, or that it's even valid syntax in the latest version. Space-escaping is mentioned in that document, but only in relation to FIELDS= capturing, which is used alongside auto-kv/delims extraction (which is not what you're doing).
I should add that you're getting no results for the second conf, which kind of backs that up. The first transforms.conf is valid. If you think there's nothing wrong with your regex, try splitting the capture into two separate transforms and see if you can get it to work that way.
I agree that the docs can be sparse at times, but this one is documented. See http://www.splunk.com/base/Documentation/latest/Admin/Configureindex-timefieldextraction. So yes, you should be using quotes here.
I do get search results if I do not put the backreferenced values in quotes. The problem is that I want to use quotes because that's the correct way to capture values with spaces in them, but then I hit the other problem listed in the original post: the fields extracted at index time are not searchable.
Thanks, Lowell. Since it is valid, can this::"$1" syntax (with quotes) appear in the spec for transforms.conf? It'd be good to make it clear in both places on the docs...
Is your literal search:
"key1=AA"
Or, do you mean:
key1=AA
The first should fail because such a term (key1) does not exist within your actual raw event (based on your provided sample event). However, the second should work if key1 is set up as an indexed field.
You could try searching for your indexed field explicitly, like so:
key1::AA
The :: will force 'key1' to be looked up via your indexed field rather than via an extracted (search-time) field.
BTW, one useful tool I've found for tracking down indexed field issues is the walklex command line tool. You have to drill down into your index's hot bucket and point to one of your .tsidx files. (There's some guesswork and trial and error involved in finding the right file.) You can search a single .tsidx file for an indexed term (or an indexed field). Here is an example from my system looking for the date_hour indexed field:
walklex 1268486967-1266586961-302021.tsidx 'date_hour::*'
You may be able to use this approach to see if there is an index-level difference in how your indexed fields were stored in your index with previous versions versus now. If this turns out to be some kind of bug in Splunk, then this information could be quite valuable.
Another approach to debugging indexed fields is to export some data from one of your buckets to a CSV file using exporttool:
exporttool /path/to/your/bucket /tmp/exportfile.csv -csv meta::all
You can then open the exported file and review the "_meta" column to see how Splunk is storing your indexed fields. Again, you can use this to compare before and after your most recent upgrade. (You can use a narrower search to export just the relevant events by replacing "meta::all" with a sourcetype search, for example.)
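If eyeballing the export gets tedious, a small script can pull out just that column. This is a rough sketch, assuming the exported CSV has a header row that names a "_meta" column; the helper name meta_values and the sample rows are made up for illustration and are not real exporttool output.

```python
import csv
from typing import Iterable, List


def meta_values(lines: Iterable[str], column: str = "_meta") -> List[str]:
    """Return the non-empty cells of the given column from CSV lines.

    Assumes the first line is a header row that includes the column name.
    """
    reader = csv.DictReader(lines)
    return [row[column] for row in reader if row.get(column)]


# Hypothetical rows, only to show the shape of the result.
sample = [
    "_meta,_raw\n",
    '"key1::AA key3::""CC DD""","results=AA,BB,CC DD,EE"\n',
    ',"event without indexed fields"\n',
]

for meta in meta_values(sample):
    print(meta)
```

For a real export, pass an open file instead of the sample list, for example open("/tmp/exportfile.csv", newline=""). Running it against exports taken before and after the upgrade would let you compare the stored key::value strings side by side.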
Out of curiosity, what's the reason why you are using indexed fields instead of extracted fields?