Why is index-time field extraction not searchable?

Path Finder

I'm cross-posting; the original issue is posted here: http://www.splunk.com/support/forum:SplunkGeneral/4378

When I use double-quotes in my index-time field extractions, the meta-data is not searchable. I've seen this problem on 4.0.11 and 4.1.3.

Sample text:

results=AA,BB,CC CC,DD

Transforms.conf without double-quotes:

REGEX = ^results=(.*?),(.*?),(.*?),(.+)$
FORMAT = key1::$1 key2::$2 key3::$3 key4::$4
WRITE_META = true

Transforms.conf with double-quotes:

REGEX = ^results=(.*?),(.*?),(.*?),(.+)$
FORMAT = key1::"$1" key2::"$2" key3::"$3" key4::"$4"
WRITE_META = true

Results:

If you use the first transforms.conf without the double-quotes, there are two problems:

  • The value for key3 (with a space) is not captured correctly. The documentation covers this case and says to use double-quotes.

  • The fields extracted on 4.1.3 are incorrect for key4. Instead of having a field "key4" it has "CC key4". I don't recall seeing this behavior in 4.0.x.

However, if you use the second transforms.conf with the double-quotes:

  • The meta-data is not searchable, i.e. search for "key1=AA" fails.
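To make the space problem concrete, here is a minimal Python sketch that applies the same REGEX to the sample text and fills in both FORMAT strings. The helper `render_format` is hypothetical, and the whitespace split and `shlex.split` calls are only stand-ins for a naive tokenizer and a quote-aware tokenizer; this is not how Splunk itself parses the result.

```python
import re
import shlex

SAMPLE = "results=AA,BB,CC CC,DD"
REGEX = r"^results=(.*?),(.*?),(.*?),(.+)$"


def render_format(fmt, event):
    """Hypothetical helper: substitute $1..$9 in fmt with the capture groups."""
    match = re.match(REGEX, event)
    if match is None:
        raise ValueError("event does not match REGEX")
    return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), fmt)


unquoted = render_format("key1::$1 key2::$2 key3::$3 key4::$4", SAMPLE)
quoted = render_format('key1::"$1" key2::"$2" key3::"$3" key4::"$4"', SAMPLE)

# Naive whitespace split: the value "CC CC" is broken in two,
# leaving a stray "CC" token right before key4.
print(unquoted.split())

# Quote-aware split (assumption: stand-in only): "CC CC" stays in one piece.
print(shlex.split(quoted))
```

Without quotes the third value is cut at the space, which is consistent with the first problem listed above. It says nothing about why the quoted form is not searchable.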


UPDATE 6/15/2010

Here are my conf files so you can replicate this issue. I also have a screenshot below.

inputs.conf:

[monitor:///var/log/test]
disabled = 0
sourcetype = mytest

props.conf:

[mytest]
TRANSFORMS-test = extract-fields

fields.conf:

[key1]
INDEXED = true

[key2]
INDEXED = true

[key3]
INDEXED = true

[key4]
INDEXED = true

transforms.conf:

[extract-fields]
REGEX = ^results=(.*?),(.*?),(.*?),(.+)$
FORMAT = key1::"$1" key2::"$2" key3::"$3" key4::"$4"
WRITE_META = true

screenshot:

In this screenshot, notice that the values are indeed extracted and show up in the search result. However, searching for "key1=AA" (or any other key=value) returns no results.

http://dottom.com/public/images/screenshot_8jd49x4d.png

Re: Why is index-time field extraction not searchable?

Path Finder

Curious - do you have these keys defined in fields.conf? You shouldn't need the quotes in transforms.conf. I'm unsure what they are supposed to achieve, but I assume they worked for you in earlier versions?

What does your props.conf look like?

Re: Why is index-time field extraction not searchable?

Path Finder

Yes, all the fields are defined in fields.conf

You need the double-quotes in transforms.conf when the regular expression backreference captures a value with a space in it.

Re: Why is index-time field extraction not searchable?

Path Finder

As with a lot of Splunk quirks, I don't see this documented (http://www.splunk.com/base/Documentation/latest/Admin/Transformsconf), so I'm not certain you need those quotes, or that it's even valid syntax in the latest version. Space-escaping is mentioned in that document, but only in relation to FIELDS= capturing, which is used alongside auto-kv/delims extraction (which is not what you're doing).

Re: Why is index-time field extraction not searchable?

Path Finder

I should add that you're getting no results for the second conf, which kind of backs that up. The first transforms.conf is valid. If you think there's nothing wrong with your regex, try splitting the capture into two separate transforms and see if you can get it to work that way?

Re: Why is index-time field extraction not searchable?

Super Champion

I agree that the docs can be sparse at times, but this one is documented. See http://www.splunk.com/base/Documentation/latest/Admin/Configureindex-timefieldextraction. So, yes you should be using quotes here.

Re: Why is index-time field extraction not searchable?

Path Finder

I do get search results if I do not put the backreferenced values in quotes. The problem is that I want to use quotes, because that's the correct way to capture values with spaces in them. But then I hit the other problem from my original post: the fields extracted at index-time are not searchable.

Re: Why is index-time field extraction not searchable?

Path Finder

Thanks, Lowell. Since it is valid, can this::"$1" syntax (with quotes) appear in the spec for transforms.conf? It'd be good to make it clear in both places in the docs.

Re: Why is index-time field extraction not searchable?

Super Champion

I agree it seems like it should be mentioned both places. Send a note to docs@splunk.com mentioning this. (I don't work for splunk so I can't do anything about it short of emailing this in myself.)

Re: Why is index-time field extraction not searchable?

Super Champion

Is your literal search:

"key1=AA"  

Or, do you mean:

key1=AA

The first should fail because no such term (key1) exists in your actual raw event (based on your provided sample event). However, the second should work if key1 is set up as an indexed field (INDEXED = true) in fields.conf.

You could try searching for your indexed field explicitly, like so:

 key1::AA

The :: will force 'key1' to be looked up via your indexed field and not using an extracted (search-time) field.
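The difference between the three searches can be pictured with a toy model in Python. This is only an illustration of the explanation above, not Splunk's real index: the names `build_lexicon` and `phrase_in_raw` are hypothetical, and the word-splitting rule is an assumption.

```python
import re


def build_lexicon(raw_event, indexed_fields):
    """Toy lexicon: words from the raw event plus key::value indexed terms."""
    terms = set(re.findall(r"[A-Za-z0-9_]+", raw_event))
    terms.update(f"{name}::{value}" for name, value in indexed_fields.items())
    return terms


def phrase_in_raw(phrase, lexicon):
    """A quoted phrase matches only if every word of it is itself a term."""
    return all(tok in lexicon for tok in re.findall(r"[A-Za-z0-9_]+", phrase))


lexicon = build_lexicon(
    "results=AA,BB,CC CC,DD",
    {"key1": "AA", "key2": "BB", "key3": "CC CC", "key4": "DD"},
)

# "key1=AA" as a literal phrase: the word key1 never appears in the raw event.
print(phrase_in_raw("key1=AA", lexicon))

# key1::AA looks for the indexed term directly.
print("key1::AA" in lexicon)
```

In this model the quoted phrase misses because key1 is not a word in the raw text, while the key1::AA lookup hits the indexed term.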

BTW, one useful tool I've found for tracking down indexed field issues is the walklex command-line tool. You have to drill down into your index's hot bucket and point to one of your .tsidx files. (There's some guesswork / trial-and-error involved in finding the right file.) You can search a single .tsidx file for an indexed term (or an indexed field). Here is an example from my system looking for the date_hour indexed field:

walklex 1268486967-1266586961-302021.tsidx 'date_hour::*'

You may be able to use this approach to see if there is an index-level difference in how your indexed fields were stored in your index with previous versions versus now. If this turns out to be some kind of bug in splunk, then this information could be quite valuable.

Another approach to debug indexed fields is to export some data from one of your buckets to a csv file using exporttool like:

exporttool /path/to/your/bucket /tmp/exportfile.csv -csv meta::all

You can then open up the exported file and review the "_meta" column and see how splunk is storing your indexed fields. Again, you can use this to compare before/after your most recent upgrade. (You can use a better search to export just the relevant events by simply replacing "meta::all" with a sourcetype search, for example.)
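If the export is large, a short Python script can pull out just that column. This is a sketch under assumptions: it assumes the exported CSV has a header row containing a "_meta" column, the function name `read_meta_column` is hypothetical, and the sample rows below are made up for illustration rather than real exporttool output.

```python
import csv
import io


def read_meta_column(lines, column="_meta"):
    """Return the values of one column from an exported CSV (file or line iterable)."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"no {column!r} column in export")
    return [row[column] for row in reader]


# Hypothetical sample data standing in for /tmp/exportfile.csv.
sample = io.StringIO(
    "_time,_meta,_raw\n"
    '1,"key1::AA key2::BB","results=AA,BB"\n'
)
for value in read_meta_column(sample):
    print(value)
```

For a real export, pass an open file instead of the sample, e.g. `open("/tmp/exportfile.csv", newline="")`.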

Out of curiosity, what's the reason why you are using indexed fields instead of extracted fields?
