
Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Communicator

All my log files are in folders named like this:

 c:\blah\something\myapp_test\logs\somelogfile.log

 => app=myapp 
 => env=test

I want to extract two fields from the source, so that I can simply search for "app=myapp env=test".

Since the fields are always present and should be part of most queries, it seems like a good idea to add them at index time(?)

In etc/system/local I have added:

transforms.conf

[add_app_env]
SOURCE_KEY=source
REGEX=^.*\\\\([a-zA-Z0-9-]+)_([A-Z]+)\\\\.*
FORMAT=app::$1 env::$2
WRITE_META=true

props.conf

[add_app_field]
TRANSFORMS-app = add_app_env

[add_env_field]
TRANSFORMS-env = add_app_env

fields.conf

[add_app_env]
INDEXED=true

But I do not get my app and env fields, and I have no idea how to debug this other than by trial and error.

I tested my regular expression with a rex extraction, so I think that part works.
I also tried simplifying to extract just a single field.
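One way to debug the REGEX without trial and error in Splunk is to run it against the sample path in a regex engine. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption that holds for the simple constructs used here) and copies the transforms.conf value verbatim into a raw string. It points at two problems: "\\\\" matches two literal backslashes while the path contains single ones, and "[A-Z]+" matches only uppercase letters, so it can never match "test". FIXED is a hypothetical corrected pattern for comparison, not something posted in the thread.

```python
import re

# Sample source path from the question.
SOURCE = r"c:\blah\something\myapp_test\logs\somelogfile.log"

# REGEX from transforms.conf, verbatim. Assumption: the value reaches the
# regex engine unchanged, so "\\\\" means two literal backslashes.
ORIGINAL = r"^.*\\\\([a-zA-Z0-9-]+)_([A-Z]+)\\\\.*"

# Hypothetical fix: single escaped backslashes and a case-insensitive env class.
FIXED = r"^.*\\([a-zA-Z0-9-]+)_([a-zA-Z]+)\\.*"


def try_match(pattern: str, source: str):
    """Return the captured groups, or None if the pattern does not match."""
    m = re.match(pattern, source)
    return m.groups() if m else None


print(try_match(ORIGINAL, SOURCE))  # None
print(try_match(FIXED, SOURCE))     # ('myapp', 'test')
```

The same check works for any candidate pattern: paste it between r"..." quotes and compare the groups against the fields you expect.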

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Contributor

It could be as simple as the REGEX not being formed properly. The regular expression in your example does not work, unless the markup formatting mangled it.

I believe this may work better.

^[cC]\:\\\w+\\\w+\\([a-zA-Z0-9\-]+)_([a-zA-Z0-9\-]+)\\logs\\\w+\.log$
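To see what the suggested pattern does, the sketch below (again with Python's re standing in for PCRE, an assumption) runs it against the sample path and against a hypothetical path with one extra folder level. Because the pattern pins the drive letter, exactly two folders before the app_env folder, and the logs\<file>.log tail, it matches the first path but not the second.

```python
import re

SOURCE = r"c:\blah\something\myapp_test\logs\somelogfile.log"

# Suggested REGEX, verbatim: drive, two folder levels, <app>_<env>, logs\<file>.log
SUGGESTED = r"^[cC]\:\\\w+\\\w+\\([a-zA-Z0-9\-]+)_([a-zA-Z0-9\-]+)\\logs\\\w+\.log$"

m = re.match(SUGGESTED, SOURCE)
print(m.groups() if m else None)  # ('myapp', 'test')

# Hypothetical path one folder deeper: the fixed depth no longer lines up.
DEEPER = r"c:\blah\something\more\myapp_test\logs\somelogfile.log"
print(re.match(SUGGESTED, DEEPER))  # None
```

Anchoring on the exact layout is stricter, which avoids false matches but has to be revisited if the folder structure ever changes.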

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Communicator

My regex works inside Splunk with a rex field extraction. Perhaps the double-escaped backslashes are not required in transforms.conf?

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Legend

I don't think that creating these fields at index time will improve performance. Instead, I think it makes your configuration more brittle, complex and hard to manage.

You could easily do the same field extraction at search time:

props.conf

[source::*somelogfile.log]
EXTRACT-xyz=^[cC]\:\\\w+\\\w+\\(?<app>[a-zA-Z0-9\-]+)_(?<env>[a-zA-Z0-9\-]+)\\\w+\\somelogfile\.log in source
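The search-time EXTRACT can be checked the same way. Python writes named groups as (?P<name>...) rather than PCRE's (?<name>...), so the sketch rewrites only that syntax; the rest is copied from the stanza. In an EXTRACT, the group names are what become the field names at search time.

```python
import re

SOURCE = r"c:\blah\something\myapp_test\logs\somelogfile.log"

# EXTRACT-xyz regex with (?<name>...) rewritten as Python's (?P<name>...).
EXTRACT_XYZ = (
    r"^[cC]\:\\\w+\\\w+\\(?P<app>[a-zA-Z0-9\-]+)_(?P<env>[a-zA-Z0-9\-]+)"
    r"\\\w+\\somelogfile\.log"
)

m = re.match(EXTRACT_XYZ, SOURCE)
# Each named group maps to a field of the same name.
print(m.groupdict())  # {'app': 'myapp', 'env': 'test'}
```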

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Communicator

Are search-time fields indexed the same way index-time fields are?
If my app and env fields are part of most queries, it is important that they are indexed once, not rediscovered at every search.

It is pretty hard to tell from the documentation how the two types differ.
Once extracted, what is the difference between search-time and index-time fields?

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Legend

Search-time fields are extracted at search time. In most cases they are more efficient than index-time fields.

It is not a matter of "indexed once"; Splunk works differently than you think. There are only rare cases where an index-time field will be faster, and in many years of working with Splunk I have yet to see one of them.

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Communicator

I was able to get the search-time extraction working by adding:

props.conf:
[source::...]
EXTRACT-app,env = ^.*\\(?<app>[a-zA-Z0-9\-]+)_(?<env>[a-zA-Z]+)\\.+ in source

Now the fields are there, but they cannot be searched.
The query:
app=myapp env=test
yields nothing.

I need to add:
index=* app=myapp env=test
to get any results.
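Because the [source::...] stanza applies the EXTRACT above to every source, it helps to see how the pattern behaves on other paths. In this sketch, Python's re stands in for PCRE (an assumption), the named groups are rewritten as (?P<name>...), extract is a hypothetical helper, and the second and third paths are hypothetical. The leading ^.* is greedy, so the last "<word>_<letters>\" folder in a path is the one captured, and sources without backslashes produce no fields.

```python
import re

# Search-time EXTRACT from props.conf, named groups in Python syntax.
APP_ENV = r"^.*\\(?P<app>[a-zA-Z0-9\-]+)_(?P<env>[a-zA-Z]+)\\.+"


def extract(source: str):
    """Return the app/env fields, or None when the pattern does not match."""
    m = re.match(APP_ENV, source)
    return m.groupdict() if m else None


# Path from the question: the app_env folder is found at any depth.
print(extract(r"c:\blah\something\myapp_test\logs\somelogfile.log"))
# {'app': 'myapp', 'env': 'test'}

# Hypothetical path with a second underscore folder: the last one wins.
print(extract(r"c:\blah\myapp_test\sub_dir\somelogfile.log"))
# {'app': 'sub', 'env': 'dir'}

# Hypothetical non-Windows source: no fields extracted.
print(extract("/var/log/messages"))
# None
```

If other sources on the same instance contain underscore folders, a narrower stanza than [source::...] avoids extracting misleading app and env values from them.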

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Communicator

Hmm... for some reason I get results from the simple query now. Closing this issue.

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Communicator

I got index-time field extraction to work with the following configuration:

etc/system/local/transforms.conf
[add_app_env]
SOURCE_KEY = MetaData:Source
REGEX = ^.*\\(?<app>[a-zA-Z0-9\-]+)_(?<env>[a-zA-Z]+)\\.+
FORMAT = app::$1 env::$2
WRITE_META = true

etc/system/local/props.conf
[source::...]
TRANSFORMS-appenv = add_app_env

etc/system/local/fields.conf
[app]
INDEXED=true

[env]
INDEXED=true
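To see what the [add_app_env] stanza should produce, here is a rough Python model of the transform: match the source value against REGEX, then substitute the capture groups into FORMAT. apply_transform is a hypothetical helper, not Splunk's implementation; with WRITE_META = true, Splunk writes the resulting string to the event's indexed metadata (_meta). The $1 and $2 references still work with the named groups because named groups are numbered too.

```python
import re

# REGEX and FORMAT from the working transforms.conf stanza.
REGEX = r"^.*\\(?P<app>[a-zA-Z0-9\-]+)_(?P<env>[a-zA-Z]+)\\.+"
FORMAT = "app::$1 env::$2"


def apply_transform(source: str):
    """Hypothetical model: match, then fill $N placeholders from the groups."""
    m = re.match(REGEX, source)
    if m is None:
        return None  # no match: nothing would be written
    # Replace each $N with capture group N (named groups are also numbered).
    return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), FORMAT)


print(apply_transform(r"c:\blah\something\myapp_test\logs\somelogfile.log"))
# app::myapp env::test
```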

I am still not sure whether @lguinn is right that a search-time field is better.

Even though the field is extracted at index time, I still don't get results from the query "app=myapp". I have to use "index=* app=myapp", which is the same problem I had with the search-time field extraction...

Re: Why am I unable to extract 2 fields from source at index-time with my current configuration and regex?

Community Manager

Hi @lassel

This topic of search-time versus index-time extractions has been covered in the documentation and on Splunk Answers over the years. Here is a documentation page explaining it. I also ran a search, and these are just a few of the previous Answers posts that elaborate on the point raised by @lguinn.

http://docs.splunk.com/Documentation/Splunk/6.2.2/Indexer/Indextimeversussearchtime
http://answers.splunk.com/answers/151939/how-do-index-and-search-time-field-extractions-differ-and-w...
http://answers.splunk.com/answers/57247/index-time-field-extraction.html
http://answers.splunk.com/answers/842/do-search-time-fields-have-performance-considerations.html#ans...
http://answers.splunk.com/answers/5817/search-time-versus-index-time-field-extractions.html
