All my log files are in folders named:
c:\blah\something\myapp_test\logs\somelogfile.log => app=myapp => env=test
I want to extract two fields from source, to make it easy to just search for "app=myapp env=test"
Since the fields are always there and should be a part of most queries, it seems like a good idea to add them at index time(?)
In etc/system/local I have added:
[add_app_env]
SOURCE_KEY=source
REGEX=^.*\\\\([a-zA-Z0-9-]+)_([A-Z]+)\\\\.*
FORMAT=app::$1 env::$2
WRITE_META=true

[add_app_field]
TRANSFORMS-app = add_app_env

[add_env_field]
TRANSFORMS-env = add_app_env
But I do not get my app and env fields and I have no idea how to debug this other than trial and error.
I tested my regular expression with a rex extraction - so I think that part works.
I also tried simplifying and just extracting a single field.
It could be just as simple as the REGEX not being formed properly. The regular expression in your example does not work... that is, unless the markup formatting messed something up.
I believe this may work better.
My regex works inside Splunk with a rex field extraction. Perhaps the double-escaped backslashes are not required in transforms.conf?
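One way to narrow this down without trial and error inside Splunk is to run the pattern against the sample path offline. The sketch below uses Python's `re` module, which is not the same engine Splunk uses but should behave the same for these simple constructs. It assumes the regex was entered exactly as posted (it may have been altered by the forum markup); the names `AS_POSTED`, `UPPER_ONLY`, `RELAXED` and `extract` are made up for illustration.

```python
import re

# Sample source path from the question.
SOURCE = r"c:\blah\something\myapp_test\logs\somelogfile.log"

# The regex exactly as posted: "\\\\" asks for two literal backslashes in a row.
AS_POSTED = r"^.*\\\\([a-zA-Z0-9-]+)_([A-Z]+)\\\\.*"
# One escaped backslash, but env is still limited to upper-case letters.
UPPER_ONLY = r"^.*\\([a-zA-Z0-9-]+)_([A-Z]+)\\.*"
# One escaped backslash, and env may contain lower-case letters.
RELAXED = r"^.*\\([a-zA-Z0-9-]+)_([a-zA-Z]+)\\.*"


def extract(pattern, source):
    """Return the (app, env) tuple if the pattern matches, else None."""
    m = re.match(pattern, source)
    return m.groups() if m else None


for name, pattern in [
    ("as posted", AS_POSTED),
    ("upper only", UPPER_ONLY),
    ("relaxed", RELAXED),
]:
    print(name, "->", extract(pattern, SOURCE))
```

If only the relaxed variant matches, that would point at both the doubled backslashes and the `[A-Z]+` class, since the sample path uses single backslashes and a lower-case `test`.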
I don't think that creating these fields at index time will improve performance. Instead, I think it makes your configuration more brittle, complex and hard to manage.
You could easily do the same field extraction at search time:
[source::*somelogfile.log]
EXTRACT-xyz=^[cC]\:\\\w+\\\w+\\(?<app>[a-zA-Z0-9\-]+)_(?<env>[a-zA-Z0-9\-]+)\\\w+\\somelogfile\.log in source
Are search-time fields indexed the same way index-time fields are?
If my app and env fields are a part of most queries, it is important that they are indexed once, not discovered at every search.
It is pretty hard to tell from the documentation how the two types differ.
Once extracted, what is the difference between search-time and index-time fields?
Search-time fields are extracted at search time. They are more efficient than index-time fields.
It is not a matter of "indexed once" - Splunk works differently than you think. There are only rare cases where an index-time field will be faster - in many years of working with Splunk, I have yet to see one of these rare cases.
I was able to get the search-time extraction working by adding:
props.conf:
[source::...]
EXTRACT-app,env = ^.*\\(?<app>[a-zA-Z0-9\-]+)_(?<env>[a-zA-Z]+)\\.+ in source
Now the fields are there, but they cannot be searched.
I need to add:
index=* app=myapp env=test
to get any results.
Hmm... for some reason I get results from the simple query now. Closing this issue.
I got index-time field extraction to work with:

etc/system/local/transforms.conf
[add_app_env]
SOURCE_KEY = MetaData:Source
REGEX = ^.*\\(?<app>[a-zA-Z0-9\-]+)_(?<env>[a-zA-Z]+)\\.+
FORMAT = app::$1 env::$2
WRITE_META = true

etc/system/local/props.conf
[source::...]
TRANSFORMS-appenv = add_app_env

etc/system/local/fields.conf
[app]
INDEXED=true

[env]
INDEXED=true
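For anyone checking a similar setup, here is a rough offline sketch of what the transform is expected to write for the sample path. It only mimics the `REGEX` and `FORMAT` lines with Python's `re` module and is not how Splunk implements it. The named groups are spelled `(?P<name>...)` because Python requires that form, and `expected_meta` is a hypothetical helper name.

```python
import re

# Sample source path from the question.
SOURCE = r"c:\blah\something\myapp_test\logs\somelogfile.log"

# Same pattern as the REGEX line, with Python-style named groups.
PATTERN = re.compile(r"^.*\\(?P<app>[a-zA-Z0-9\-]+)_(?P<env>[a-zA-Z]+)\\.+")


def expected_meta(source):
    """Return the key::value string the FORMAT line should yield, or None."""
    m = PATTERN.match(source)
    if m is None:
        return None
    # Mirror FORMAT = app::$1 env::$2 using positional group references.
    return m.expand(r"app::\1 env::\2")


print(expected_meta(SOURCE))
```

A source path that does not contain an `app_env` folder segment returns `None` here, which corresponds to the transform not adding any fields for that event.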
I am still not sure whether @lguinn is right that a search-time field is better.
Even though the field is extracted at index-time, I still don't get results from the query "app=myapp". I have to "index=* app=myapp", which is the same problem I have with the search-time field extraction...
The topic of search-time versus index-time extractions has been covered in the documentation and on Splunk Answers over the years. Here's a page from the documentation explaining this. I also ran a search, and these are just a few of the previous Answers posts that elaborate on the point brought up by @lguinn.