I need to create a 'site' field from the 'source' field by grabbing the last fragment of source, such as:
/var/logs/dir/subdomain1.domainA.com -> subdomain1.domainA.com
/var/logs/dir/domainB.com -> domainB.com
Every search query filters on 'site' extensively, so my idea was to use either index-time extraction or search-time extraction via props/transforms.
Considering that the data reaches the indexer via a universal forwarder, which approach is the most efficient?
Search-time field extractions are preferred over index-time field extractions. More details in the links below:
You can do search-time extraction of a field from another field. BUT - you can also do a calculated field! Calculated fields are also search time artifacts, and are preferred over index-time extractions. I strongly advise you to avoid index-time field extractions if you possibly can. They are not more efficient, they are less flexible and they consume more disk space.
Test this eval command. If it works, use it to create a calculated field on the indexer (or search head if you have one):
source=*.com | eval site = replace(source,".*/(.*?)$", "\1")
I am not going to do index-time extractions, but:
You can't do search-time extraction of a field from another field
props.conf DOC says that I can though, like this in my case:
[access_combined]
EXTRACT-site = [/\\](?<site>[^/\\]+])$ in source
I thought I'd be able to use it like above especially making it sourcetype-specific. Shouldn't it work?
Your eval will certainly work (not sure why the double slashes, though). Could you please elaborate on the difference between EXTRACT-site and EVAL-site?
Ha! You are right, and I had forgotten that you could do this (EXTRACT-site = [/](?<site>[^/]+])$ in source). I used // because I can't type. 🙂
I fixed my answer.
I tried EXTRACT-* but it didn't work for some reason. The EVAL-* approach did, so I went with it.
Also keep in mind that we cannot do field aliasing on EVAL-ed fields, because aliasing is done before EVAL-ing.
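One possible reason the EXTRACT-* attempt failed, assuming the pattern was entered exactly as quoted earlier in this thread: there is a "]" right after the "+", so the regex demands a literal "]" at the very end of source. The sketch below checks this outside Splunk with Python's re module. It is only an illustration: Splunk uses PCRE, so the named group (?<site>...) is rewritten here as Python's (?P<site>...), and the "corrected" pattern is a hypothetical fix, not something confirmed by anyone in the thread.

```python
import re

# Pattern as quoted in the thread, with (?<site>...) rewritten as (?P<site>...)
# for Python. Note the "]" directly after the "+".
quoted = re.compile(r"[/\\](?P<site>[^/\\]+])$")

# Hypothetical correction (an assumption): the same pattern without that "]".
corrected = re.compile(r"[/\\](?P<site>[^/\\]+)$")

# Sample path taken from the original question.
source = "/var/logs/dir/subdomain1.domainA.com"

# The quoted pattern needs a literal "]" before end of string, so it finds nothing.
quoted_match = quoted.search(source)
print(quoted_match)  # None

# Without the stray "]" the last path fragment is captured.
corrected_match = corrected.search(source)
print(corrected_match.group("site"))  # subdomain1.domainA.com
```

If that is indeed the cause, dropping the extra "]" in props.conf would be worth a try before ruling out EXTRACT-*.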
Is there a reason why you couldn't just use rex?
source=* | rex field=source ".*/(?<end_of_source>.*)"
I have tons of queries and don't want to inject the same thing into every single query, knowing that it is needed for every one of them.
I ended up putting this into /splunk/etc/apps/MY_APP/local/props.conf:
[access_combined]
EVAL-site = replace(source, "^.*?/([^/]+)$", "\1")
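For anyone who wants to sanity-check that regex before putting it into props.conf, here is a rough Python equivalent run against the two sample paths from the question. It assumes that eval's replace() acts like a plain regex substitution for this pattern; the function name site_from_source is made up for the illustration.

```python
import re

def site_from_source(source: str) -> str:
    """Rough Python stand-in for replace(source, "^.*?/([^/]+)$", "\\1")."""
    # The lazy ".*?" keeps growing until what follows the "/" contains no
    # further "/" up to the end of the string, i.e. the last path fragment.
    return re.sub(r"^.*?/([^/]+)$", r"\1", source)

# Sample paths from the original question.
samples = [
    "/var/logs/dir/subdomain1.domainA.com",
    "/var/logs/dir/domainB.com",
]

for path in samples:
    print(path, "->", site_from_source(path))
# /var/logs/dir/subdomain1.domainA.com -> subdomain1.domainA.com
# /var/logs/dir/domainB.com -> domainB.com
```

In this Python version, a source with no "/" in it does not match the pattern and is returned unchanged.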