rex vs. extraction field

ryastrebov
Communicator

Hello!
Which method is faster: rex or a field extraction?
It seems to me that rex is very slow on a large number of events.

BobM
Builder

The time they take should be similar. The important difference is that with rex, only the search that uses it pays that cost. With an extracted field, every search that returns events of that sourcetype has to run the regular expression.

You may be able to speed up either of them, though. Avoid case-insensitive matching unless you need it, don't start or end your expression with .* and avoid wildcards wherever you can. If you know the literal text around the value, use it. Try to make the rex non-greedy. Starting with \b, so that the match begins at a word boundary and not mid-word, can also help.
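As a rough illustration of these tips, the sketch below uses Python's `re` module as a stand-in for the regex engine Splunk uses (an assumption: the two engines differ in details, so treat timings and behaviour as indicative only). It runs a loose pattern and a tighter one against the DNS log line quoted elsewhere in this thread. The names `LOOSE`, `TIGHT`, and `full_domain` are made up for the example.

```python
import re

# Sample event copied from the DNS log quoted in this thread.
EVENT = (
    "Mar 7 11:58:10 219.16.134.140 Mar 7 11:59:38 named[18535]: "
    "[ID 873579 local0.info] 07-Mar-2013 11:59:38.249 queries: info: "
    "client 123.121.123.121#1028 (e3191.c.akamaiedge.net): "
    "query: e3191.c.akamaiedge.net IN A + (219.16.134.140)"
)

# Loose: case-insensitive, wildcards at both ends, greedy capture.
# The engine has to scan the whole event and backtrack to find the value.
LOOSE = re.compile(r"(?i).*query: (?P<FullDomain>.*) IN .*")

# Tight: starts at a word boundary, uses the known literal text "query: ",
# and captures with a bounded class (non-whitespace) instead of ".*".
TIGHT = re.compile(r"\bquery: (?P<FullDomain>\S+)")


def full_domain(pattern, event):
    """Return the FullDomain capture, or None if the pattern does not match."""
    match = pattern.search(event)
    return match.group("FullDomain") if match else None


# Both patterns extract the same value; the tight one does less work.
print(full_domain(LOOSE, EVENT))
print(full_domain(TIGHT, EVENT))
```

Both patterns return the same field value for this event, so the tighter form costs nothing in correctness here. How much time it saves in Splunk itself would have to be measured on real data.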

Bob

stribog
Explorer

You can find the exact time for each operation, whether you use the rex command or parse with props.conf/transforms.conf.
First, run the query with the rex command only, with no field extractions defined in props.conf and transforms.conf.
Then run the query again with the extraction defined in props.conf/transforms.conf.
For each query, look at the job statistics and you will see which operation takes more time.

ryastrebov
Communicator

Thank you for your answer!

BobM
Builder

For your specific example, you should be able to use the following. I am assuming there are no domains with four or more levels.

\bquery: (?<fulldomain>(?<2_LevelDomain>[^.\s]+.[^.\s]+)|(?<3_LevelDomain>[^.\s]+.[^.\s]+.[^.\s]+))$

ryastrebov
Communicator

No, domains with four or more levels do appear in the log.

Example from log:

Mar 7 11:58:10 219.16.134.140 Mar 7 11:59:38 named[18535]: [ID 873579 local0.info] 07-Mar-2013 11:59:38.249 queries: info: client 123.121.123.121#1028 (e3191.c.akamaiedge.net): query: e3191.c.akamaiedge.net IN A + (219.16.134.140)

I need to extract these fields: e3191.c.akamaiedge.net (FullDomain), c.akamaiedge.net (3LevelDomain), and akamaiedge.net (2LevelDomain). They can be three separate field extraction expressions. Thank you!
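One possible way to get all three fields from the sample line in a single pass is to nest the capture groups, so the shorter domains are captured inside the full one. The sketch below checks that idea with Python's `re` module, not with Splunk itself, so whether the same expression behaves identically in a rex or a field extraction is an assumption that would need testing. Group names that start with a digit are not valid in Python, so the hypothetical names `ThirdLevelDomain` and `SecondLevelDomain` are used here. The pattern also assumes the queried name has at least three labels.

```python
import re

# Sample event copied from the log line quoted above.
EVENT = (
    "Mar 7 11:58:10 219.16.134.140 Mar 7 11:59:38 named[18535]: "
    "[ID 873579 local0.info] 07-Mar-2013 11:59:38.249 queries: info: "
    "client 123.121.123.121#1028 (e3191.c.akamaiedge.net): "
    "query: e3191.c.akamaiedge.net IN A + (219.16.134.140)"
)

# Nested groups: FullDomain wraps ThirdLevelDomain, which wraps SecondLevelDomain.
# The lazy prefix (?:label.)*? absorbs any extra leading labels, so names with
# four or more levels still match. Names with fewer than three labels do not.
DOMAIN_RE = re.compile(
    r"\bquery: "
    r"(?P<FullDomain>(?:[^.\s]+\.)*?"
    r"(?P<ThirdLevelDomain>[^.\s]+\."
    r"(?P<SecondLevelDomain>[^.\s]+\.[^.\s]+)))"
    r"(?=\s|$)"  # the domain must end at whitespace or end of line
)


def extract_domains(event):
    """Return a dict with the three domain fields, or None if nothing matches."""
    match = DOMAIN_RE.search(event)
    return match.groupdict() if match else None


print(extract_domains(EVENT))
```

Because the three groups share one match, the event is scanned once instead of once per field. If two-label names such as example.com also occur in the log, the pattern would need an extra alternative for them.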

ryastrebov
Communicator

Thank you, Bob!

I am analyzing a DNS log. I extract the full domain name with a field extraction. The regular expression for it is:
(?i) query: (?P<FullDomain>[^ ]+)

With rex I extract the second- and third-level domains from the FullDomain field. The rex commands are
rex field=FullDomain "(?<2_LevelDomain>[^.\s]+.[^.\s]+)$"
and
rex field=FullDomain "(?<3_LevelDomain>[^.\s]+.[^.\s]+.[^.\s]+)$"

I'm new to rex and, unfortunately, I can't work out a regular expression that extracts 2LevelDomain and 3LevelDomain directly in the field extraction. Can you help me, please?

brettcave
Builder

@BobM - My understanding is that field extractions have some performance benefit over rex, especially in distributed deployments, because more caching happens with field extractions. A rex extracts fields on every search instead of reusing matches for fields. Is this the case at all?
