Is my regex or the Interactive Field Extractor regex more efficient? Is there a better option?
Comparing regex strings...
Log format:
Thu 08/07/2014, 6:41:59.97,USERA,TERM1,XXXX-YYYAPP65-5
Thu 08/07/2014, 6:42:00.17,USERA,,XXXX-YYYAPP65-5
Thu 08/07/2014, 6:43:55.11,USERB,TERM2,XXXX-YYYAPP65-6
Thu 08/07/2014, 6:44:25.64,USERC,TERM3,XXXX-YYYAPP65-2
Thu 08/07/2014, 6:44:58.82,USERD,TERM4,XXXX-YYYAPP65-4
Thu 08/07/2014, 6:44:58.92,USERD,,XXXX-YYYAPP65-4
(The Markdown was mangling my regex strings, so I uploaded them as an image. I also realized that the Splunk-generated regex in that image does not extract empty values for cname, so cname was undefined for those rows.)
Is my regex (see image above) less efficient to extract the fields?
Perhaps I'm missing something, but wouldn't it be easier to do this with a REPORT using DELIMS/FIELDS? It seems like a classic case.
props.conf
[my_sourcetype]
REPORT-blah = get_my_fields
transforms.conf
[get_my_fields]
DELIMS = ","
FIELDS = date, time, userid, term, blah, bleh, bluh
Makes sense?
/k
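To make the delimiter idea concrete, here is a minimal Python sketch of the effect a DELIMS/FIELDS pair is meant to have on the sample log lines. It is only an illustration, not how Splunk implements DELIMS, and the field names are hypothetical stand-ins (the sample rows have five comma-separated values).

```python
# Illustrative only: mimics the *effect* of DELIMS = "," plus a FIELDS list.
# This is not Splunk's implementation, and the field names are hypothetical.
SAMPLE = [
    "Thu 08/07/2014, 6:41:59.97,USERA,TERM1,XXXX-YYYAPP65-5",
    "Thu 08/07/2014, 6:42:00.17,USERA,,XXXX-YYYAPP65-5",
]
FIELDS = ["date", "time", "userid", "term", "session"]


def split_fields(line, names=FIELDS, delim=","):
    """Split on the delimiter and pair each piece with a field name."""
    values = [piece.strip() for piece in line.split(delim)]
    return dict(zip(names, values))


for line in SAMPLE:
    print(split_fields(line))
```

In this sketch the empty terminal name in the second row comes back as an empty string, and the day of the week stays inside the first field because only commas delimit. Whether Splunk itself creates a field for an empty value is worth checking in a development instance.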
I've tended to stay away from .conf files, and do most of my parsing/extracting in-line in the query, but I'll give the .conf file a shot in a development instance to see if it will work for my needs. Thanks!
Haha, it's the answer to a different question! Perhaps the example in the original post is just an example.
Anyway, how does DELIMS work under the hood - is it based on regex? I mean, I could see that
DELIMS = ","
is easily stated as
([^,]*),\s*?
But is it another mechanism that actually does it? Could it be more efficient to write an EXTRACT or REPORT regex, than to rely on DELIMS/FIELDS?
/k
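As a quick sanity check on that regex, the sketch below runs it against one of the sample rows and compares the result with a plain split on the comma. It uses Python's re module rather than Splunk's PCRE, and it says nothing about how DELIMS is actually implemented.

```python
import re

line = "Thu 08/07/2014, 6:42:00.17,USERA,,XXXX-YYYAPP65-5"

# The regex from the post: every captured field must be followed by a comma.
by_regex = re.findall(r"([^,]*),\s*?", line)

# A plain split on the delimiter, for comparison.
by_split = line.split(",")

print(by_regex)  # ['Thu 08/07/2014', ' 6:42:00.17', 'USERA', '']
print(by_split)  # ['Thu 08/07/2014', ' 6:42:00.17', 'USERA', '', 'XXXX-YYYAPP65-5']
```

The two agree except for the last field: the regex requires a trailing comma, so the final value is never captured. The lazy \s*? also matches nothing here, so the leading space stays on the time value in both cases.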

better answer!

Your regex is not less efficient - but it could be problematic. By default, .* is greedy. I would avoid that by making the quantifiers lazy. The final group stays greedy, because a trailing lazy .*? would match the empty string:
| rex "(?<day1>.*?)\s(?<date1>.*?),(?<time1>.*?),(?<uname>.*?),(?<cname>.*?),(?<sname>.*)"
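For anyone who wants to try the lazy version outside Splunk, here is a small Python sketch against one of the sample rows. Python spells named groups (?P<name>...) where rex uses (?<name>...); everything else is carried over. It also shows one thing to watch for: a lazy group at the very end of a pattern has nothing after it to stop at, so it matches the empty string unless it is left greedy or anchored.

```python
import re

line = "Thu 08/07/2014, 6:42:00.17,USERA,,XXXX-YYYAPP65-5"

# Every group lazy, including the last one.
all_lazy = re.compile(
    r"(?P<day1>.*?)\s(?P<date1>.*?),(?P<time1>.*?),"
    r"(?P<uname>.*?),(?P<cname>.*?),(?P<sname>.*?)"
)

# Same pattern, but the last group is greedy so it runs to the end of the line.
greedy_tail = re.compile(
    r"(?P<day1>.*?)\s(?P<date1>.*?),(?P<time1>.*?),"
    r"(?P<uname>.*?),(?P<cname>.*?),(?P<sname>.*)"
)

print(all_lazy.search(line).groupdict())
print(greedy_tail.search(line).groupdict())
```

On this row both patterns give cname as an empty string, which covers the missing-terminal case from the question. Only the second one captures sname.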
Thanks for the tip on the frugal regex! 🙂

I'd say the regex (?<day1>.*) (?<date1>.*),... is very inefficient. Because of the greediness, the regex will first match the entire string into day1 and then backtrack to the last space, the one after the date. Then it'll match the rest, from the time to the end, into date1, and backtrack to find a comma. That process takes far more time and memory than stepping linearly through the string once, as @lguinn's regex would.
Non-greedy is a simple fix. Alternatively, you could use more accurate matches than the dot - for example, \w or \S for day1.
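A rough Python sketch of the "more accurate matches" idea, assuming the fields never contain a comma: \w+ for the day and [^,]* for the comma-separated values, so each group stops at its delimiter on its own. The pattern is a hypothetical rewrite for illustration, with Python's (?P<name>...) group syntax in place of rex's (?<name>...).

```python
import re

# Hypothetical rewrite: narrower character classes instead of the dot.
pattern = re.compile(
    r"(?P<day1>\w+)\s(?P<date1>[^,]*),\s*(?P<time1>[^,]*),"
    r"(?P<uname>[^,]*),(?P<cname>[^,]*),(?P<sname>.*)"
)

rows = [
    "Thu 08/07/2014, 6:41:59.97,USERA,TERM1,XXXX-YYYAPP65-5",
    "Thu 08/07/2014, 6:42:00.17,USERA,,XXXX-YYYAPP65-5",
]

for row in rows:
    print(pattern.match(row).groupdict())
```

Because [^,]* cannot run past a comma, the engine has very little to backtrack over, and the empty cname in the second row still comes out as an empty string.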
