I've created an extracted field using the field extractor GUI in Splunk Web. When I created it, there were two values for that field. Now that further logs have been processed, there is a new value for that extracted field.
The issue is that the new value does not appear in the field summary; only the previous two values show up. Also, searches where the extracted field equals the new value do not return any results. But when I search for the new value as plain text, the results are there.
Specifically, this is extracting the log message type (INFO, WARN, FATAL) from a custom application we've built. The regular expression generated by the field extractor GUI is:
^[^\[\n]*\[(?P<Event_type>\w+)
which is meant to match the tag inside the brackets in logs that look like this:
2016-12-14 01:02:03 [INFO] Process started.
2016-12-14 01:03:04 [WARN] Some error has happened.
2016-12-14 01:03:44 [INFO] Reticulating splines.
2016-12-14 01:04:05 [FATAL] Process failed!
You can see here that the extracted field is working for two values:
I've even tried to use the field extractor GUI again on one of the results that does have the new value for this field. But it shows that it is already recognized as the extracted field I created:
So why is the new value not appearing in the summary or able to be searched directly using the extracted field?
Please try this regex; it should work whether you use it at search time or at extraction time:
If you want to search, try:
your query to return events | rex field=_raw "\d\s\[(?<Event_type>[^\]]+)\]\s" | table Event_type
Thanks for the answer. But I'm curious to know why the regex generated by the UI doesn't work as I would expect it to? And for that matter how/why does the regex you've provided fix it? This is something I assume I'll be needing to handle in the future so I'd like a better understanding of where I went wrong. I did look at the documentation for field extraction but I didn't see anything that seemed to call out that this would happen. Links to relevant doco would be very appreciated as well.
The field extractor builds its regex from the log line you selected as the sample. The resulting regex can therefore be stricter than you want: it matches log lines that look like the sample, but it may not be suitable for some other cases.
Why my regex worked: it is a more generalized regex that fits all the cases you provided. It works like this:
look for a digit, then a space, then an opening square bracket; capture everything up to the closing square bracket; then require the closing square bracket followed by a space.
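To see how the two expressions behave on the sample lines from the question, here is a minimal sketch in Python rather than Splunk. It is only an approximation of what Splunk does: Python's `re` module requires the `(?P<name>...)` form for named groups, so the suggested regex is rewritten with `?P<` here, and the helper name `extract` is made up for the illustration.

```python
import re

# The four sample log lines from the question, one event per string.
SAMPLE_LINES = [
    "2016-12-14 01:02:03 [INFO] Process started.",
    "2016-12-14 01:03:04 [WARN] Some error has happened.",
    "2016-12-14 01:03:44 [INFO] Reticulating splines.",
    "2016-12-14 01:04:05 [FATAL] Process failed!",
]

# Regex generated by the field extractor GUI (unchanged).
GENERATED = re.compile(r"^[^\[\n]*\[(?P<Event_type>\w+)")

# Regex suggested in the answer, adapted to Python's (?P<name>...) syntax.
SUGGESTED = re.compile(r"\d\s\[(?P<Event_type>[^\]]+)\]\s")


def extract(pattern, lines):
    """Hypothetical helper: apply the pattern to each line separately,
    returning the captured Event_type or None when the line does not match."""
    values = []
    for line in lines:
        match = pattern.search(line)
        values.append(match.group("Event_type") if match else None)
    return values


generated_values = extract(GENERATED, SAMPLE_LINES)
suggested_values = extract(SUGGESTED, SAMPLE_LINES)

print(generated_values)
print(suggested_values)
```

When each line is matched on its own, both expressions capture a value for all four sample lines, including FATAL. Any `None` in the output would correspond to a non-matching event.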
How to check whether something like this will happen again?
Check the non-matches tab during extraction to see whether there are cases where the generated regex didn't match, so you can tweak it. If the number of non-matches is 0, then the regex works for all the events that were loaded.
I've updated the regular expression for my existing extracted field to the one you provided and it still only returns the original two values. Is there some setting I'm missing to force it to re-index the values for the field? The settings for the extracted field appear to be just the expression used to define it.
In fact, after changing the expression the field extractor stopped showing me that [FATAL] was an extracted field. So perhaps the original expression works better? I'm quite rusty at regex so I'm not inclined to guess.
Either way the problem seems to be that Splunk isn't figuring out that this extracted field has a new value despite the fact that the regex is valid to match it. I feel like there is something wrong in either the way I've configured the field or my understanding of how extracted fields work. Or maybe it is a bug? But probably too early to call it that...
But when I use your search query and replace your expression with the one generated by the GUI (source_query | rex field=_raw "^[^\[\n]*\[(?P<Event_type>\w+)" | stats count BY Event_type), it finds all three values. Yet when I look at the extracted field I built, the new value doesn't show up. So I'm confused. The generated regex seems valid.
The only difference I see in regex101 is that your expression matches all the rows whereas the generated expression only matches the first row and then stops. (It doesn't matter which row is first though. So it will match the [FATAL] log line.)
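One possible explanation for the regex101 observation, sketched in Python: the generated expression starts with `^`, which by default anchors to the start of the whole input rather than to the start of each line. This assumes the sample lines were pasted into regex101 as one block with the multiline flag off, which the thread does not confirm. If Splunk applies the extraction to each event separately, as it normally does for search-time extractions, the difference would not matter there.

```python
import re

# All four sample lines pasted as a single block of text, as one might do
# in a regex tester.
TEXT = "\n".join([
    "2016-12-14 01:02:03 [INFO] Process started.",
    "2016-12-14 01:03:04 [WARN] Some error has happened.",
    "2016-12-14 01:03:44 [INFO] Reticulating splines.",
    "2016-12-14 01:04:05 [FATAL] Process failed!",
])

# Regex generated by the field extractor GUI (unchanged).
GENERATED = r"^[^\[\n]*\[(?P<Event_type>\w+)"

# Without MULTILINE, ^ matches only at the very start of TEXT,
# so only the first row can match.
single_line_matches = re.findall(GENERATED, TEXT)

# With MULTILINE, ^ also matches right after each newline,
# so every row can match.
multi_line_matches = re.findall(GENERATED, TEXT, flags=re.MULTILINE)

print(single_line_matches)
print(multi_line_matches)
```

The suggested expression has no `^` anchor, which would explain why it matches every row in the tester regardless of flags.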
For the log lines provided above, either regex should have worked, unless some other "case" tripped it up. Hence always check the non-matches tab when extracting with the field extractor.