Splunk Search

Why are field extractions only working for 24 hours or less if the log format hasn't changed?

Splunk Employee

I have some logs from a media server that are all formatted consistently, which makes creating field extractions very easy. I have created the same group of field extractions numerous times because they stop working within 24 hours, even though the format of the logs has not changed. I have compared events where the fields were extracted with events where they were not, and they look identical to me. I can't think of any reason for these field extractions to work only for a short time.

1 Solution

SplunkTrust

Looking at your example, you have this in your regex:

(?P<player>\w+\s+\d+\s+\w+)

However, the player field in your events is either "Roku 3" or "Roku 2 XS". This regex matches "Roku 2 XS" but not "Roku 3", because it requires a third word after the number and "Roku 3" has none.
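
One way to confirm this outside Splunk is to run the player sub-pattern on its own in Python, whose re module accepts the same (?P<name>...) group syntax. This is only a sketch: LOOSE_PLAYER is a hypothetical alternative that makes the third word optional, and it is not part of the original configuration.

```python
import re

# Sub-pattern from the posted extraction: word, number, word.
STRICT_PLAYER = re.compile(r"(?P<player>\w+\s+\d+\s+\w+)")

# Hypothetical alternative (an assumption, not from the thread):
# the trailing word is optional, so two-token names also match.
LOOSE_PLAYER = re.compile(r"(?P<player>\w+\s+\d+(?:\s+\w+)?)")


def matches(pattern, value):
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


for name in ("Roku 2 XS", "Roku 3"):
    print(name, "strict:", matches(STRICT_PLAYER, name),
          "loose:", matches(LOOSE_PLAYER, name))
```

Inside the full rule the strict pattern can still consume "Roku 3 for", but then the rest of the regex no longer lines up with the event, so the whole match fails.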

Splunk Employee

Very good. Thanks. I noticed there were a couple more regex issues as well. Evidently, when you define multiple fields in one rule with the field extraction tool, every part of the rule has to match or none of the fields are extracted.

SplunkTrust

That is expected behaviour: a regex can only extract fields if it matches the string.

Splunk Employee

Agreed. What interested me is that if you create a single field extraction with multiple fields and any part of it does not match, then none of the fields are extracted. I get it now that I think about it: it is one long regex. At first I was looking at it as a set of individual regexes, but it has hit me now that it is not. I might have been better off creating one-off field extractions instead of putting them all in one extraction rule, but I had never created them all in one rule before and it was something I wanted to try. Thanks again for all your help!
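
The all-or-nothing behaviour can be reproduced with a small Python sketch. The shortened patterns and the extract helper below are made up for illustration and are not how Splunk applies extractions internally. They only show that one combined regex yields no fields when any part fails, while separate regexes succeed or fail independently.

```python
import re

# Fragment of an event whose rating ("TV-PG") has no digits after the hyphen.
event = "[TV-PG] on Roku 2 XS for 42 minutes"

# One combined rule: both named groups live in a single regex.
combined = re.compile(
    r"\[(?P<content_rating>\w+-\d+)\]\s+on\s+(?P<player>\w+\s+\d+\s+\w+)"
)

# The same two groups split into independent rules.
rating_only = re.compile(r"\[(?P<content_rating>\w+-\d+)\]")
player_only = re.compile(r"\bon\s+(?P<player>\w+\s+\d+\s+\w+)")


def extract(patterns, text):
    """Collect named groups from every pattern that matches the text."""
    fields = {}
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            fields.update(match.groupdict())
    return fields


print(extract([combined], event))                  # no fields at all
print(extract([rating_only, player_only], event))  # player is still extracted
```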

SplunkTrust

"One long regex" and "many short regexes" are fundamentally different things.
Depending on your data, the presence or absence of one field may influence how other fields should be interpreted, so in some scenarios you may get wrong extractions if you simply chop the large regex into smaller ones. In such a case it may be necessary to have several long regexes, each of which understands only one variant of your data.
In other cases you can have shorter, more modular regexes to avoid overlapping definitions or, as you experienced, subtle errors.
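
As a rough illustration of the first case, the Python sketch below uses a made-up event whose title happens to contain " on " followed by a word, a number and another word. Both the event and the two patterns are hypothetical. The short pattern has no surrounding context and picks up text from the title, while the longer pattern anchors on the closing bracket before "on" and finds the real player.

```python
import re

# Hypothetical event (not from the original logs): the title contains
# "on Channel 4 News", which looks like a player to a context-free regex.
event = ("Sat Aug 15 20:00:14 2015: Amy Watched: Live on Channel 4 News "
         "- s01e01 [T] [2015] [TV-14] on Roku 2 XS for 42 minutes "
         "[100%] 192.168.1.134")

# Short, modular regex: nothing ties it to the right part of the event.
short_player = re.compile(r"\bon\s+(?P<player>\w+\s+\d+\s+\w+)")

# Longer regex: requires the closing bracket before "on" and "for" after.
anchored_player = re.compile(
    r"\]\s+on\s+(?P<player>\w+\s+\d+\s+\w+)\s+for\b"
)

print(short_player.search(event).group("player"))     # taken from the title
print(anchored_player.search(event).group("player"))  # the actual player
```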

Path Finder

Does the sourcetype name remain the same for events over time? That is, is the sourcetype name for events that occur today (when extraction is not working) the same as the sourcetype name for events that occurred yesterday (when extraction was working)?

Splunk Employee

Yes. The source type and the source name both remain the same.

Path Finder

Hmmmm. The only other suggestion I can make (other than getting a sample of the data and the regex you are using and helping debug, which I am happy to help with BTW) is to ask where the extractions are being stored. Specifically, are they in the props.conf of the app in which you are executing the search?

Splunk Employee

The field extraction is applied only to the search app. I am playing around with a regex tester online to see if I can figure out why the ones that don't work are messed up.

Path Finder

Do extractions work for events older than 24 hours? Or do they just not work at all for any event, no matter their timestamp?

Splunk Employee

They appear to work only for older events, not for recent ones. I imagine that might be an issue with my regex, but I don't know exactly what is off.

Path Finder

Are you defining extractions against sourcetype or source? Are you able to provide the configuration you have defined in your props.conf?

Splunk Employee

pmswatched : EXTRACT-user,title,transcode,releaseyear,contentrating,player,playlength,watchedpercentage,clientip

^(?:[^:\n]*:){3}\s+(?P<user>[^ ]+) Watched: (?P<title>[^\[]+)\[(?P<transcode>\w+)[^ \n]* \[(?P<release_year>[^\]]+)[^ \n]* \[(?P<content_rating>\w+\-\d+)\]\s+\w+\s+(?P<player>\w+\s+\d+\s+\w+)\s+\w+\s+(?P<play_length>\d+\s+[a-z]+\s+)\[(?P<watched_percentage>\d+%)\]\s+(?P<client_ip>.+)

Splunk Employee

I built against the source type using the field extraction tool in the web GUI.

Splunk Employee

Example of logs that did NOT extract properly:
Mon Aug 17 00:14:14 2015: pvols1979 Watched: CSI: Crime Scene Investigation - Gum Drops - s06e05 [T] [2005] [TV-14] on Roku 3 for 48 minutes [100%] 192.168.1.175
Sat Aug 15 22:21:14 2015: Amy Watched: NCIS: New Orleans - The List - s01e18 [T] [2015] [TV-PG] on Roku 2 XS for 42 minutes [100%] 192.168.1.134

Examples of logs that did extract properly:
Sat Aug 15 21:29:14 2015: Amy Watched: Rizzoli & Isles - Nice to Meet You, Dr. Isles - s06e08 [T] [2015] [TV-14] on Roku 2 XS for 42 minutes [100%] 192.168.1.134
Sat Aug 15 20:44:14 2015: Amy Watched: Rizzoli & Isles - A Bad Seed Grows - s06e07 [T] [2015] [TV-14] on Roku 2 XS for 42 minutes [100%] 192.168.1.134
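
The posted regex can be checked against these four samples outside Splunk, for example with Python's re module, which accepts the same named-group syntax. In the sketch below POSTED is the regex exactly as posted. RELAXED is a hypothetical variant (an assumption, not a confirmed fix) that makes the third player word optional and allows letters after the hyphen in the content rating. With POSTED, the first sample fails on the player pattern ("Roku 3") and the second fails on the rating pattern ("TV-PG" has no digits after the hyphen).

```python
import re

# Extraction regex exactly as posted in the thread.
POSTED = re.compile(
    r"^(?:[^:\n]*:){3}\s+(?P<user>[^ ]+) Watched: (?P<title>[^\[]+)"
    r"\[(?P<transcode>\w+)[^ \n]* \[(?P<release_year>[^\]]+)[^ \n]* "
    r"\[(?P<content_rating>\w+\-\d+)\]\s+\w+\s+(?P<player>\w+\s+\d+\s+\w+)"
    r"\s+\w+\s+(?P<play_length>\d+\s+[a-z]+\s+)"
    r"\[(?P<watched_percentage>\d+%)\]\s+(?P<client_ip>.+)"
)

# Hypothetical relaxed variant: only content_rating and player differ.
RELAXED = re.compile(
    r"^(?:[^:\n]*:){3}\s+(?P<user>[^ ]+) Watched: (?P<title>[^\[]+)"
    r"\[(?P<transcode>\w+)[^ \n]* \[(?P<release_year>[^\]]+)[^ \n]* "
    r"\[(?P<content_rating>[\w-]+)\]\s+\w+\s+(?P<player>\w+\s+\d+(?:\s+\w+)?)"
    r"\s+\w+\s+(?P<play_length>\d+\s+[a-z]+\s+)"
    r"\[(?P<watched_percentage>\d+%)\]\s+(?P<client_ip>.+)"
)

# The four sample events from the post, in the order they were given.
EVENTS = [
    "Mon Aug 17 00:14:14 2015: pvols1979 Watched: CSI: Crime Scene Investigation - Gum Drops - s06e05 [T] [2005] [TV-14] on Roku 3 for 48 minutes [100%] 192.168.1.175",
    "Sat Aug 15 22:21:14 2015: Amy Watched: NCIS: New Orleans - The List - s01e18 [T] [2015] [TV-PG] on Roku 2 XS for 42 minutes [100%] 192.168.1.134",
    "Sat Aug 15 21:29:14 2015: Amy Watched: Rizzoli & Isles - Nice to Meet You, Dr. Isles - s06e08 [T] [2015] [TV-14] on Roku 2 XS for 42 minutes [100%] 192.168.1.134",
    "Sat Aug 15 20:44:14 2015: Amy Watched: Rizzoli & Isles - A Bad Seed Grows - s06e07 [T] [2015] [TV-14] on Roku 2 XS for 42 minutes [100%] 192.168.1.134",
]


def player_of(pattern, event):
    """Return the extracted player, or None if the whole regex fails."""
    match = pattern.search(event)
    return match.group("player") if match else None


for event in EVENTS:
    print(player_of(POSTED, event), "|", player_of(RELAXED, event))
```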

SplunkTrust

You mentioned re-creating the extractions - how, where?

Splunk Employee

I used the tool to create the field extractions. By recreating, I mean that I delete the extraction and build it again. It works for a day and then just stops working.

SplunkTrust

Checked _internal for errors?

Splunk Employee

I don't see anything in _internal that seems to relate.

SplunkTrust

Does the field extraction config disappear?
