I have come across a strange issue with regex extractions: the fields I'm trying to extract only seem to be extracted some of the time. I have an automated report which uses a lookup list of orderIDs to find all events that contain those orderIDs, and then uses regex extractions to pull out key fields. Each ID can have up to 16 related events, and for each series of events there is a telephone number and a serviceID located in the XML.

While checking my report to make sure the information was correct, I discovered that for some series of events the telephone number was being extracted but the serviceID was not. I have looked at the XML for these events and confirmed that the field is definitely present. To check that my regex was correct, I then ran the entire search used to create the report, but replaced the lookup input with the OrderID in question, and lo and behold, the serviceID was extracted and could be found in the interesting fields. I have checked that the XML is formatted the same way in both cases (it is), but I cannot think of any other reason these extractions would work only some of the time. Any ideas?
As a further test, I created a lookup list containing one ID that has the extraction problem and one ID for which all the information extracts correctly. When this list is used in my search, all events are returned and every extraction succeeds. Could this be an issue with the size of my data, with Splunk cutting down my results? The original lookup list has 450 IDs, and the query that used it as input returned 473 events.
In my experience, this kind of thing happens when subtle variations in the log events cause the regex to "fail" (not match). Consider the Apache unique session ID: it is 19 characters long and may include - and @. If you matched it with just \w, the extraction would break whenever a generated ID contained a - or an @.
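To make the \w pitfall concrete, here is a small sketch in Python. The log lines and the session= key are made up for illustration, and Python's re module is close to, but not identical to, the PCRE syntax Splunk uses, so treat this as a demonstration of the idea rather than a drop-in Splunk regex.

```python
import re

# Hypothetical log lines; the 19-character IDs mimic Apache's unique session ID,
# which may contain '-' or '@' in addition to letters and digits.
events = [
    "session=SGVsbG8tV29ybGQxMjM status=200",
    "session=SGVsbG8-V29ybGQxMjM status=200",
    "session=SGVsbG8tV29@bGQxMjM status=200",
]

# Too narrow: \w only covers letters, digits and underscore,
# so the match dies at the first '-' or '@'.
naive = re.compile(r"session=(\w+)\s")

# Wider character class that also allows '@' and '-'
# ('-' is placed last so it is treated as a literal, not a range).
fixed = re.compile(r"session=([A-Za-z0-9@-]+)\s")


def extract(pattern, lines):
    """Return the captured ID for each line, or None where the regex did not match."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(1) if m else None)
    return results


if __name__ == "__main__":
    print(extract(naive, events))  # only the first ID is extracted
    print(extract(fixed, events))  # all three IDs are extracted
```

The naive pattern works on most IDs, which is exactly why the failure looks intermittent: only the events whose ID happens to contain a - or @ lose the field.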
When debugging this, I run my initial search and pipe it to | where isnull(<target_field>). This filters the result set down to the events where the regex didn't extract the field. Usually that set is small enough that visual inspection helps spot the issue. Then I might try running a reduced version of my regex (i.e., extracting just one term instead of several at once) to see whether that solves the problem.
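The same two-step workflow (keep only the events where extraction failed, then try each term on its own) can be sketched outside Splunk in Python. The XML snippets and the telephoneNumber / serviceID tag names are hypothetical stand-ins for the asker's data, and the combined regex deliberately assumes the two tags sit next to each other on one line, the kind of hidden assumption that tends to cause intermittent failures.

```python
import re

# Hypothetical XML events; tag names and values are assumptions for illustration.
events = [
    "<order><telephoneNumber>0123456789</telephoneNumber><serviceID>SVC-001</serviceID></order>",
    # Same data, but a newline between the tags.
    "<order><telephoneNumber>0123456789</telephoneNumber>\n<serviceID>SVC-002</serviceID></order>",
    # serviceID without the '-' the regex expects.
    "<order><telephoneNumber>0198765432</telephoneNumber><serviceID>SVC003</serviceID></order>",
]

# One regex extracting several terms at once; it silently assumes
# the tags are adjacent and that every serviceID contains a '-'.
combined = re.compile(
    r"<telephoneNumber>(?P<phone>\d+)</telephoneNumber>"
    r"<serviceID>(?P<service>SVC-\d+)</serviceID>"
)

# Reduced regexes: one term each.
parts = {
    "phone": re.compile(r"<telephoneNumber>(\d+)</telephoneNumber>"),
    "service": re.compile(r"<serviceID>(SVC-\d+)</serviceID>"),
}


def unmatched(pattern, lines):
    """Rough analogue of '| where isnull(field)': keep only lines the pattern fails on."""
    return [line for line in lines if pattern.search(line) is None]


def diagnose(line):
    """Try each single-term regex on its own and report which ones match."""
    return {name: bool(p.search(line)) for name, p in parts.items()}


if __name__ == "__main__":
    for line in unmatched(combined, events):
        print(diagnose(line))
```

For the second event both reduced regexes match, pointing at the adjacency assumption in the combined pattern; for the third, only the serviceID regex fails, pointing at the value format.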
In short, if the regex is correct, Splunk doesn't apply it arbitrarily. If the field isn't being extracted, I'd suspect the regex.
You can update your post with samples of each case (match / no match) along with your regexes, and someone here will be happy to help you.