I have finally heard back from the Engineering team, and I understand that this behaviour is by design.
To explain it, please take a look at the following links:
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Segmentersconf
http://docs.splunk.com/Documentation/Splunk/6.2.5/Knowledge/Createandmaintainsearch-timefieldextractionsthroughconfigurationfiles#Create_a_field_from_a_subtoken
Now, let me take one example each of a working and a non-working case:
Event in Set A:
10.2.0.1 - - [16/Aug/2015:14:52:25 -0600] "GET /%2e/WEB-INF/web.xml HTTP/1.1" 302 388
Event in Set B:
10.2.0.1 20.10.2.2 [21/Aug/2015:03:08:30 -0600] [pid 15599:tid 46921285732672] "GET /nice%20ports%2C/Tri%6Eity.txt%2ebak HTTP/1.0" 404 188 1187
Now, when you search for %2E, the event in Set A is listed, but the event in Set B is not. This is because %2e in Set A sits between delimiters (the minor breaker /) and therefore becomes a token of its own. In Set B, %2e is part of a larger token, so the event is not listed in the results.
Next, if you search for %20, the event in Set B is again not listed, because %20 is part of a larger token. However, if you search for %2C, the event is listed, because %2C is adjacent to a minor breaker (/).
In summary, when you search for a string and don't want to make the search greedy, the string needs to be a token, or a sub-token (if configured as explained in the sub-token link above). If your string is a subset of a larger token, you need to either search for the whole token, make your search greedy, or extract the string as a sub-token.
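The difference between matching a whole token and matching a substring can be sketched outside Splunk with a few lines of Python. This is only an illustration under a deliberately simplified assumption: the breaker set used here (whitespace, double quote and /) is hypothetical and is not Splunk's real segmenters.conf configuration, so it only covers the %2e and %20 cases from the two sample events. The names BREAKERS, tokens, token_match and substring_match are made up for this sketch.

```python
import re

# Hypothetical, simplified breaker set: whitespace, double quote and "/".
# This is NOT Splunk's real segmenters.conf configuration.
BREAKERS = re.compile(r'[\s"/]+')

EVENT_A = '10.2.0.1 - - [16/Aug/2015:14:52:25 -0600] "GET /%2e/WEB-INF/web.xml HTTP/1.1" 302 388'
EVENT_B = ('10.2.0.1 20.10.2.2 [21/Aug/2015:03:08:30 -0600] [pid 15599:tid 46921285732672] '
           '"GET /nice%20ports%2C/Tri%6Eity.txt%2ebak HTTP/1.0" 404 188 1187')


def tokens(event):
    """Split an event on the simplified breakers and lowercase each token."""
    return {t.lower() for t in BREAKERS.split(event) if t}


def token_match(term, event):
    """Non-greedy style match: the term must be a complete token."""
    return term.lower() in tokens(event)


def substring_match(term, event):
    """Greedy style match: the term may appear anywhere in the raw text."""
    return term.lower() in event.lower()


if __name__ == "__main__":
    for term in ("%2E", "%20"):
        for name, event in (("A", EVENT_A), ("B", EVENT_B)):
            print(term, name, token_match(term, event), substring_match(term, event))
```

In this simplified model, %2e is a complete token only in the Set A event, while in the Set B event both %2e and %20 are found only by the substring check, which is the situation where a greedy search or a sub-token extraction is needed.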
To make your search non-greedy in this case, you would have to extract the required sub-tokens (%2E, %C1, %1C, %C0, %AE, etc.) into a field using props.conf, and then search on that field. For example, this is what I did (for the %2e and %20 cases):
In props.conf, I created the following:
[segment]
EXTRACT-hex1 = (?<hex_code1>%2e)
EXTRACT-hex2 = (?<hex_code2>%20)
In fields.conf:
[hex_code1]
INDEXED = False
INDEXED_VALUE = False
[hex_code2]
INDEXED = False
INDEXED_VALUE = False
Next, when I search for, say, hex_code1=%2e, the results returned are the same as for the greedy search %2e*. The same holds for hex_code2=%20.
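If you want to sanity-check the two extraction patterns against the sample events before deploying them, a small Python sketch like the one below may help. It assumes the capture groups are named hex_code1 and hex_code2, to match the fields.conf stanzas. Note that Python's re module writes named groups as (?P<name>...), so the patterns are spelled slightly differently than in props.conf. The helper name extract is made up for this sketch, and the script only mimics the regex step, not Splunk's search-time extraction as a whole.

```python
import re

# Same patterns as the EXTRACT settings, in Python's named-group syntax.
# Assumption: the groups are named after the fields.conf stanzas.
HEX1 = re.compile(r'(?P<hex_code1>%2e)')
HEX2 = re.compile(r'(?P<hex_code2>%20)')

EVENT_A = '10.2.0.1 - - [16/Aug/2015:14:52:25 -0600] "GET /%2e/WEB-INF/web.xml HTTP/1.1" 302 388'
EVENT_B = ('10.2.0.1 20.10.2.2 [21/Aug/2015:03:08:30 -0600] [pid 15599:tid 46921285732672] '
           '"GET /nice%20ports%2C/Tri%6Eity.txt%2ebak HTTP/1.0" 404 188 1187')


def extract(event):
    """Return the fields that the two patterns would pull out of one raw event."""
    fields = {}
    for pattern in (HEX1, HEX2):
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields


if __name__ == "__main__":
    print("Set A:", extract(EVENT_A))
    print("Set B:", extract(EVENT_B))
```

Because the regex runs against the raw event text, it finds %2e in both sample events even though %2e is buried inside a larger token in Set B, which is consistent with the field search returning the same events as the greedy search.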