I recently discovered that the access_combined field definitions do not correctly parse the uri field when it contains a space. I understand the reasoning: spaces are generally regarded as invalid in a URI and should be escaped as %20. However, that shouldn't have any bearing on how Splunk parses the result.
How can I modify the access_combined field definitions via transforms.conf to include spaces in the uri field?
Example event with spaces in the uri:
1.1.1.1 80 - [07/Aug/2019:21:43:37 +0000] "GET /demo_bin/resource.php?command= space and another space and some more spaces in between HTTP/1.1" 400 583 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-" "-"
Here are the details:
| makeresults
| eval _raw = "1.1.1.1 80 - [07/Aug/2019:21:43:37 +0000] \"GET /demo_bin/resource.php?command= space and another space and some more spaces in between HTTP/1.1\" 400 583 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\" \"-\" \"-\""
| rename COMMENT AS "This is 'access-extractions' from '/opt/splunk/etc/system/local/transforms.conf'"
| rename COMMENT AS "^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++\"(?<referer>[[bc_domain:referer_]]?+[^\"]*+)\"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]"
| rename COMMENT AS "This is 'access-request' from '/opt/splunk/etc/system/local/transforms.conf'"
| rename COMMENT AS "\s*+[[reqstr:method]]?(?:\s++[[bc_uri]](?:\s++[[reqstr:version]])*)?\s*+"
| rename COMMENT AS "This is 'bc_uri' from '/opt/splunk/etc/system/local/transforms.conf'"
| rex "(?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]]*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*))?)"
| rex mode=sed "s/.*\"\w+\s+//"
| rename COMMENT AS "Let's modify it to fix it..."
| rex "(?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]]*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*(?:\s+(?!HTTP/)[^\s\"]+)*))?)"
So you need to create an updated definition of bc_uri along the lines of the last rex line above and put it in a transforms.conf with global scope (for example, $SPLUNK_HOME/etc/system/local/transforms.conf) so that it overrides the shipped definition.
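A minimal sketch of such an override follows, assuming it goes in $SPLUNK_HOME/etc/system/local/transforms.conf rather than in the shipped default file. Only REGEX is redefined; Splunk layers local settings over the default [bc_uri] stanza attribute by attribute. The pattern keeps the space-separated part of the query optional and refuses any token that starts with HTTP/, so queries without spaces still match and the protocol version is still left for reqstr:version. Treat it as a starting point and test it against your own events; restart Splunk (or otherwise reload the configuration) after editing.

```ini
# Assumed location: $SPLUNK_HOME/etc/system/local/transforms.conf
# (any transforms.conf with global scope that outranks system/default should work)
[bc_uri]
# Same structure as the shipped bc_uri, except that uri_query may continue across
# spaces; a following token that starts with HTTP/ is left for reqstr:version.
REGEX = (?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]]*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*(?:\s+(?!HTTP/)[^\s"]+)*))?)
```

With the example event from the question, uri_query should come out as "command= space and another space and some more spaces in between", with HTTP/1.1 extracted separately as the version.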
Spaces in URIs are uncommon, so you would have to post some sample raw events if you'd like anybody to help.
I updated the question to include an example event.