I figured it out, never mind. It was based on the first post, but I had to redo the regex because the URI didn't, in this case, start with http://
If you already have the field extracted, you can use eval or rex to create a new field holding the first part of the URL, with something like this (using eval):
eval mainpart=replace(origurl,"(.*)[?].*","\1")
Here origurl is the already-extracted URL field, and [?] matches the literal ? that separates the parameters from the rest of the URL. That lets you handle more than .html at the end of the URL (jpeg, js, css, etc.). The rex would be like the example already given by aljohnson_splunk. If your logs don't include the http:// (many Apache log files don't), your rex will need to find the URL differently from his example.
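If you want to sanity-check that pattern outside Splunk, here is a rough Python sketch of the same substitution. The function name main_part and the sample URLs are made up for illustration, and Python's re module is only standing in for Splunk's regex engine, so treat it as an approximation rather than a guarantee of what eval does.

```python
import re

def main_part(origurl: str) -> str:
    """Keep everything before the '?' (hypothetical helper, not a Splunk API)."""
    # Same pattern as the eval above: capture up to the '?', drop the rest.
    # If there is no '?', nothing matches and the input comes back unchanged.
    return re.sub(r"(.*)[?].*", r"\1", origurl)

# Sample values are invented; note there is no leading http://
print(main_part("/images/logo.jpeg?size=large&v=2"))
print(main_part("/index.html"))
```

Because (.*) is greedy, a URL containing more than one ? would be cut at the last one, which may or may not be what you want.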
Things that will help us help you:
It sorta sounds like you want to use the rex command.
E.g.
| rex field=url_field "http://(?<url_path>.+html)"
| stats count by url_path
For this particular goal, I would usually make the .+ be ungreedy with .+?
e.g.
| rex field=url_field "http://(?<url_path>.+?html)"
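To see why the ungreedy version matters, here is a small Python sketch comparing the two patterns on a made-up URL that contains "html" twice. The helper name extract_path and the example.com URL are assumptions for illustration. Python spells named groups (?P<name>...) where rex uses (?<name>...), but the greedy versus ungreedy behavior should be the same.

```python
import re
from typing import Optional

def extract_path(pattern: str, url: str) -> Optional[str]:
    """Return the url_path capture group, or None if the pattern does not match."""
    match = re.search(pattern, url)
    return match.group("url_path") if match else None

GREEDY = r"http://(?P<url_path>.+html)"     # .+ grabs as much as it can
UNGREEDY = r"http://(?P<url_path>.+?html)"  # .+? stops at the first "html"

# Invented sample: the query string also ends in "html"
url = "http://example.com/a.html?next=b.html"

print(extract_path(GREEDY, url))    # runs through to the last "html"
print(extract_path(UNGREEDY, url))  # stops after the first "html"
```

With the greedy pattern the query string ends up inside url_path, so the stats count would split what is really one page into many values.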