Hi,
Here are some examples of our URLs, and we need the data in the format shown below (Needed format). As you can see, the needed format does not follow a standard pattern: from the first URL we need Company/static, and from the second one we just need ecatalog. Regex experts, how can we write a regex for this kind of situation?
/Company/static/fos_adm.html;
/Company/benjamin-moore/ecatalog/N-1z0epd0?cm_mmc=DM-_-Brand-_-BenjaminMoore-_-Redirect;
/Company/static.jsp?page=fos_ads.html;
/wwg/start.shtml?CM&1=SF2=MMEV-4005=016=X5=002;
Needed format
Company/static 805
ecatalog 60
static.jsp 1070
search.shtml 26
start.shtml 31
This is the continuation; I could not put it in a single comment.
^/([^/]\w+/static[^.])
Starts with /, then a character that is not /, then one or more word characters, then /static, followed by not \ and any character.
I am guessing that expression is for capturing the below line?
/Company/static.jsp?page=fos_ads.html;
If so, after / we have Company, which is all word characters. So why do you also have ^/?
Also, after /static we have .jsp?page=fos_ads.html;
but in the expression we only have /static[^.]
This means after /static not \ and then any character. But we have more characters, right?
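One way to check this reading is to run the sub-pattern against both URL shapes. The sketch below uses Python's `re` module, which is an assumption on my part (the thread targets Splunk, but these particular constructs behave the same way). Inside the brackets, `[^.]` means "exactly one character that is not a period", and the leading `^/` ties the match to a slash at the very start of the path.

```python
import re

# Sub-pattern from the thread: anchored at the start, captures "<segment>/static"
# plus ONE following character, which must not be a period.
pattern = re.compile(r"^/([^/]\w+/static[^.])")

urls = [
    "/Company/static/fos_adm.html;",
    "/Company/static.jsp?page=fos_ads.html;",
]

results = {}
for url in urls:
    m = pattern.search(url)
    results[url] = m.group(1) if m else None
    print(url, "->", results[url])
```

So this branch appears to be aimed at the /Company/static/... URL, not at static.jsp: the period right after "static" is exactly what `[^.]` rejects.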
Hi, if you don't mind, I will explain what I understood. Can you please correct me?
/(([\w\d\_\-&]+\.(?:jsp|html|shtml|xml|mht))).{0,};$
One or more characters of any word character, digit, _ and -, followed by . and then one of jsp, html, etc., and then any character zero to infinite times.
My question about this expression is: why can't we use .{0,} instead of [\w\d\_\-&]+?
Also, what is the significance of ()?
(ecatalog).{0,};
Not sure why you have () around ecatalog. It looks like any characters should be preceded by ecatalog?
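Both questions can be explored with a small experiment. This is only a sketch in Python's `re` (an assumption; Splunk uses PCRE, though these constructs agree). The character class is shortened to `[\w\-&]` because `\w` already covers digits and the underscore. The variable names `narrow` and `wide` are mine.

```python
import re

url = "/Company/static.jsp?page=fos_ads.html;"

# Restrictive class: the captured name cannot contain "/", "?" or ".".
narrow = re.search(r"/([\w\-&]+\.(?:jsp|html|shtml|xml|mht)).{0,};$", url)

# Same pattern with ".{0,}" in place of the class: it is greedy and may cross
# "/" and "?", so the capture swallows far more than the file name.
wide = re.search(r"/(.{0,}\.(?:jsp|html|shtml|xml|mht)).{0,};$", url)

print(narrow.group(1))   # the file name only
print(wide.group(1))     # nearly the whole URL
print(narrow.groups())   # (?:...) groups without capturing, so only one entry
```

Plain `( )` marks the part of the match you want to get back as a value, while `(?: )` only groups the alternatives. That would be why `ecatalog` sits inside parentheses: it is the text to be extracted, and whatever follows it is matched but discarded.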
. (period) matches any character except a line break. {0,} says to match the previous element, in this case the period, zero to an infinite number of times before the ; (semicolon). $ anchors the match at the end of the line. $ could be omitted if you're trying to match text in the middle of an event. Given your original example, I chose to use the end-of-line anchor.
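A small sketch of those three pieces, written in Python's `re` as an assumption (the behaviour of `.`, `{0,}` and `$` shown here is the same in PCRE). The sample URL is one of those from the question; the appended " HTTP/1.1" imitates the URL sitting in the middle of an event.

```python
import re

text = "/wwg/start.shtml?CM&1=SF2=MMEV-4005=016=X5=002;"

# ".{0,}" and ".*" are two spellings of the same quantifier.
a = re.search(r"(start\.shtml).{0,};$", text)
b = re.search(r"(start\.shtml).*;$", text)
same = a.group(0) == b.group(0)

# With "$", the semicolon must be the last character, so trailing text breaks the match.
trailing = text + " HTTP/1.1"
anchored = re.search(r"(start\.shtml).{0,};$", trailing)
unanchored = re.search(r"(start\.shtml).{0,};", trailing)

print(same, anchored, unanchored.group(1))
```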
Thanks for your help on this. I'm trying to understand what you wrote here: why do you have .{0,};$? Sorry, I'm not very familiar with regex.
This will meet your example:
/(([\w\d\_\-&]+\.(?:jsp|html|shtml|xml|mht))).{0,};$|(ecatalog).{0,};$|^/([^/]\w+/static[^\.])
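To see what each branch of this expression captures, here is a hedged sketch that runs it over the four sample URLs from the question. It uses Python's `re` rather than Splunk (an assumption), and `url_part` is a hypothetical helper that returns whichever capture group took part in the match.

```python
import re

# Combined pattern as posted in the thread.
COMBINED = re.compile(
    r"/(([\w\d\_\-&]+\.(?:jsp|html|shtml|xml|mht))).{0,};$"
    r"|(ecatalog).{0,};$"
    r"|^/([^/]\w+/static[^\.])"
)

def url_part(url):
    """Return the first capture group that matched, or None (hypothetical helper)."""
    m = COMBINED.search(url)
    if m is None:
        return None
    part = next(g for g in m.groups() if g is not None)
    return part.rstrip("/")  # the static branch also captures the trailing "/"

samples = [
    "/Company/static/fos_adm.html;",
    "/Company/benjamin-moore/ecatalog/N-1z0epd0?cm_mmc=DM-_-Brand-_-BenjaminMoore-_-Redirect;",
    "/Company/static.jsp?page=fos_ads.html;",
    "/wwg/start.shtml?CM&1=SF2=MMEV-4005=016=X5=002;",
]
parts = [url_part(u) for u in samples]
print(parts)
```

The first URL is picked up by the `static` branch even though it ends in `.html`, because that branch can match at position 0 and a regex engine reports the leftmost match.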
Sorry, Ayn and Kristian, for not being clear. It looks like these are vanity URLs that are called as a result of some redirect rules. Kristian, as you said, they want a count of part of these URLs. What I observed is that there is no common pattern on which to base a Splunk query. But maybe you can think of some other way?
They need something like the below format.
part of uri count
start.shtml 608
items/count 2000
ecatalog 500
I tried something like this, but it is not giving me aggregated numbers:
sourcetype=accesscombinedwcookie host=prgwc* (uri_path=/company/static.jsp OR uri_path=/wwg/start.shtml OR uri_path=/catalog/* OR uri_path=/company/items/*) | stats count by uri_path
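Grouping by the full path yields one row per distinct path, so /company/items/1DYE3 and /company/items/6FXH4 are counted separately instead of together. The idea of mapping each path to a bucket first and counting afterwards can be sketched as below. This is Python, not a Splunk search, and the rule list, the bucket names and the `bucket` function are hypothetical; the paths are taken from the log samples in this thread.

```python
import re
from collections import Counter

# Request paths taken from the log samples in this thread.
paths = [
    "/company/wwg/start.shtml",
    "/company/wwg/start.shtml",
    "/company/items/1DYE3?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3",
    "/company/items/6FXH4?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3",
    "/company/general-purpose-ac-motors/motors/ecatalog/N-ls1?Ndr=basedimid10071&sst=subset",
]

# Hypothetical bucketing rules, checked in order; the first hit wins.
RULES = [
    ("ecatalog", re.compile(r"/ecatalog/")),
    ("company/items", re.compile(r"^/company/items/")),
    ("start.shtml", re.compile(r"/start\.shtml(?:\?|$)")),
]

def bucket(path):
    """Map a request path to the name of the first rule it matches."""
    for name, rx in RULES:
        if rx.search(path):
            return name
    return "other"

# Counting raw paths keeps every distinct item URL as its own row ...
raw = Counter(p.split("?")[0] for p in paths)
# ... while counting buckets gives the aggregated numbers.
counts = Counter(bucket(p) for p in paths)
print(len(raw), dict(counts))
```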
Here are some examples from the logs.
Set of examples for getting the count for start.shtml
70.89.56.230 - - [06/May/2013:10:06:38 -0500] "GET /company/wwg/start.shtml HTTP/1.1" 200 17313 914097 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" "-" "GIS" "Yes" sourcetype=accesscombinedwcookie source=/local/www/gcom/apache2/logs/access_log.20130506 host=prgwc03 BVProdWeb
204.153.104.254 - - [06/May/2013:10:06:38 -0500] "GET /company/wwg/start.shtml HTTP/1.1" 200 17368 5560983 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)" "-" "GIS" "Yes"
Set of examples for getting the company/items count
38.101.184.210 - - [06/May/2013:10:06:24 -0500] "GET /company/items/1DYE3?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3 HTTP/1.1" 301 20 22561 "http://www.company.com/company/wwg/homepage.jsp?time=Mon+May+06+10%3A05%3A59+CDT+2013" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)" "cchiadfjkiijjkkcefeceemdfnfdfoj.0:@@@@0483729929.1367852758@@@@" "GIS" "Yes" sourcetype=accesscombinedwcookie source=/local/www/gcom/apache2/logs/accesslog.20130506 BVSessionID=1367852758 host=prgwc04 BVProdWeb
146.217.200.214 - - [06/May/2013:10:06:27 -0500] "GET /company/items/6FXH4?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3 HTTP/1.1" 301 20 13936 "http://www.company.com/company/wwg/start.shtml?firstTime=no&BVUseBVCookie=no&time=Mon+May+06+10%3A05..." "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" "ccfjadfjkkhieimcefeceemdfnfdglf.0:@@@@1320788453.1367852718@@@@" "GIS" "-" sourcetype=accesscombinedwcookie source=/local/www/gcom/apache2/logs/accesslog.20130506 BV_SessionID=1367852718 host=prgwc03 BVProdWeb
Set of examples for getting ecatalog count
70.192.205.10 - - [06/May/2013:10:06:38 -0500] "GET /company/digital-multimeters/electrical-power-testing/test-instruments/ecatalog/N-b98Z1z0r51d?Ndr=basedimid10071&itemsPerPage=60&sst=subset HTTP/1.1" 200 35288 992477 "http://www.company.com/company/digital-multimeters/electrical-power-testing/test-instruments/ecatalo..." "Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0" "-" "-" "-"
24.173.80.206 - - [06/May/2013:10:06:38 -0500] "GET /company/general-purpose-ac-motors/motors/ecatalog/N-ls1?Ndr=basedimid10071&sst=subset HTTP/1.1" 200 32103 1008898 "http://www.company.com/company/motors/ecatalog/N-bii" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)" "-" "-" "-"
I could not send my info as a comment (there are some restrictions on the characters), so I have added it as an answer to work around this.
Isn't this four lines of log? Or two? I guess that you want a count of some part of the URL but, as Ayn says, please be more specific.
Also, are these the full events? There are no timestamps...
I've no idea what your desired result is. What's the common denominator for the matches? What are these numbers you're listing at the end? Please explain more clearly.