Splunk Search

Different format data

xvxt006
Contributor

Hi,

Here are some examples of our URLs, followed by the format we need the data in (Needed format). As you can see, the needed format does not follow a standard pattern: from the first URL we need Company/static, and from the second one just ecatalog. Regex experts, how can we write a regex for this kind of situation?

/Company/static/fos_adm.html; 
/Company/benjamin-moore/ecatalog/N-1z0epd0?cm_mmc=DM-_-Brand-_-BenjaminMoore-_-Redirect;  
/Company/static.jsp?page=fos_ads.html; 
/wwg/start.shtml?CM&1=SF2=MMEV-4005=016=X5=002; 

Needed format

Company/static  805
ecatalog    60
static.jsp  1070
search.shtml    26
start.shtml 31
0 Karma

xvxt006
Contributor

This is the continuation. I could not fit it in a single comment.

^/([^/]\w+/static[^.])

Starts with /, then a character that is not /, then one or more word characters, then /static, followed by not \ and any character.

I am guessing that expression is for capturing the below line?
/Company/static.jsp?page=fos_ads.html;

If so, after / we have Company, which is all word characters. So why do you also have ^/ ?
Also, after /static we have .jsp?page=fos_ads.html;
but in the expression we only have /static[^.]
This means that after /static comes not \ and then any character. But we have more characters, right?

0 Karma

xvxt006
Contributor

Hi, if you don't mind, I will explain what I understood. Can you please correct me?

/(([\w\d\_\-&]+\.(?:jsp|html|shtml|xml|mht))).{0,};$

One or more characters that are a word character, a digit, _ or -, followed by . and then one of jsp, html, etc., and then any character zero to infinite times.

My question about this expression: why can't we use .{0,} instead of [\w\d\_\-&]+ ?
Also, what is the significance of ()?

(ecatalog).{0,};

Not sure why you have () around ecatalog. It looks like any characters should be preceded by ecatalog?

0 Karma

bmacias84
Champion

. (period) matches any character except a line break. {0,} says to match the previous token, in this case the period, zero to an infinite number of times before the ; (semicolon). $ anchors the match at the end of the line. $ could be omitted if you're trying to match text in the middle of an event. Given your original example, I chose to use the end-of-line anchor.
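
To make the pieces above concrete, here is a minimal sketch in Python's re module (used here only as a stand-in for testing the pattern; Splunk's own regex engine is not involved). The mid_event string is a hypothetical example of a URL that is followed by more text, which is not in the original post.

```python
import re

# Branch from the thread: capture "ecatalog", then require everything after it
# to run up to a semicolon that sits at the very end of the line.
anchored = re.compile(r"(ecatalog).{0,};$")
# The same branch without the end-of-line anchor.
unanchored = re.compile(r"(ecatalog).{0,};")

url = "/Company/benjamin-moore/ecatalog/N-1z0epd0?cm_mmc=DM-_-Brand-_-BenjaminMoore-_-Redirect;"
# Hypothetical: the same URL sitting in the middle of a longer event.
mid_event = "GET " + url + " HTTP/1.1"

m = anchored.search(url)
print(m.group(1))  # text captured by the parentheses: ecatalog
print(m.group(0))  # whole match: from "ecatalog" through the final ";"

# With "$" the semicolon must be the last character, so this finds nothing.
print(anchored.search(mid_event))
# Without "$" the pattern can match in the middle of the event.
print(unanchored.search(mid_event).group(1))
```

The parentheses only decide which part of the match is handed back as a captured value; the .{0,};$ part consumes the rest of the line without capturing it.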

xvxt006
Contributor

Thanks for your help on this. I'm trying to understand what you wrote here. Why do you have .{0,};$ ? Sorry, I'm not very familiar with regex.

0 Karma

bmacias84
Champion

This will meet your example.
/(([\w\d\_\-&]+\.(?:jsp|html|shtml|xml|mht))).{0,};$|(ecatalog).{0,};$|^/([^/]\w+/static[^\.])
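
As a rough way to check the idea against the four sample URLs, here is a sketch in Python. It is a variation, not the exact expression above: the character class is shortened to [\w&-], .{0,} is written as .*, and the static branch uses a lookahead (?!\.) instead of [^\.] so that the capture stops at "static" rather than including the character after it. The name url_part is made up for this example.

```python
import re
from typing import Optional

PATTERN = re.compile(
    r"/([\w&-]+\.(?:jsp|html|shtml|xml|mht)).*;$"  # file name with a known extension
    r"|(ecatalog).*;$"                             # the literal word ecatalog
    r"|^/([^/]\w+/static)(?!\.)"                   # <first segment>/static, not static.jsp
)

def url_part(url: str) -> Optional[str]:
    """Return the captured part of the URL, or None if no branch matches."""
    m = PATTERN.search(url.strip())
    if m is None:
        return None
    # Exactly one branch matched, so exactly one group is not None.
    return next(g for g in m.groups() if g is not None)

samples = [
    "/Company/static/fos_adm.html;",
    "/Company/benjamin-moore/ecatalog/N-1z0epd0?cm_mmc=DM-_-Brand-_-BenjaminMoore-_-Redirect;",
    "/Company/static.jsp?page=fos_ads.html;",
    "/wwg/start.shtml?CM&1=SF2=MMEV-4005=016=X5=002;",
]
for s in samples:
    print(url_part(s))
```

The regex engine returns the leftmost match, which is why the first URL yields Company/static (the ^/ branch matches at position 0) instead of fos_adm.html further to the right.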

xvxt006
Contributor

Sorry, Ayn and Kristian, for not being clear. It looks like these are vanity URLs that get called as a result of some redirect rules. Kristian, as you said, they want a count of part of these URLs. What I observed is that there is no common pattern to build a Splunk query around. But maybe you can think of some other way?

They need something like the below format.

part of uri     count
start.shtml     608
items/count     2000
ecatalog        500

I tried something like this, but it is not giving me aggregated numbers:

sourcetype=accesscombinedwcookie host=prgwc* (uri_path=/company/static.jsp OR uri_path=/wwg/start.shtml OR uri_path=/catalog/* OR uri_path=/company/items/*) | stats count by uri_path

Here are some examples from Logs.

Set of examples for getting the count for start.shtml

70.89.56.230 - - [06/May/2013:10:06:38 -0500] "GET /company/wwg/start.shtml HTTP/1.1" 200 17313 914097 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" "-" "GIS" "Yes" sourcetype=accesscombinedwcookie Options| source=/local/www/gcom/apache2/logs/access_log.20130506 Options| host=prgwc03 BVProdWeb Options

204.153.104.254 - - [06/May/2013:10:06:38 -0500] "GET /company/wwg/start.shtml HTTP/1.1" 200 17368 5560983 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)" "-" "GIS" "Yes"

Set of examples for getting company/items count

38.101.184.210 - - [06/May/2013:10:06:24 -0500] "GET /company/items/1DYE3?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3 HTTP/1.1" 301 20 22561 "http://www.company.com/company/wwg/homepage.jsp?time=Mon+May+06+10%3A05%3A59+CDT+2013" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)" "cchiadfjkiijjkkcefeceemdfnfdfoj.0:@@@@0483729929.1367852758@@@@" "GIS" "Yes" sourcetype=accesscombinedwcookie Options| source=/local/www/gcom/apache2/logs/accesslog.20130506 Options| BVSessionID=1367852758 Options| host=prgwc04 BVProdWeb

146.217.200.214 - - [06/May/2013:10:06:27 -0500] "GET /company/items/6FXH4?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3 HTTP/1.1" 301 20 13936 "http://www.company.com/company/wwg/start.shtml?firstTime=no&BVUseBVCookie=no&time=Mon+May+06+10%3A05..." "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" "ccfjadfjkkhieimcefeceemdfnfdglf.0:@@@@1320788453.1367852718@@@@" "GIS" "-" sourcetype=accesscombinedwcookie Options| source=/local/www/gcom/apache2/logs/accesslog.20130506 Options| BV_SessionID=1367852718 Options| host=prgwc03 BVProdWeb

Set of examples for getting ecatalog count

70.192.205.10 - - [06/May/2013:10:06:38 -0500] "GET /company/digital-multimeters/electrical-power-testing/test-instruments/ecatalog/N-b98Z1z0r51d?Ndr=basedimid10071&itemsPerPage=60&sst=subset HTTP/1.1" 200 35288 992477 "http://www.company.com/company/digital-multimeters/electrical-power-testing/test-instruments/ecatalo..." "Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0" "-" "-" "-"

24.173.80.206 - - [06/May/2013:10:06:38 -0500] "GET /company/general-purpose-ac-motors/motors/ecatalog/N-ls1?Ndr=basedimid10071&sst=subset HTTP/1.1" 200 32103 1008898 "http://www.company.com/company/motors/ecatalog/N-bii" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)" "-" "-" "-"
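
To illustrate the kind of aggregation being asked for, here is a small sketch in Python that runs outside Splunk. The label names and the rules in RULES are assumptions made up for this example, and the sample lines are shortened copies of the log lines above.

```python
import re
from collections import Counter

# Pull the request path out of the quoted request line of an access log entry.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

# Hypothetical label rules, checked in order; the first one that matches wins.
RULES = [
    ("ecatalog", re.compile(r"/ecatalog/")),
    ("company/items", re.compile(r"^/company/items/")),
    ("start.shtml", re.compile(r"/start\.shtml(?:\?|$)")),
    ("static.jsp", re.compile(r"/static\.jsp(?:\?|$)")),
]

def label(path):
    """Map a request path to one of the labels above, or None."""
    for name, rule in RULES:
        if rule.search(path):
            return name
    return None

def count_parts(lines):
    """Count log lines per label, skipping lines that match no rule."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m is None:
            continue
        name = label(m.group(1))
        if name is not None:
            counts[name] += 1
    return counts

SAMPLE = [
    '70.89.56.230 - - [06/May/2013:10:06:38 -0500] "GET /company/wwg/start.shtml HTTP/1.1" 200 17313',
    '204.153.104.254 - - [06/May/2013:10:06:38 -0500] "GET /company/wwg/start.shtml HTTP/1.1" 200 17368',
    '38.101.184.210 - - [06/May/2013:10:06:24 -0500] "GET /company/items/1DYE3?cmsp=IO--Home--TPSELL&cmvc=HPTSZ3 HTTP/1.1" 301 20',
    '24.173.80.206 - - [06/May/2013:10:06:38 -0500] "GET /company/general-purpose-ac-motors/motors/ecatalog/N-ls1?Ndr=basedimid10071&sst=subset HTTP/1.1" 200 32103',
]
for name, n in count_parts(SAMPLE).items():
    print(name, n)
```

The point of the sketch is that each event is first reduced to a single label and only then counted, so every distinct URL under /company/items/ or .../ecatalog/ lands in the same bucket instead of getting its own row.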

0 Karma

xvxt006
Contributor

I could not send my info as a comment (there are some restrictions on the characters), so I have added it as an answer to work around this.

0 Karma

kristian_kolb
Ultra Champion

Isn't this four lines of log? Or two? I guess that you want a count of some part of the URL, but as Ayn says, please be more specific.

Also, are these the full events? There are no timestamps...

0 Karma

Ayn
Legend

I've no idea what your desired result is. What's the common denominator for the matches? What are these numbers you're listing at the end? Please explain more clearly.

0 Karma