Hi,
I have a Tomcat access log which contains URLs like
url=/find.do?from-id=549499&q-out=2019-02-20&q-room-0-adults=2&q-rooms=1&q-check-in=2019-02-18&q-room-0-children=0&hid=116903
I want to extract all the parameters from it, such as from-id, q-out, etc.
The query I am using is:
index=my_site source=sa-*tomcat_access.log url | rex field=url "[search.do]\?&=([^&]+)" | stats count by url_parameter
It prints the first value, but not all the fields. Please help me with the query.
@vineethvnair0,
Try adding max_match=0 to repeat the match:
index=my_site source=*sa-tomcat_access.log url | rex max_match=0 field=url "[\?\&](?<url_parameter>[^=]+)=([^&]+)" | stats count by url_parameter
Test:
|makeresults|eval url="url=/find.do?from-id=549499&q-out=2019-02-20&q-room-0-adults=2&q-rooms=1&q-check-in=2019-02-18&q-room-0-children=0&hid=116903"
|rex field=url max_match=0 "[\?\&](?<params>[^=]+)=(?<values>[^&]+)"
@renjith.nair
It works fine with the test you gave, but not when I query the original log. I suspect the issue is that the url field is not extracted correctly. Please find a full sample event below:
domain=xxx.com [24/Jan/2019:07:04:45 +0000] remote_host=1.14.1.17 ajax=- http_method=GET url=/find/listings.json?q-locale=en_GB&mvariant=495.0%2C4212.1%2C790.1%2C4192.1%2C2313.3%2C5001.0%2C3309.0%2C7015.0%2C5167.0%2C4440.0&q-mvts=495.0%2C4212.1%2C790.1%2C4192.1%2C2313.3%2C5001.0%2C3309.0%2C7015.0%2C5167.0%2C4440.0&q-logged-in=false&q-posa=DOT_UK&q-secure=true&destination-id=726784&q-client-ip=10.187.77.115&q-channel=WEB_DESKTOP&q-hermes-user-guid=a6acceab-e2e1-43c7-9111-3840cb09bab4&q-brand-id=xxx.com&include-filters=true&q-native-app=iPhone&q-room-0-adults=2&q-rooms=1&so=STAR_RATING_HIGHEST_FIRST&q-client-id=SRLE&lids=1658484 redirect=http://test.com/find/listings.json?q-locale=en_GB&q-mvts=495.0%2C4212.1%2C790.1%2C4192.1%2C2313.3%2C... statuscode=302 duration_ms=8 bytes_sent=- referer=- user_agent=comappdefault sessid=- edgescape=- guid=- req_guid=ShoppingApp-SA.2019.1.7379;a4c32120-e144-4c2e-a90b-cceec77e676a;10 nativeApp=- X-Forwarded-Host=- X-Forwarded-Server=- X-NS-Forwarded-Server=- SiteSpectEngine=-
Can you please help to get the query parameters from this event?
@vineethvnair0, since all these params are key=value pairs, Splunk should have extracted them automatically by default. Do you see these as fields in the events? If not, is url a field, or do we need to extract that as well?
I have tried loading your sample event and it still works with the above regex.
@renjith.nair The query parameters are not listed in Splunk. The url field is listed, but it shows the value only up to
find/listings.json?q-locale=en_GB
So it is extracting key-value pairs, but I am not sure why it is not showing the other fields. There might be other configuration that overrides KV_MODE.
Nevertheless, this should also work:
index=yourindex
|rex field=_raw "url=(?<URL>.+)"
|rex field=URL max_match=0 "[\?\&](?<params>[^=]+)=(?<values>[^&]+)"
|table params,values
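If one row per parameter is preferred over two multivalue columns, the names and values can be zipped together and expanded. This is only a run-anywhere sketch on a made-up, shortened URL; the field names pair, param and value are hypothetical and chosen just for illustration.

```spl
| makeresults
| eval URL="/find.do?from-id=549499&q-rooms=1&hid=116903"
| rex field=URL max_match=0 "[\?\&](?<params>[^=]+)=(?<values>[^&]+)"
``` pair each name with its value, e.g. "from-id=549499" ```
| eval pair=mvzip(params, values, "=")
``` one result row per name=value pair ```
| mvexpand pair
| rex field=pair "^(?<param>[^=]+)=(?<value>.*)$"
| table param value
```

The triple-backtick lines are SPL inline comments, which older Splunk versions may not support; they can be removed without changing the result.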
@renjith.nair How can I avoid duplicates? I tried using dedup params before and after the
| table params
but the fields still come out like
q-locale
q-logged-in
q-posa
q-secure
destination-id
q-client-ip
q-channel
q-locale
q-logged-in
q-posa
q-secure
destination-id
q-client-ip
q-channel
It might be multivalue, so you could use either
|stats count by params|fields - count
or
|eval params=mvdedup(params)
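To illustrate the difference between the two options: mvdedup only removes repeats inside a single event's multivalue field, while stats collapses the values across all events. A run-anywhere sketch, assuming two identical made-up events whose URL repeats q-locale:

```spl
| makeresults count=2
| eval url="/find.do?q-locale=en_GB&q-rooms=1&q-locale=en_GB"
| rex field=url max_match=0 "[\?\&](?<params>[^=]+)=(?<values>[^&]+)"
| eval params=mvdedup(params)
| stats count by params
| fields - count
```

If the search stops after the mvdedup line, each of the two events still carries its own list of parameter names, so the same names appear once per event. The stats step is what should reduce them to a single list of distinct names.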
@renjith.nair With the stats count approach some params are missing, and mvdedup is not working either: the results come as multiple lists, which is why the duplicates are not removed. Is there any way to avoid that?
@renjith.nair Thanks, it's working.
A small correction for getting the url, though:
rex field=_raw "url=(?<URL>[^ ]+)"
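The reason this matters is that the event contains a redirect= field with its own query string after url=, so a pattern that runs to the end of the line may pick up those parameters as well. A run-anywhere sketch on a shortened, made-up version of the sample event, assuming the capture group is named URL as in the earlier query:

```spl
| makeresults
| eval _raw="http_method=GET url=/find/listings.json?q-locale=en_GB&q-rooms=1 redirect=http://test.com/find/listings.json?q-locale=en_GB statuscode=302"
``` stop at the first space so redirect= and later fields are excluded ```
| rex field=_raw "url=(?<URL>[^ ]+)"
| rex field=URL max_match=0 "[\?\&](?<params>[^=]+)=(?<values>[^&]+)"
| table URL params values
```

With this input, URL should end at q-rooms=1, and params should contain only q-locale and q-rooms. The triple-backtick line is an SPL inline comment and can be removed on versions that do not support it.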