For example, I would like to group all the following URLs under google:
docs.google.com,
maps.google.com,
www.google.com,
...
(maybe it is *google*)
Is there a way to do this so that the results are shown under pre-defined domains?
I would much appreciate it if such pre-defined rules already exist somewhere.
Thank you.
Well, I assume that you have an extracted field for the URL (or URI), correct?
That field would contain just a little too much information for your sorting/grouping purposes, right, e.g.
http://www.google.com/search?q=blah
https://secure.bank.co.uk/login
From that field you can extract the domain part (google, bank) as a new field with a regex, either inline in the search or more permanently by editing a config file (or by using the IFX).
Inline, you could have a search that looks something like:
sourcetype=your_bluecoat_sourcetype | rex field=URL "https?://[^\.]+\.(?<domain>[^\.]+)\." | stats c by domain
The final part after the | creates a table counting events by the newly extracted 'domain' field.
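If you want to try the pattern outside Splunk first, here is a minimal Python sketch of the same idea. It assumes the capture group is named domain, written (?P<domain>...) in Python's syntax. The second URL is made up from the host names in the question, and extract_domain is just a helper name for this illustration.

```python
import re
from collections import Counter

# Same pattern as the rex above; Python spells the named group (?P<domain>...).
DOMAIN_RE = re.compile(r"https?://[^.]+\.(?P<domain>[^.]+)\.")

# Sample values for the URL field (illustrative only).
urls = [
    "http://www.google.com/search?q=blah",
    "http://maps.google.com/",
    "https://secure.bank.co.uk/login",
]

def extract_domain(url):
    """Return the captured domain label, or None when the pattern does not match."""
    match = DOMAIN_RE.search(url)
    return match.group("domain") if match else None

# Rough equivalent of "| stats c by domain": count events per extracted domain.
counts = Counter(extract_domain(u) for u in urls)
print(counts)
```

With these three sample URLs, google is counted twice and bank once.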
Hope this helps,
Kristian
I had the same problem, and this worked for me.
I created a field extraction named bcoat_proxysg: EXTRACT-cs_uri_authority with the regex:
(?)..*?.(?P
Then I changed the search/view.
Thanks for the reminder.
I doubt that regex (in Splunk) can do if-then-else. Without that, a single regex cannot handle URLs with many levels of sub-domains or other variations.
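For what it is worth, Splunk's regexes are PCRE, and PCRE supports alternation with |, which can play the role of a simple if-then-else. Below is a rough Python sketch of that idea, not a tested Splunk extraction. The suffix list is a tiny made-up one, and domain_label is a hypothetical helper name; real traffic would need a much longer list of suffixes.

```python
import re

# Hypothetical, deliberately tiny suffix list; extend it for real traffic.
SUFFIXES = ["co.uk", "com", "org", "net"]

# Longest suffixes first, so "co.uk" is tried before the shorter alternatives.
suffix_alt = "|".join(re.escape(s) for s in sorted(SUFFIXES, key=len, reverse=True))

DOMAIN_RE = re.compile(
    r"^[a-z]+://"                 # scheme: http, https, ftp, ...
    r"(?:[^/:.]+\.)*"             # any number of sub-domain labels
    r"(?P<domain>[^/:.]+)"        # the label to group by
    r"\.(?:" + suffix_alt + r")"  # one of the known suffixes
    r"(?::\d+)?(?:/|$)",          # optional port, then a slash or end of string
    re.IGNORECASE,
)

def domain_label(url):
    """Return the label just before a known suffix, or None if nothing matches."""
    match = DOMAIN_RE.search(url)
    return match.group("domain") if match else None

for url in (
    "http://www.google.com/search?q=blah",
    "https://secure.bank.co.uk/login",
    "http://all.work.and.no.play.ABC.com",
    "ftp://ABC.com:21",
):
    print(url, "->", domain_label(url))
```

The trade-off is that the pattern only recognises hosts ending in a suffix from the list, so anything else comes back as None rather than as a wrong guess.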
Also, have you checked how your regex would handle subdomains/ports? I believe that it might fail to handle some cases.
Not saying that the one I provided is perfect, but it will at least pick something out of it, since it does not expect a slash after three groups of characters.
I don't really know what your format looks like, but there are a couple of possible patterns, where ABC is what you want to capture:
http://www.ABC.com
http://ABC.com
http://www.ABC.co.uk
https://ABC.co.uk
ftp://ABC.com:21
http://all.work.and.no.play.ABC.com
...and then you might also have trailing slashes.
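To see how a simple pattern copes with these shapes, here is a small Python check. It uses the inline pattern suggested earlier in this thread, with the group named domain in Python's (?P<domain>...) syntax. This is only an illustration of the failure cases, not a fix.

```python
import re

# The simple inline pattern from earlier in the thread, in Python syntax.
SIMPLE_RE = re.compile(r"https?://[^.]+\.(?P<domain>[^.]+)\.")

samples = [
    "http://www.ABC.com",
    "http://ABC.com",
    "http://www.ABC.co.uk",
    "https://ABC.co.uk",
    "ftp://ABC.com:21",
    "http://all.work.and.no.play.ABC.com",
]

# Map each URL to what the pattern captures (None when it does not match).
results = {}
for url in samples:
    match = SIMPLE_RE.search(url)
    results[url] = match.group("domain") if match else None
    print(f"{url:40} -> {results[url]}")
```

Only the two www forms give ABC. http://ABC.com and the ftp URL do not match at all, https://ABC.co.uk gives co, and the long sub-domain chain gives work.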
/k
Well, if you have extracted the fields 'bytes' and 'duration', I believe your stats command at the end of the line should read:
...| stats c sum(bytes) sum(duration) by domain
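As a rough picture of what that stats clause computes, here is a short Python sketch. The byte counts and durations are made-up numbers, and domain_of is a hypothetical stand-in for the regex extraction that simply picks a label from the host name.

```python
from collections import defaultdict

# Hypothetical events: (host, bytes, duration) with made-up numbers.
events = [
    ("maps.google.com", 1200, 3),
    ("docs.google.com", 800, 2),
    ("secure.bank.co.uk", 500, 1),
]

def domain_of(host):
    # Simplification for this sketch: use the second label when the host has
    # three or more labels, otherwise the first one.
    labels = host.split(".")
    return labels[1] if len(labels) > 2 else labels[0]

# One row per domain, like "stats c sum(bytes) sum(duration) by domain".
totals = defaultdict(lambda: {"c": 0, "sum(bytes)": 0, "sum(duration)": 0})
for host, nbytes, duration in events:
    row = totals[domain_of(host)]
    row["c"] += 1
    row["sum(bytes)"] += nbytes
    row["sum(duration)"] += duration

for domain, row in totals.items():
    print(domain, row)
```

The two google hosts collapse into a single google row whose bytes and duration are the sums of the individual events.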
/k
Thanks to Kristian, the question is solved.
The regex I used is rex field=Url "[http|https|ftp|tcp]?\:\/\/[^\.]+\.(?
The regex is aimed at the format ://xxx.domain.xxx/ (I don't know if there is any error).
While I am still working on the regex, there is actually a second question.
For example, suppose there are two events:
maps.google.com bytes_a duration_a
docs.google.com bytes_b duration_b
Will they be combined as follows?
google bytes_a+b duration_a+b
Yeah, well, the IFX may have a hard time trying to find the correct regex. It isn't perfect, but you often get an idea of how to craft your own.
If this answered your question, please mark it as "answered" and/or upvote. Thanks, K.
In case anyone would like a quick way to test a URL regex, try http://gskinner.com/RegExr/ (I suppose you need some basic knowledge of regex).
By the way, I have used the IFX, and it does not seem good at producing a custom regex for URLs (I am not good at regex either).
Thank you very much. That is exactly what I would like to achieve.
I am not sure if the term "grouping" is appropriate.
I have downloaded the log from the Bluecoat server and would like to import it into Splunk for log analysis. Normally, Splunk treats each line of the Bluecoat log as an event. Each event contains some fields, and one of them is URL-related. I would like to group events that share a similar URL characteristic (i.e. that fall under the same domain; in the example above, google). This is because the log may be huge, and such grouping would reduce the size. In addition, the result (or the report) looks simpler.
Sorry, are you talking about configuration of BlueCoat or Splunk? Not sure exactly what you want to do, though.
/k