I'm using SplunkforProxy and tried to modify transforms.conf like this:
REGEX = ^[^/]+\s+\d\s+[0-9\:]*\s+0-9\.*\s+[^/]\s+\d\s+[0-9\:]*\s+.*\s+[^/]+\s+\d\s+\d+\.\d+\s+(\d+)\s+([0-9\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
FORMAT = duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 content_type::$15
and the source event looks like this:
Dec 13 14:21:30 192.168.253.6 Dec 13 14:20:20 squid.proxy.com.xx SquidProxyLog 0 1323757220.346 5 192.168.253.6 TCP_MISS/302 567 GET http://www.xxxxx.com.tw/Transfer/Toad.aspx? - DIRECT/61.xx.xx.132 text/htmlDec 13
But it doesn't seem to work. Is there an error in my regex? Thanks
Instead of using such an extensive regex, have you tried breaking the extractions down into several smaller ones? If there are any inconsistencies in your data/events, a single long regex will "break" and you lose every field at once.
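To make the idea concrete, here is a rough sketch in Python (not Splunk config) that runs three small patterns against the sample event instead of one long one. The field names follow the FORMAT line from the question; the patterns themselves and the `extract` helper are my own assumptions for illustration, not something SplunkforProxy ships, and Python's `re` is not identical to the PCRE engine Splunk uses.

```python
import re

# Sample event copied verbatim from the question.
EVENT = (
    "Dec 13 14:21:30 192.168.253.6 Dec 13 14:20:20 squid.proxy.com.xx "
    "SquidProxyLog 0 1323757220.346 5 192.168.253.6 TCP_MISS/302 567 GET "
    "http://www.xxxxx.com.tw/Transfer/Toad.aspx? - DIRECT/61.xx.xx.132 "
    "text/htmlDec 13"
)

# Hypothetical small patterns, each covering one part of the squid log line.
PATTERNS = [
    # epoch timestamp, duration, client IP
    r"\s(?P<epoch>\d+\.\d+)\s+(?P<duration>\d+)\s+"
    r"(?P<clientip>\d{1,3}(?:\.\d{1,3}){3})\s",
    # cache result / HTTP status, byte count, request method
    r"\s(?P<action>\w+)/(?P<http_status>\d{3})\s+(?P<bytes>\d+)\s+"
    r"(?P<method>[A-Z]+)\s",
    # URI, user name, hierarchy code / upstream server
    r"\s(?P<uri>\w+://\S*)\s+(?P<username>\S+)\s+"
    r"(?P<hierarchy>\w+)/(?P<server_ip>\S+)",
]


def extract(event):
    """Apply each small pattern independently and merge the named groups."""
    fields = {}
    for pattern in PATTERNS:
        match = re.search(pattern, event)
        if match:  # a pattern that fails only loses its own fields
            fields.update(match.groupdict())
    return fields


if __name__ == "__main__":
    for name, value in extract(EVENT).items():
        print(name, "=", value)
```

The point of the design is that each pattern fails on its own: if the URI part of some event is odd, you still get duration, clientip, action and so on. In transforms.conf the equivalent would be several stanzas with a short REGEX each rather than one stanza with everything.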
Your regex does not seem to apply to the event you're giving as an example. My advice is to try the regex out in a regex testing tool such as http://regexpal.com/ or http://gskinner.com/RegExr/ first. Remember that the initial date is part of the event as well.
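If you would rather check this locally than in a web tool, a small Python script can do the same job. This is only a sketch: the `longest_matching_prefix` helper is a name I made up, and Python's `re` differs from Splunk's PCRE in some details, so treat the result as a hint rather than proof. It tries the full regex first, then looks for the longest prefix of the regex that still matches from the start of the event, which shows roughly where things go wrong.

```python
import re

# Regex and event copied verbatim from the question.
REGEX = r"^[^/]+\s+\d\s+[0-9\:]*\s+0-9\.*\s+[^/]\s+\d\s+[0-9\:]*\s+.*\s+[^/]+\s+\d\s+\d+\.\d+\s+(\d+)\s+([0-9\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$"
EVENT = (
    "Dec 13 14:21:30 192.168.253.6 Dec 13 14:20:20 squid.proxy.com.xx "
    "SquidProxyLog 0 1323757220.346 5 192.168.253.6 TCP_MISS/302 567 GET "
    "http://www.xxxxx.com.tw/Transfer/Toad.aspx? - DIRECT/61.xx.xx.132 "
    "text/htmlDec 13"
)


def longest_matching_prefix(pattern, text):
    """Return the longest prefix of `pattern` that compiles and matches `text`."""
    best = ""
    for end in range(1, len(pattern) + 1):
        candidate = pattern[:end]
        try:
            compiled = re.compile(candidate)
        except re.error:
            continue  # cut in the middle of a group, class or escape
        if compiled.match(text):
            best = candidate
    return best


if __name__ == "__main__":
    print("full regex matches:", re.search(REGEX, EVENT) is not None)
    print("matches up to:", longest_matching_prefix(REGEX, EVENT))
```

On the event as posted, the full pattern does not match. One reason is visible right away: `0-9\.*` is not inside square brackets, so it demands the literal text `0-9` in the event, which never occurs. The prefix check also stops before that point, because the pattern expects a single-digit day followed by a time, while the event has `13`.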
I used RegexBuddy to test it, and there still seems to be an error. I'll check again, thanks.