Splunk Search
Highlighted

Blackboard bb-access-logs not parsing correctly

Path Finder

Blackboard has changed the format of the bb-access-logs to include session information. With the new data the logs are being split into several logs and some are being indexed with the wrong timestamp. Is anyone else working with these logs and successfully indexing them?

Splunk universal forwarder, inputs.conf:

[monitor:///usr/local/blackboard/apps/tomcat/logs/bb-access-log.*.txt]
index=oc_dev
sourcetype=bb-access-log
ignoreOlderThan=1h

Splunk indexer props.conf:

[bb-access-log]
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 85
SHOULD_LINEMERGE = False
TRUNCATE = 25000
LINE_BREAKER = ([\n\r]+)(\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
EXTRACT-webcampus_access-fields = (?<src_ip>\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b)\s(?<dest_ip>.*?)\s(?<HTTPComponent>.*?)\s(?<duid>.*?)\s(?<dateInternal>\[(.*?)\])\s(?<HTTP>\"(.*?)\")\s(?<http_status_code>.*?)\s(?<unknownDigits2>.*?)\s(?<http_user_agent>\".*?\")\s

Example of possible log formatting:

ip_address ip_address http_code duid [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "browser information" "session information" digit digit
ip_address - http_code duid [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "browser information" "session information" digit digit
ip_address - http_code {unset id} [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "-" "-" digit digit
- - - {unset id} [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "-" "-" digit digit

Example of possible logs:

1.1.1.1 1.1.1.1 http-nio-8081-exec-3 _1_2 [06/Oct/2015:13:58:22 -0700] "POST /webapps/bb-social-learning-BBLEARN/dwr_open/call/plaincall/ToolActivityService.getActivityForAllTools.dwr HTTP/1.1" 200 116 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36" "JSESSIONID=123456789123456879; BIGipServer~Part-Apps~blackboard-8081=155738122.37151.0000; cookies_enabled=yes; JSESSIONID=123456789123456879; web_client_cache_guid=57a5cf06-3530-48d4-a15f-24f7b7ad0c86; __utma=180787233.108259193.1444163123.1444163123.1444163123.1; __utmb=180787233.17.10.1444163123; __utmc=180787233; __utmz=180787233.1444163123.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); session_id=123456789123456879; s_session_id=123456789123456879" 14 116

1.1.1.1 - http-nio-8081-exec-7 {unset id} [06/Oct/2015:14:45:52 -0700] "GET / " 200 975 "-" "-" 78 975

1.1.1.1 1.1.1.1 http-nio-8081-exec-39 _1_2 [06/Oct/2015:14:45:50 -0700] "POST /webapps/blackboard/dwr_open/call/plaincall/ToolActivityService.getActivityForAllTools.dwr HTTP/1.1" 200 116 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "JSESSIONID=123456789123456879; AMCV_774C31DD5342CAF40A490D44%40AdobeOrg=793872103%7CMCIDTS%7C16709%7CMCMID%7C54023606677090264832941195089008334173%7CMCAAMLH-1444243530%7C7%7CMCAAMB-1444243530%7CNRX38WO0n5BH8Th-nqAG_A%7CMCAID%7CNONE; s_vnum=1446231162345%26vn%3D1; optimizelySegments=%7B%22183805944%22%3A%22none%22%2C%22183871552%22%3A%22direct%22%2C%22183916464%22%3A%22false%22%2C%22184368091%22%3A%22gc%22%7D; optimizelyEndUserId=123456789123456879.123456789123456879; optimizelyBuckets=%7B%7D; s_fid=123456789123456879-1F5618C69F6F5C95; mbox=session#1443638730148-106896#1443641446|PC#1443638730148-106896.17_60#1444849186; _ga=GA1.2.1929637067.1443278384; BIGipServer~Part-Apps~blackboard-8081=155738122.37151.0000; xythosdrive=0; web_client_cache_guid=6251af67-d5e0-4752-a2f7-0a05da0b8839; __utma=184114681.1929637067.1443278384.1443715769.1444002370.6; __utmc=184114681; __utmz=184114681.1443278384.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=123456789123456879; __utma=180787233.1929637067.1443278384.1444154819.1444159011.35; __utmc=180787233; __utmz=180787233.1443320404.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); session_id=123456789123456879; s_session_id=123456789123456879" 3 116

Original post did not update successfully.

Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Path Finder

Updated

LINE_BREAKER = (\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s

We've tried variations of the regex including

()\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}
0 Karma
Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Path Finder

Other LINE_BREAKER regex tried:

()\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}
(\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
0 Karma
Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Splunk Employee
Splunk Employee

I don't know the format of these logs, but aren't the log entries one event per line anyway? In other words, wouldn't the default line breaking on \r\n work just fine?

0 Karma
Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Path Finder

My original confs did not contain line breaker and the issue started.

0 Karma
Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Esteemed Legend

I agree with @ssievert. Try this:

[bb-access-log]
TIME_PREFIX=^[^\[]+\[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 26
SHOULD_LINEMERGE = false
TRUNCATE = 25000
# LINE_BREAKER <- leave as default
EXTRACT-webcampus_access-fields = (?<src_ip>\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b)\s(?<dest_ip>.*?)\s(?<HTTPComponent>.*?)\s(?<duid>.*?)\s(?<dateInternal>\[(.*?)\])\s(?<HTTP>\"(.*?)\")\s(?<http_status_code>.*?)\s(?<unknownDigits2>.*?)\s(?<http_user_agent>\".*?\")\s
0 Karma
Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Path Finder

Found the issue - The logs are being written in incomplete chunks by blackboard. Splunk is indexing logs as it sees it.
Hard learned lesson to tail your files to see how they are being written.

Highlighted

Re: Blackboard bb-access-logs not parsing correctly

Path Finder

Actual solution - The issue was Blackboard does not write complete events but instead queues data, writes, pauses to queue again and writes. Splunk support advised - As Splunk's tailing process can see the pause and interprets the complete event has been written and then tries to ingest the data. A majority of the time the data is written as complete events but not always. Easy way to check for bb-access-log accuracy is have search look if there are logs that do not contact source ip (e.g. NOT src_ip=*)

To resolve add timebeforeclose to forwarder inputs.

timebeforeclose = 30

Doc: http://docs.splunk.com/Documentation/Splunk/6.2.6/Admin/Inputsconf

timebeforeclose = integer

  • Modtime delta required before Splunk can close a file on EOF.
  • Tells the system not to close files that have been updated in past seconds.
  • Defaults to 3.

View solution in original post

0 Karma