Getting Data In

Correlate logs when time order is inconsistent

andrewholmes
New Member

I have spent a lot of time trying to get something that works perfectly here, and I just can't get more than partial success. I'm trying to correlate apache and (rails) app logs into transactions that contain the full rails transaction log PLUS the one line of apache log that was the user's request. The problem is that the timestamps on the apache logs move around relative to the timestamps on the app logs.

For example, an app log for a complete single rails transaction will look like this:

Nov  2 13:58:10 app1 production[10944]: Processing FormsController#logo (for <IP masked> at 2011-11-02 13:58:10) [GET]
...(more app log)...
Nov  2 13:58:10 app1 production[10944]: Completed in 35ms (View: 1, DB: 30) | 200 OK [https://xxx/form/199/logo?1320256689]

And its corresponding apache log entry will look like this:

<IP masked> - - [02/Nov/2011:13:58:10 -0400] "GET /form/199/logo?1320256689 HTTP/1.1" 200 7024 "https://xxx" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

I'm able to do a transaction on uri (/form/199/logo?1320256689),client IP,pid and SOMETIMES that works:

sourcetype="syslog" OR source=/var/log/httpd/access_log | transaction uri,clientip,pid

To give me the desired result:

Nov  2 13:58:10 app1 production[10944]: Processing FormsController#logo (for <IP masked> at 2011-11-02 13:58:10) [GET]
...(more app log)...
Nov  2 13:58:10 app1 production[10944]: Completed in 35ms (View: 1, DB: 30) | 200 OK [https://xxx/form/199/logo?1320256689]
<IP masked> - - [02/Nov/2011:13:58:10 -0400] "GET /form/199/logo?1320256689 HTTP/1.1" 200 7024 "https://xxx" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

The problem is that the relative location of the apache event moves around with respect to its app server events. So

sourcetype="syslog" OR source=/var/log/httpd/access_log

Returns something like:

txn 1 rails first event
...
txn 1 rails last event
txn 2 rails first event
...
txn 2 rails last event
txn 1 apache event
txn 2 apache event
txn 3 rails first event
...
txn 3 apache event
txn 3 rails last event
txn 4 apache event
txn 4 rails first event
txn 4 rails last event

Due to slight subsecond timing variances.

I have experimented heavily with startswith and endswith, sorting by time and eventtype, etc., all to no avail. Some of my transactions include the apache event, and some don't.

So, my question is, how do I correlate the rails app server log into transactions that start with a "Processing..." event, end with a "Completed..." event, and include ONE AND ONLY THE ONE apache event whose URI matches the rails transaction, but which may occur timewise before, during, or after the rails events in the combined log?

Thanks for any help on this painful issue!

Andy

Tags (2)
0 Karma

woodcock
Esteemed Legend

You will never get this to work if you are using DATETIME_CONFIG = CURRENT which many people use because it is so easy and saves CPU cycles but I never use it because of just your kind of need. So stop using this setting if you are. Other than that, I can't think of any way to do this but it does seem to me that if all of your servers are using NTP that the worst scenario would be that the events have the exact same timestamp and be immediately adjacent to eachother but that is not what your sample data is showing. Is there any way you can verify your clock-sync and timestamping?

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...