All Apps and Add-ons

Extract data from different log lines

Explorer

I have a log like this:

2012-02-14 06:30:55,766 INFO  - Registering handler for "myFunc": 5e25f109-5f4e-4758-8228-0af8da0004b7
... lots of other lines ...
2012-02-14 06:35:32,523 INFO  - authenticated user "John Doe" with IP: 192.168.82.183
... lots of other lines ... 
2012-02-14 06:45:36,278 INFO  - slow GET request 5e25f109-5f4e-4758-8228-0af8da0004b7 from 192.168.82.183:64807 took 4390 ms

I'd like to get a list of the slow requests in clear text like this:

"John Doe" "myFunc" 4390 ms

what would be the best approach to achieve this result?

0 Karma
1 Solution

Explorer

I think i got the solution. I want to know who is experiencing the slow requests and what is the cleartext name of the slow request. Therefore i startet with the slow request and joined the name using the ip nummer and with a second join I joined the cleartext handler name using the uuid

sourcetype="log4j""INFO  - slow GET request" | join ip [search "INFO  - authenticated user "] | join uuid [search "INFO  - Registering handler for"] | table date_year date_month date_mday date_hour date_minute date_second user_name func_name slow_func_time

all the fields like ip, uuid, user_name etc. were defined in advance. the field ip and uuid had to be defined twice once for the slow get record and once for the authentication record. The same with the uuid i defined one for the slow get record and one for the register handler record.

Thank you so much for your help!

View solution in original post

0 Karma

New Member

I need some help to extract the data from two lines as specified above but unable to do that.
Following are the two log lines

2013-01-18 17:54:50,188 [PxtId - 699479170] [PId - (null)] [MID: 157] DEBUG CloudComputingServiceInterceptor - Operation context: PyathonxId : 699479170, ApplicationId: CComp2, UserId: 7, Id: 1

5:54:50.188 PM 2013-01-18 17:54:50,188 [PxtId - 699479170] [PId - (null)] [MID: 157] DEBUG Cloud.Computing.Service - Enter GetNewData ...

PxtID is common in both the cases and i need to find based on UserId: (variable) how many times "Enter GetNewData ..." has been called....

Something like following
UserID: 7 = 15
UserID: 2 = 10
Or
If i provide UserId in the query it provide me the logs so I can count manually.

I really need this urgent, please help.

0 Karma

Explorer

I think i got the solution. I want to know who is experiencing the slow requests and what is the cleartext name of the slow request. Therefore i startet with the slow request and joined the name using the ip nummer and with a second join I joined the cleartext handler name using the uuid

sourcetype="log4j""INFO  - slow GET request" | join ip [search "INFO  - authenticated user "] | join uuid [search "INFO  - Registering handler for"] | table date_year date_month date_mday date_hour date_minute date_second user_name func_name slow_func_time

all the fields like ip, uuid, user_name etc. were defined in advance. the field ip and uuid had to be defined twice once for the slow get record and once for the authentication record. The same with the uuid i defined one for the slow get record and one for the register handler record.

Thank you so much for your help!

View solution in original post

0 Karma

Legend

I don't see an IP address in any log line but the second one...

0 Karma

Explorer

event #2 is related to event #3 using the IP address. event #3 is related to event #1 as you noticed using the guid. Therefore event #3 ties event #1 and #2 together.

My problem is the guid is different for each ip and each ip is different for each user and there are many different possible function handlers and I'd like to find the slow to get a list of the slow functions each distinct user had.

0 Karma

Communicator

If there is a regular sequence of events e.g. Registering Handler -> Authenticating User, slow GET request, one could use the transaction command. (The transaction command can be computationally expensive). http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/transaction

"Registering handler" OR "authenticated user" OR "slow GET request"| transaction startswith="Registering handler" endswith="slow GET request" | table user function transaction_time

the table command expects the fields for 'user', 'function' and 'transaction_time' to be previously extracted. if you don't have those fields extracted, take a look at the field extraction docs: http://docs.splunk.com/Documentation/Splunk/latest/User/ExtractNewFields

0 Karma

Communicator

extending on Ayn's and Kristian's comments:

can multiple functions be invoked by same ip-guid combo in a time period (as in the sample events) ?

if, 'yes', then we cannot reliably determine the function that corresponds to 'slow GET'. consider the following:

registering handler funcA guidA
registering handler funcB guidA
slow GET IP-A:port guidA duration

given the data we have, we cannot assert whether funcA or funcB corresponds to the slow event.

is there any additional info that you can share with us ? is there some reliable sequence ? or time constraint ?

0 Karma

Communicator

agreed. my solution will not work for non-serial events

0 Karma

Ultra Champion

Sorry, but will probably not work, unless the log is a sequence of serial events making up transactions. More likely like a regular http access log, with several concurrent intermingled transactions, e.g.

Request handler A
Request handler B
Authenticate user A
Slow GET request A
Request handler C
Authenticate user B
Authenticate user C
Slow GET request C
Slow GET request B

I think the answer would be in subsearches and possibly (self)join, but I do not have all that much experience in that field.

/k

0 Karma

Legend

How do you know that the second event is related to the first and third one? #1 and #3 have a common guid that can be used to tie them together but #2 does not.

0 Karma