Splunk Search

extract errors from unstructured log file with rex

mehrdad_2000
Builder

Hi
I have several unstructured log files and need to extract the error messages from them with the rex SPL command.

1. What is the most efficient way to extract error messages from those logs?

2. How can I group by error type (count by error type)?

e.g.:

19   Socket recv failed: Connection TimeOut
3    readData failed. Read
3    Invalid Length for facility number
17   Duplicate - Stop Old Connection from IP

Here is the sample:

00:03:00.895 APP module: Error: readData failed. Read [0] bytes instead of 4 for Len
00:03:00.895 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.12] Socket[405]
00:02:59.791 APP module1: T[0]R[0]L: ERROR: Invalid Length for facility number [000000000] !
00:02:55.193 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.112] Socket[705]
00:02:50.536 APP module: Error: Socket recv failed: Connection Reset by Peer[FIN Received] IP[192.168.13.1] Socket[114]
00:02:49.205 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.14] Socket[213]
00:02:46.317 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.51]
00:02:44.467 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.13] Socket[697]
00:02:43.468 APP module2: T[0]R[0]L: Error: Invalid TopUp No!
00:02:40.047 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.123]
00:02:34.424 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.13]
00:02:27.125 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.14]
00:02:25.840 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[506]
00:02:21.836 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:02:21.434 APP module: Error: Socket recv failed: Connection Reset by Peer[FIN Received] IP[192.168.1.1] Socket[291]
00:02:18.846 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[220]
00:02:16.861 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[67]
00:02:16.855 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:02:13.954 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:02:13.085 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[284]
00:02:08.332 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:01:59.926 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[824]
00:01:59.371 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[216]
00:01:57.313 APP module3: X[0000]T[000000]R[000]L: ERR logoutInternalErr200Or100Or100: Txn Was Not Found To Logout
00:01:55.881 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[104]
00:01:49.036 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[191]
00:01:48.551 APP module2: T[0]R[0]L: Error: DoAction can not find action. TypeId(-1) Expect(0)
00:01:48.266 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:01:46.272 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:01:44.942 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[37]
00:01:44.016 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[449]
00:01:43.305 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[345]
00:01:38.840 APP module: Error: Socket recv failed: Connection Reset by Peer[FIN Received] IP[195.165.249.51] Socket[655]
00:01:29.366 APP module2: T[0]R[0]L: ERROR: Invalid Length for facility number [000000000000] !
00:01:27.744 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:01:26.463 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:01:24.663 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[195]
00:01:21.249 APP module: Error: Socket recv failed: Connection Reset by Peer[FIN Received] IP[192.168.1.1] Socket[689]
00:01:19.752 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:01:15.978 APP module2: T[0]R[0]L: ERROR: Invalid Length for facility number [0000000000] !
00:01:08.395 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[372]
00:01:08.367 APP module2: T[0]R[0]L: Error: Can not find exe []
00:00:55.808 APP1 module4: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[313]
00:00:54.566 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:00:53.914 APP module: Error: Socket recv failed: Connection Reset by Peer[FIN Received] IP[192.168.1.1] Socket[248]
00:00:47.717 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.1] Socket[197]
00:00:43.755 APP2 module4: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:00:39.936 APP2 module4: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]
00:00:37.646 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.1]

00:02:43.468 APP module4: T[0]R[0]L: Error: Invalid TopUp No!
00:03:00.895 APP module4: Error: readData failed. Read [0] bytes instead of 4 for Len
23:50:41.582 APP module4: X[00000]T[000000]R[0]L: oiu_fetch Error: I Cannot Found Any For This code:[0000000000]
00:00:03.164 APP module: T[0]R[0]L: Error: Module does not produce Pin Block. Call Supervisor. U[3357]

Any idea?

Thanks,

0 Karma

richgalloway
SplunkTrust

It would help to know what you've tried already, but perhaps this will help.

... | rex "Error: (?<Error>[^\[]+)"
---
If this reply helps you, an upvote would be appreciated.
0 Karma

mehrdad_2000
Builder

Thank you for the answer.

I tried this:

| rex "Error: (?<Error>[^\[]+)"

The output is:

Socket recv failed: Connection TimeOut IP[192.168.1.2] Socket[406]
Socket recv failed: Connection TimeOut IP[192.168.1.4] Socket[397]
Socket recv failed: Connection TimeOut IP[192.168.90.20] Socket[474]

As you can see, they are treated as separate events. My goal is to group them like this:

Expected output:
Socket recv failed: Connection TimeOut (3 times)

Another example:

Current output:

Invalid Length for facility number [000000000] !

Invalid Length for facility number [112222222] !

Expected output:

Invalid Length for facility number (2 times)

UPDATE: I tried using sed, but it is really slow:

index="my-index" err*
| rex "Error: (?<Errors>[^\[]+)"
| rex field=Errors mode=sed "s/[[].*//g"
| top Errors

 

Any idea?

Thanks,

0 Karma

PickleRick
Motivator

In general, you don't merge events as such and count them, unless you write a search that does so explicitly.

So you first need to parse each event to get its error type, and then you can run, for example, stats or eventstats on those error types.
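
As a minimal sketch of that parse-then-aggregate idea, the search below fakes three of the posted sample lines with makeresults, so it should run without access to any index. The field name raw and the ## delimiter are illustrative assumptions, not anything from the original data.

```spl
| makeresults
| eval raw=split("00:03:00.895 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.12] Socket[405]##00:02:55.193 APP module: Error: Socket recv failed: Connection TimeOut IP[192.168.1.112] Socket[705]##00:02:46.317 APP module: Error: Duplicate - Stop Old Connection from IP[192.168.1.51]", "##")
| mvexpand raw
| rex field=raw "Error: (?<Error>[^\[]+)"
| stats count by Error
```

With these three lines I would expect two rows back: one for the Socket recv failed message and one for the Duplicate message. Note that the captured text still ends in IP, because the pattern simply stops at the first [.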

0 Karma

mehrdad_2000
Builder

Would you please write the SPL command using the sample data that I posted?

Thanks,

0 Karma

PickleRick
Motivator

First, extract the error type as per @richgalloway's suggestion:

... | rex "Error: (?<Error>[^\[]+)"

Then, there are at least two ways of counting consecutive events.

One, which is not mine (I nicked it off someone else's solution), is a trick with streamstats and reversing:

| streamstats count as errcount by Error reset_on_change=t
| reverse
| streamstats count as auxcount by Error reset_on_change=t
| where auxcount=1
| reverse

To be honest, I didn't test it.

The other option also uses streamstats, but with autoregress:

| streamstats count as Errcount by Error current=t reset_on_change=t 
| autoregress Error as oldError
| streamstats count(eval(Error!=oldError)) as difcount
| stats max(Errcount) as Errcount values(Error) as Error by difcount 

But both of those count consecutive occurrences. If you simply want a global aggregate, just do:

| stats count by Error
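
Putting it together for the global count, here is a sketch under a few assumptions: the base search index="my-index" err* is copied from the earlier post, (?i) is added so that lines logged as ERROR: also match, and the replace/trim step (my addition, not something from the thread) drops a trailing IP and surrounding whitespace so the values come out closer to the expected output. Since the character class already stops at the first [, a separate sed pass should not be needed.

```spl
index="my-index" err*
| rex "(?i)error:\s+(?<Error>[^\[]+)"
| eval Error=trim(replace(Error, "\s+IP$", ""))
| stats count by Error
| sort - count
```

If you prefer to keep the IP suffix on the Duplicate message, leave out the replace() call and keep only trim().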

 

0 Karma