Splunk Search

Garbage collection logs field extraction from log file

nagaraju_chitta
Path Finder

I would like to extract fields from the log below using regular expressions. Can someone help me?

28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
29932.500: [GC (Allocation Failure) 24176K->8808K(37888K), 0.0017082 secs]
30492.500: [GC (Allocation Failure) 24168K->8960K(37888K), 0.0017122 secs]
31047.500: [GC (Allocation Failure) 24320K->8944K(37888K), 0.0020634 secs]
31602.500: [GC (Allocation Failure) 24304K->8992K(37888K), 0.0017542 secs]
32157.500: [GC (Allocation Failure) 24352K->8968K(37888K), 0.0018971 secs]
32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]
32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]

For the Full GC entries, e.g. 8944K->8624K(37888K), I would like to extract:

Field1: 8944 (the first value, for every Full GC entry)
Field2: 8624 (the second value, for every Full GC entry)
Field3: 37888 (the value in parentheses, for every Full GC entry)

Similarly for GC.

Early help would be appreciated, as my organization does not allow me to install the field extractor app, which would make extracting these fields easy.

1 Solution

niketnilay
Legend

@nagaraju_chittathuru, based on the sample events provided please try the following rex command.

<YourBaseSearch>
| rex field=_raw "\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)"
| table field1, field2, field3, _raw

You can use regex101.com to write and test your regular expressions. Splunk also has its own Interactive Field Extraction (IFX), which you can use to have Splunk come up with the required regular expression.
Link to documentation: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/ExtractfieldsinteractivelywithIFX
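If regex101.com is not handy, the same check can be scripted. This is only a sketch in Python, not part of the original answer: Python's `re` module writes named groups as `(?P<name>...)` where the rex command uses `(?<name>...)`, and the `extract` helper and `SAMPLES` list are names made up for the illustration. The sample lines are copied from the question.

```python
import re

# Sample lines copied from the question.
SAMPLES = [
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]",
    "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]",
    "32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]",
]

# Same pattern as the rex above; only the named-group syntax differs.
PATTERN = re.compile(
    r"\[([^\(]+)\(([^\)]+)\)[\)|\s]+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\)"
)


def extract(line):
    """Return (field1, field2, field3) as strings, or None when nothing matches."""
    m = PATTERN.search(line)
    return (m["field1"], m["field2"], m["field3"]) if m else None


for line in SAMPLES:
    print(extract(line))
```

All three GC variants in the sample go through the same pattern, because `[\)|\s]+` absorbs the extra closing parenthesis of `System.gc()`.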

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"


nagaraju_chitta
Path Finder

Hi Woodcock,
Thanks for the regex above; it is working fine. I extended it for the fields below, and t1, Field0, Field1, Field2, Field3, and Field4 are now displayed. Somehow I was not able to extract these fields using IFX; I get an error every time. The regex below extracts only "GC (Allocation Failure)" events. I would like to extend it to the Full GC and other GC events as well:
Full GC (System.gc()) as "gt"
GC (Allocation Failure) as "gt"
GC (System.gc()) as "gt"

(?ms)\d+\.\d+\D+^(?<t1>[^:]+):\s+(?<Field0>[^-\r\n\.\b]+)\s+(?<Field1>[^-\r\n\.]+)->(?<Field2>[^\(]+)\((?<Field3>[^\)]+)\),\s+(?<Field4>[^\s]+\ssecs)

woodcock
Esteemed Legend

Like this:

|makeresults | eval _raw="28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
29932.500: [GC (Allocation Failure) 24176K->8808K(37888K), 0.0017082 secs]
30492.500: [GC (Allocation Failure) 24168K->8960K(37888K), 0.0017122 secs]
31047.500: [GC (Allocation Failure) 24320K->8944K(37888K), 0.0020634 secs]
31602.500: [GC (Allocation Failure) 24304K->8992K(37888K), 0.0017542 secs]
32157.500: [GC (Allocation Failure) 24352K->8968K(37888K), 0.0018971 secs]
32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]
32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]"

| rename COMMENT AS "Everything above generates a sample event; everything below is your solution"

| rex max_match=0 "(?ms)\d+\.\d+\D+(?<Field1>[^-\r\n\.]+)->(?<Field2>[^\(]+)\((?<Field3>[^\)]+)"

nagaraju_chitta
Path Finder

Hi niketnilay,
Thanks for the quick turnaround. When I build the query

mysearch | rex field=_raw "\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)"
| table field1, field2, field3, _raw

this returns only the first Full GC entry, even though I have multiple Full GC entries in the same event. Any help would be appreciated.


niketnilay
Legend

If you have multiple matches in the same event, you can use the max_match argument. If it is set to 0, rex will try to find all matches:

| rex field=_raw "\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)" max_match=0
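As a rough analogy for what max_match changes, here is a small Python sketch (not Splunk code). `re.search` stops at the first match, much like rex with its default max_match=1, while `re.finditer` walks through every match, much like max_match=0. The `EVENT` string simply joins three sample lines from the question into one multi-line event; `first` and `all_matches` are names chosen for the illustration.

```python
import re

# One "event" that holds several GC lines, built from the sample data.
EVENT = (
    "32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]\n"
    "32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]\n"
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]"
)

PATTERN = re.compile(
    r"\[([^\(]+)\(([^\)]+)\)[\)|\s]+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\)"
)

# First match only (comparable to rex without max_match).
first = PATTERN.search(EVENT)

# Every match in the event (comparable to max_match=0).
all_matches = [m.groupdict() for m in PATTERN.finditer(EVENT)]

print(first["field1"])
print([m["field1"] for m in all_matches])
```

In Splunk the max_match=0 result is a multivalue field per capture group, so a table of the fields still shows one row per event, with several values in each cell.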

nagaraju_chitta
Path Finder

I am trying to extend the regex to extract the leading timestamp using the pattern below:
\s\w+.(?\w+:)
Somehow it extracts only the part after the decimal point. Could you please help, using the example below?
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]


niketnilay
Legend

Try the following regular expression, which extracts everything before the colon (e.g. 29372.500) as timestamp:

^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)

Please make sure you use the code button (101010) on Splunk Answers when posting code, so that special characters are not stripped from your post. Also, as stated earlier, test your regular expression on regex101.com with your actual sample data.

Please check and confirm.


nagaraju_chitta
Path Finder

Hi niketnilay,
Thanks for quick turnaround. Below is the example data
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]

Out of this I am trying to extract:
28820.220 as "timestamp"
0.0261704 as "gctime"
8832K as "field1"
8624K as "field2"
37888K as "field3"

Below is the final regex I worked out, but it fails for some events.

rex max_match=0 field=_raw "^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)\,+\s\w\.(?<gctime>\w+)\s" 

It looks like my "gctime" regex is causing the issue. Do I need to tweak the regex to get the value 0.0261704?


niketnilay
Legend

Please try the following:

^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]

You should try to understand how regular expressions work in order to exploit their potential. As stated earlier, regex101.com is also a good resource for quickly learning what a regex means; a QUICK REFERENCE panel is available at the bottom right.
You can also refer to Regular Expressions in Splunk Docs: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/AboutSplunkregularexpressions
Please check and confirm.
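One visible difference between the two gctime tails can be checked outside Splunk. The sketch below is in Python (named groups written as `(?P<name>...)`), and it compares only the end of each regex; `ATTEMPT`, `SUGGESTED` and the two result variables are names made up for the illustration. In the earlier attempt, `\w\.` consumes the leading "0." before the capture group starts, so only the digits after the decimal point are captured.

```python
import re

# Sample line copied from the question.
LINE = "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]"

# Tail of the earlier attempt: \w\. eats "0." before the group begins.
ATTEMPT = re.compile(r"K\)\,+\s\w\.(?P<gctime>\w+)\s")

# Tail of the suggested regex: capture everything up to the space before "secs".
SUGGESTED = re.compile(r"K\),\s+(?P<gctime>[^\s]+)\ssecs\]")

attempt_value = ATTEMPT.search(LINE)["gctime"]
suggested_value = SUGGESTED.search(LINE)["gctime"]

print(attempt_value)    # digits after the decimal point only
print(suggested_value)  # the full duration
```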


niketnilay
Legend

@nagaraju_chittathuru, have you tried above regex? This should extract gctime as well.


nagaraju_chitta
Path Finder

Hi niketnilay,
Below is the example data
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]

Out of this I am trying to extract, only for "Full GC":
28820.220 as "timestamp"
0.0261704 as "gctime"
8832K as "field1"
8624K as "field2"
37888K as "field3"
Below is the final regex I worked out. Without the timestamp it pulls the correct "Full GC" events, but if I add the timestamp it pulls all Full GC as well as GC logs, whereas I need only the Full GC logs along with their timestamps:
"^(?[^:]+):\s+[Full GC([^(]+)(([^)]+))[)|\s]+(?\d+)K->(?\d+)K((?\d+)K),\s+(?[^\s]+)\ssecs]"


niketnilay
Legend

As stated before, please use the code button (101010) in Splunk Answers for code, so that special characters are not stripped from your post. The regex as posted is missing its field names.


niketnilay
Legend

The following run-anywhere search works for your mock data, using the same regular expression I provided previously:

| makeresults
| eval _raw="28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]"
| rex max_match=0 field=_raw "^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]"

If you are testing the regular expression on regex101.com, you need to turn on the multi-line flag so that both events match the same regex; otherwise, paste only one event at a time. The following screenshot from regex101.com confirms that the regular expression works:

[screenshot: regex101.com match result]
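The effect of the multi-line flag can also be reproduced with a short Python sketch (named groups written as `(?P<name>...)`; `BODY`, `without_flag` and `with_flag` are names chosen for the illustration). Without the flag, `^` matches only at the very start of the text, so only the first line is found; with it, `^` also matches after each newline. Since rex is PCRE-based, I would expect the inline `(?m)` flag to matter in the same way when a single Splunk event contains several lines, but that is an assumption to verify against your own data.

```python
import re

# Two sample lines from the question, joined into one block of text.
EVENT = (
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]\n"
    "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]"
)

# The regex from the run-anywhere search, with Python named-group syntax.
BODY = (
    r"^(?P<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\),"
    r"\s+(?P<gctime>[^\s]+)\ssecs\]"
)

# ^ anchors only at the start of the whole text.
without_flag = [m["timestamp"] for m in re.finditer(BODY, EVENT)]

# ^ anchors at the start of every line.
with_flag = [m["timestamp"] for m in re.finditer(BODY, EVENT, re.MULTILINE)]

print(without_flag)
print(with_flag)
```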


nagaraju_chitta
Path Finder

This is the issue: the regex pulls all the events, for both "Full GC" and "GC", whereas I am interested only in "Full GC".
If I exclude timestamp and gctime the query works fine; if I include them, it does not pick up the events.
Here is the regex I tried, modified for Full GC:

"^(?<timestamp>[^:]+):\s+\[Full GC([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]"

niketnilay
Legend

Try this if you are only interested in Full GC (it will ignore the GC events):

"^(?<timestamp>[^:]+):\s+\[Full GC\s\(([^\)]+)\)\)\s+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]"

nagaraju_chitta
Path Finder

Hi niketnilay,
I am trying to extract the gt field below using the regex with Field0, but it selects only the Allocation Failure events. I am interested in all THREE of the values below. Any suggestions for the regex?
Full GC (System.gc()) as "gt"
GC (Allocation Failure) as "gt"
GC (System.gc()) as "gt"

(?ms)\d+\.\d+\D+^(?<t1>[^:]+):\s+(?<Field0>[^\r\n\.\b]+)\s+(?<Field1>[^-\r\n\.]+)->(?<Field2>[^\(]+)\((?<Field3>[^\)]+)\),\s+(?<Field4>[^\s]+\ssecs)
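A possible direction, offered only as a sketch and not taken from any reply in this thread: the character class for Field0 excludes ".", and "System.gc()" contains one, which is likely why only the Allocation Failure lines match. The hypothetical pattern below instead captures the GC type lazily and stops at the whitespace before the first heap figure. It is written in Python (named groups as `(?P<name>...)`, and `re.MULTILINE` standing in for the inline `(?m)`); `gc_type` and `SAMPLES` are names made up for the illustration.

```python
import re

# Sample lines copied from the question, one per GC type.
SAMPLES = [
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]",
    "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]",
    "32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]",
]

# Hypothetical pattern: gt is a lazy capture that ends at the whitespace
# directly before "<digits>K->".
PATTERN = re.compile(
    r"^(?P<t1>[^:]+):\s+\[(?P<gt>.+?)\s+"
    r"(?P<Field1>\d+K)->(?P<Field2>\d+K)\((?P<Field3>\d+K)\),\s+"
    r"(?P<Field4>\S+\ssecs)",
    re.MULTILINE,
)


def gc_type(line):
    """Return the captured GC type, or None when the line does not match."""
    m = PATTERN.search(line)
    return m["gt"] if m else None


for line in SAMPLES:
    print(gc_type(line))
```

To try it in rex, the groups would need to be rewritten as `(?<name>...)`, and the result should be checked against real events before relying on it.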

gcusello
SplunkTrust

Hi nagaraju_chittathuru,
try this regex

\[Full GC.*\)\)\s(?<FullGC1>[^K]*)K-\>(?<FullGC2>[^K]*)K\((?<FullGC3>[^\)]*)

If M or G can appear instead of K, you can use

\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)

Test it at https://regex101.com/r/z3PqFP/1
Bye.
Giuseppe
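For a quick offline check of the K|M|G variant, here is a small Python sketch (named groups written as `(?P<name>...)`). The second test line, with M units, is hypothetical data invented for the check and does not come from the log in the question; `full_gc` is likewise a made-up helper name.

```python
import re

# The K|M|G variant from the reply above, with Python named-group syntax.
PATTERN = re.compile(
    r"\[Full GC.*\)\)\s(?P<FullGC1>[^KMG]*)(K|M|G)-\>"
    r"(?P<FullGC2>[^KMG]*)(K|M|G)\((?P<FullGC3>[^KMG]*)"
)


def full_gc(line):
    """Return (FullGC1, FullGC2, FullGC3) for a Full GC line, else None."""
    m = PATTERN.search(line)
    return (m["FullGC1"], m["FullGC2"], m["FullGC3"]) if m else None


# Real sample line from the question (K units).
print(full_gc("28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]"))
# Hypothetical line with M units, invented for this check.
print(full_gc("28820.220: [Full GC (System.gc()) 9M->8M(37M), 0.0261704 secs]"))
# Plain GC lines are ignored because the pattern requires "[Full GC".
print(full_gc("29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]"))
```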

nagaraju_chitta
Path Finder

Hi cusello,
Thanks for the quick turnaround. When I build the query

mysearch | rex field=_raw "\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)" | table FullGC1, FullGC2, FullGC3, _raw

this returns only the first Full GC entry, even though I have multiple Full GC entries in the same event.
At https://regex101.com/r/z3PqFP/1 it shows the other occurrences, but when I build the actual query it prints only one row.
Any help would be appreciated.


gcusello
SplunkTrust

Hi nagaraju_chittathuru,
try to add max_match=0 to the rex command

mysearch 
| rex max_match=0 "\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)" 
| table FullGC1, FullGC2, FullGC3, _raw

Bye.
Giuseppe


nagaraju_chitta
Path Finder

Hi cusello,
Thanks a lot, that works fine. I would like to extend the regex to capture the timestamp and gctime from the sample data below.

28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]

Out of this I am trying to extract the fields below. Could you help me?
28820.220 as "timestamp"
0.0261704 as "gctime"

mysearch
| rex max_match=0 "\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)"
| table FullGC1, FullGC2, FullGC3, _raw