I would like to extract fields from the log below using regular expressions. Can someone help me?
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
29932.500: [GC (Allocation Failure) 24176K->8808K(37888K), 0.0017082 secs]
30492.500: [GC (Allocation Failure) 24168K->8960K(37888K), 0.0017122 secs]
31047.500: [GC (Allocation Failure) 24320K->8944K(37888K), 0.0020634 secs]
31602.500: [GC (Allocation Failure) 24304K->8992K(37888K), 0.0017542 secs]
32157.500: [GC (Allocation Failure) 24352K->8968K(37888K), 0.0018971 secs]
32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]
32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]
I would like to extract the following from the Full GC entries, e.g. 8944K->8624K(37888K):
Field1: 8944 (whatever value appears, across all the Full GC entries)
Field2: 8624 (whatever value appears, across all the Full GC entries)
Field3: 37888 (whatever value appears, across all the Full GC entries)
Similarly for the GC entries.
Early help would be appreciated, as my organization does not allow me to install a field extractor app that would make extracting these fields easy.
@nagaraju_chittathuru, based on the sample events provided please try the following rex command.
<YourBaseSearch>
| rex field=_raw "\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)"
| table field1, field2, field3, _raw
You can use regex101.com for writing and testing your regular expressions. Splunk also has its own Interactive Field Extraction (IFX) feature, which you can use to generate the required regular expression.
Link to documentation: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/ExtractfieldsinteractivelywithIFX
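To sanity-check the pattern outside Splunk, here is a minimal Python sketch. It is an illustration and not part of the original answer. It assumes each log line is one event, and it writes the named groups as (?P<name>...), which is how Python's re module spells them. The rest of the pattern is unchanged.

```python
import re

# Same pattern as the rex command above; only the named-group syntax differs.
GC_PATTERN = re.compile(
    r"\[([^\(]+)\(([^\)]+)\)[\)|\s]+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\)"
)

SAMPLE = [
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]",
    "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]",
]


def extract(line):
    """Return (field1, field2, field3) for one event, or None if it does not match."""
    m = GC_PATTERN.search(line)
    return m.group("field1", "field2", "field3") if m else None


for line in SAMPLE:
    print(extract(line))
# ('8832', '8624', '37888')
# ('23984', '8816', '37888')
```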
Hi Woodcock,
Thanks for the regex above; it is working fine. Based on the data above, I extended the regex to the fields below, and t1, Field0, Field1, Field2, Field3 and Field4 are displaying now. Somehow I have not been able to extract these fields using IFX; I get an error every time. The regex below extracts only the "GC (Allocation Failure)" events. I would like to extend it to the other Full GC and GC events as well:
Full GC (System.gc()) as "gt"
GC (Allocation Failure) as "gt"
GC (System.gc()) as "gt"
(?ms)\d+\.\d+\D+^(?<t1>[^:]+):\s+(?<Field0>[^-\r\n\.\b]+)\s+(?<Field1>[^-\r\n\.]+)->(?<Field2>[^\(]+)\((?<Field3>[^\)]+)\),\s+(?<Field4>[^\s]+\ssecs)
Like this:
|makeresults | eval _raw="28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
29932.500: [GC (Allocation Failure) 24176K->8808K(37888K), 0.0017082 secs]
30492.500: [GC (Allocation Failure) 24168K->8960K(37888K), 0.0017122 secs]
31047.500: [GC (Allocation Failure) 24320K->8944K(37888K), 0.0020634 secs]
31602.500: [GC (Allocation Failure) 24304K->8992K(37888K), 0.0017542 secs]
32157.500: [GC (Allocation Failure) 24352K->8968K(37888K), 0.0018971 secs]
32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]
32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]"
| rename COMMENT AS "Everything above generates a sample event; everything below is your solution"
| rex max_match=0 "(?ms)\d+\.\d+\D+(?<Field1>[^-\r\n\.]+)->(?<Field2>[^\(]+)\((?<Field3>[^\)]+)"
Hi niketnilay,
Thanks for the quick turnaround. When I build the query
mysearch | rex field=_raw "\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)"
| table field1, field2, field3, _raw
it returns only the first Full GC event, even though I have multiple Full GC entries in the same event. Any help would be appreciated.
If you have multiple matches in the same event, you can use the max_match argument. If it is set to 0, rex tries to find all matches (the extracted fields become multivalue fields):
| rex field=_raw "\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)" max_match=0
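Here is a rough Python analogy for max_match, offered as an illustration only, since Splunk's rex is not Python. With the default max_match=1, rex keeps only the first match in an event, much like re.search. With max_match=0 it keeps every match, much like re.finditer. The sketch uses a simplified pattern and one event that holds two Full GC lines from the sample log.

```python
import re

# Simplified pattern: only the three heap-size fields.
PATTERN = re.compile(r"(?P<field1>\d+)K->(?P<field2>\d+)K\((?P<field3>\d+)K\)")

# One event containing two Full GC lines (taken from the sample log).
EVENT = (
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]\n"
    "32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]"
)


def first_match(event):
    """Like max_match=1: only the first occurrence is kept."""
    m = PATTERN.search(event)
    return [m.group("field1")] if m else []


def all_matches(event):
    """Like max_match=0: every occurrence is kept (a multivalue field in Splunk)."""
    return [m.group("field1") for m in PATTERN.finditer(event)]


print(first_match(EVENT))  # ['8832']
print(all_matches(EVENT))  # ['8832', '8944']
```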
I am trying to extend the regex to extract the leading timestamp using the following:
\s\w+.(?\w+:)
Somehow it extracts only the part after the decimal point in the example below. Could you please help?
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
Try the following regular expression, which extracts the value before the colon (i.e. 29372.500, etc.) as timestamp:
^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)
Please make sure you use the code button (101010) on Splunk Answers when posting code, so that special characters are not escaped. Also, as stated earlier, test your regular expression on regex101.com with your actual sample data.
Please check and confirm.
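The key piece is [^:]+, which means "one or more characters that are not a colon." The group therefore runs from the start of the line up to the first colon and keeps the decimal part of the timestamp. Here is a minimal Python sketch of the same pattern, applied to one line. It assumes each log line is a separate event, so ^ anchors at the start of that line.

```python
import re

# Timestamp pattern from the answer above, with Python-style named groups.
TS_PATTERN = re.compile(
    r"^(?P<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\)"
)

line = "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]"
m = TS_PATTERN.search(line)
print(m.group("timestamp"))  # 29372.500
```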
Hi niketnilay,
Thanks for the quick turnaround. Below is the example data:
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
From this, I am trying to extract:
28820.220 as "timestamp"
0.0261704 as "gctime"
8832K as "field1"
8624K as "field2"
37888K as "field3"
Below is the final regex I worked out, but it fails for some events:
rex max_match=0 field=_raw "^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\)\,+\s\w\.(?<gctime>\w+)\s"
It looks like my "gctime" regex is causing the issue. To get the value 0.0261704, do I need to tweak the regex?
Please try the following:
^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]
You should try to understand how regular expressions work in order to exploit their potential. As stated earlier, regex101.com is also a resource for quickly learning what a regex means; a QUICK REFERENCE panel is available at the bottom right.
You can also refer to Regular Expressions in Splunk Docs: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/AboutSplunkregularexpressions
Please check and confirm.
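The Python sketch below compares the suggested gctime group with the tail of the earlier attempt. It uses Python-style named groups and a single sample line. In the earlier attempt, \w\. consumes the leading "0." outside the gctime group, so gctime ends up with only the digits after the decimal point. The suggested [^\s]+ captures the whole number up to the space before "secs".

```python
import re

# Suggested pattern from the answer above, with Python-style named groups.
FULL_PATTERN = re.compile(
    r"^(?P<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\),\s+"
    r"(?P<gctime>[^\s]+)\ssecs\]"
)

# Tail of the earlier attempt: \w\. sits outside the gctime group.
OLD_TAIL = re.compile(r"\,+\s\w\.(?P<gctime>\w+)\s")

line = "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]"

new_gctime = FULL_PATTERN.search(line).group("gctime")
old_gctime = OLD_TAIL.search(line).group("gctime")
print(new_gctime)  # 0.0261704
print(old_gctime)  # 0261704
```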
@nagaraju_chittathuru, have you tried the above regex? It should extract gctime as well.
Hi niketnilay,
Below is the example data:
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
From this, I am trying to extract the following, only for "Full GC":
28820.220 as "timestamp"
0.0261704 as "gctime"
8832K as "field1"
8624K as "field2"
37888K as "field3"
Below is the final regex I worked out. Without timestamp it pulls the correct "Full GC" events, but if I add timestamp it pulls all the Full GC as well as GC logs, whereas I only need the Full GC logs along with their timestamps:
"^(?
As stated before, please make sure you use the code button (101010) in Splunk Answers for code whose special characters you do not want escaped. Your regex is missing field names.
The following run-anywhere search works on your mock data, using the same regular expression I provided previously:
| makeresults
| eval _raw="28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]"
| rex max_match=0 field=_raw "^(?<timestamp>[^:]+):\s+\[([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]"
If you are testing the regular expression on regex101.com, you need to turn on the multi line regex flag so that both events match the same regex; otherwise, paste only one event at a time for testing.
[Screenshot from regex101.com confirming that the regular expression works]
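You can see the effect of the multi-line flag in Python. This is an illustration only: Python's re is not the PCRE engine used by regex101 or Splunk, but ^ behaves the same way in this respect. Without re.MULTILINE, ^ anchors only at the very start of the text. In a two-line sample, only the first line can match.

```python
import re

# Only the anchored timestamp part of the pattern, for brevity.
PATTERN = r"^(?P<timestamp>[^:]+):\s+\["

TWO_EVENTS = (
    "28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]\n"
    "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]"
)

without_flag = re.findall(PATTERN, TWO_EVENTS)
with_flag = re.findall(PATTERN, TWO_EVENTS, flags=re.MULTILINE)
print(without_flag)  # ['28820.220']
print(with_flag)     # ['28820.220', '29372.500']
```

The inline form (?m) at the start of a pattern sets the same flag. The (?ms) prefix used earlier in this thread relies on that.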
This is the issue: the regex is pulling all the events for "Full GC" and "GC", whereas I am interested only in "Full GC".
If I exclude timestamp and gctime the query works fine, but if I include timestamp and gctime it no longer picks only the Full GC events.
Here is the regex I tried, modified for Full GC:
"^(?<timestamp>[^:]+):\s+\[Full GC([^\(]+)\(([^\)]+)\)[\)|\s]+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]"
Try this if you are only interested in Full GC (it ignores the plain GC events):
"^(?<timestamp>[^:]+):\s+\[Full GC\s\(([^\)]+)\)\)\s+(?<field1>\d+)K-\>(?<field2>\d+)K\((?<field3>\d+)K\),\s+(?<gctime>[^\s]+)\ssecs\]"
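Here is a minimal Python check of the Full GC-only pattern against a few lines of the sample log. The sample is held in one multi-line string, so the sketch adds re.MULTILINE so that ^ matches at the start of each line. Named groups use Python's (?P<name>...) form.

```python
import re

# Full GC-only pattern from the answer above, with Python-style named groups.
FULL_GC = re.compile(
    r"^(?P<timestamp>[^:]+):\s+\[Full GC\s\(([^\)]+)\)\)\s+"
    r"(?P<field1>\d+)K-\>(?P<field2>\d+)K\((?P<field3>\d+)K\),\s+"
    r"(?P<gctime>[^\s]+)\ssecs\]",
    re.MULTILINE,
)

LOG = """\
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]
32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]"""

# Plain "GC" lines fail at the literal "[Full GC", so only Full GC rows remain.
rows = [m.groupdict() for m in FULL_GC.finditer(LOG)]
for row in rows:
    print(row["timestamp"], row["gctime"])
# 28820.220 0.0261704
# 32420.248 0.0205035
```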
Hi niketnilay,
I am trying to extract the gt field below using the regex with Field0, but it selects only the Allocation Failure entries, whereas I am interested in all three of the values below. Any suggestions for the regex below?
Full GC (System.gc()) as "gt"
GC (Allocation Failure) as "gt"
GC (System.gc()) as "gt"
(?ms)\d+\.\d+\D+^(?<t1>[^:]+):\s+(?<Field0>[^\r\n\.\b]+)\s+(?<Field1>[^-\r\n\.]+)->(?<Field2>[^\(]+)\((?<Field3>[^\)]+)\),\s+(?<Field4>[^\s]+\ssecs)
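Here is one possible approach, sketched in Python as an assumption rather than a tested Splunk answer. It keeps the t1 and Field1 to Field4 names from the question. It captures the whole collector label into gt by matching lazily (.+?) up to the run of closing parentheses that comes just before the heap sizes. The regex in the question excludes "." from Field0, so it cannot span "System.gc()"; this pattern does not have that restriction. It has been checked only against the three label forms in the sample log. In rex, the groups would be written as (?<gt>...) and so on.

```python
import re

# Hypothetical pattern: gt is everything after "[" up to the ")" run before the sizes.
GT_PATTERN = re.compile(
    r"^(?P<t1>[^:]+):\s+\[(?P<gt>.+?\)+)\s+"
    r"(?P<Field1>\d+)K->(?P<Field2>\d+)K\((?P<Field3>\d+)K\),\s+"
    r"(?P<Field4>[^\s]+\ssecs)",
    re.MULTILINE,
)

LOG = """\
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
32420.247: [GC (System.gc()) 16160K->8944K(37888K), 0.0012816 secs]"""

gts = [m.group("gt") for m in GT_PATTERN.finditer(LOG)]
print(gts)
# ['Full GC (System.gc())', 'GC (Allocation Failure)', 'GC (System.gc())']
```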
Hi nagaraju_chittathuru,
Try this regex:
\[Full GC.*\)\)\s(?<FullGC1>[^K]*)K-\>(?<FullGC2>[^K]*)K\((?<FullGC3>[^\)]*)
If the sizes can be in M or G instead of K, you can use:
\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)
Test it at https://regex101.com/r/z3PqFP/1
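Here is a quick Python check of the unit-agnostic pattern above, written with Python's (?P<name>...) named groups. The second input line is hypothetical. It was invented only to exercise the M branch of (K|M|G) and does not come from the sample log.

```python
import re

# Unit-agnostic Full GC pattern from above, with Python-style named groups.
FULL_GC_UNITS = re.compile(
    r"\[Full GC.*\)\)\s(?P<FullGC1>[^KMG]*)(K|M|G)-\>"
    r"(?P<FullGC2>[^KMG]*)(K|M|G)\((?P<FullGC3>[^KMG]*)"
)

lines = [
    "32420.248: [Full GC (System.gc()) 8944K->8624K(37888K), 0.0205035 secs]",
    # Hypothetical line with sizes in M, not taken from the real log.
    "40000.000: [Full GC (System.gc()) 9M->8M(37M), 0.0200000 secs]",
    "29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]",
]


def parse(line):
    """Return (FullGC1, FullGC2, FullGC3), or None for non-Full GC lines."""
    m = FULL_GC_UNITS.search(line)
    return m.group("FullGC1", "FullGC2", "FullGC3") if m else None


for line in lines:
    print(parse(line))
# ('8944', '8624', '37888')
# ('9', '8', '37')
# None
```

The unit is matched by an unnamed group and not kept, so 9M and 9K both yield 9. If the unit matters, one option is to give those groups names as well.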
Bye.
Giuseppe
Hi cusello,
Thanks for the quick turnaround. When I build the query
mysearch | rex field=_raw "\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)" | table FullGC1, FullGC2, FullGC3, _raw
it returns only the first Full GC event, even though I have multiple Full GC entries in the same event.
On https://regex101.com/r/z3PqFP/1 it shows the other occurrences, but when I build the actual query it prints only one row.
Any help would be appreciated.
Hi nagaraju_chittathuru,
Try adding max_match=0 to the rex command:
mysearch
| rex max_match=0 "\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)"
| table FullGC1, FullGC2, FullGC3, _raw
Bye.
Giuseppe
Hi cusello,
Thanks a lot, that works fine. I would like to extend the regex to the timestamp and gctime in the sample data below:
28820.220: [Full GC (System.gc()) 8832K->8624K(37888K), 0.0261704 secs]
29372.500: [GC (Allocation Failure) 23984K->8816K(37888K), 0.0013546 secs]
From this, I am trying to extract the fields below. Could you help me with this?
28820.220 as "timestamp"
0.0261704 as "gctime"
mysearch
| rex max_match=0 "\[Full GC.*\)\)\s(?<FullGC1>[^KMG]*)(K|M|G)-\>(?<FullGC2>[^KMG]*)(K|M|G)\((?<FullGC3>[^KMG]*)"
| table FullGC1, FullGC2, FullGC3, _raw