I am trying to parse MySQL slowlogs and get the query extract from the log.
I have the following format in the log:
# Time: 120316 9:27:20
# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
# Query_time: 6.848644 Lock_time: 0.000022 Rows_sent: 0 Rows_examined: 1
SET timestamp=xxx;
SELECT * FROM DUAL;
# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
# Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;
# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
# Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;
And when I try to do a search time extract of a field I use
sourcetype=mysql-slow | rex "(?<aaa>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*(;$))" | top 50 aaa
When I would get this output:
1: SELECT * FROM DUAL;
2: SELECT * FROM DUAL;
3: SELECT * FROM DUAL;
This is what I want, but then I put this in my transforms.conf
[query_for_mysql_slow]
REGEX = (?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*(;$))
Now when I try the search
sourcetype=mysql-slow | top 50 query
I get
SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL;
I do not understand why it does not work when put into transforms.conf
Any ideas?
Well there are few possible explanations:
If you just copied the transforms stuff from the actual file, there is a typo. It's REGEX, not REXEX. If you just want to make a search time extraction of the field, you should probably try something like;
in props.conf
[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*;)$
There should be no configuration in transforms.conf.
There are other things that are a bit unclear regarding what you want to achieve, and how you try to do that, but try this first.
UPDATE:
Fixed the factual error regarding regex in transforms - sorry about that.
Does Splunk parse the sample log as one or three events? Do you really want three events? I'm not 100% sure how splunk would handle events #2 and #3 correctly, since there is no timestamp. Or would it do to have a single event with a multivalued field for the query?
On a side note, I believe the reason for the bad results is that the regex might be too greedy. Try adding a '?' after .*
[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
This however will only produce one result, even if there are more than one occurence.
So either break the log into separate events and hope that timestamping will work, and use the regex above. Or keep the log as one event and use transforms.conf like this;
in props.conf
REPORT-slowq = slow_queries
in transforms.conf
[slow_queries]
REGEX = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
MV_ADD = true
Hope this helps,
Kristian
EDIT: factual errors, my bad
Well there are few possible explanations:
If you just copied the transforms stuff from the actual file, there is a typo. It's REGEX, not REXEX. If you just want to make a search time extraction of the field, you should probably try something like;
in props.conf
[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*;)$
There should be no configuration in transforms.conf.
There are other things that are a bit unclear regarding what you want to achieve, and how you try to do that, but try this first.
UPDATE:
Fixed the factual error regarding regex in transforms - sorry about that.
Does Splunk parse the sample log as one or three events? Do you really want three events? I'm not 100% sure how splunk would handle events #2 and #3 correctly, since there is no timestamp. Or would it do to have a single event with a multivalued field for the query?
On a side note, I believe the reason for the bad results is that the regex might be too greedy. Try adding a '?' after .*
[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
This however will only produce one result, even if there are more than one occurence.
So either break the log into separate events and hope that timestamping will work, and use the regex above. Or keep the log as one event and use transforms.conf like this;
in props.conf
REPORT-slowq = slow_queries
in transforms.conf
[slow_queries]
REGEX = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
MV_ADD = true
Hope this helps,
Kristian
EDIT: factual errors, my bad
Thanks Kristian,
The REXEX is a typo...
From the documentation http://docs.splunk.com/Documentation/Splunk/latest/admin/transformsconf the REGEX can can be specified like that in trancforms.conf
* REGEX and the FORMAT attribute:
* Name-capturing groups in the REGEX are extracted directly to fields. This means that you
do not need to specify the FORMAT attribute for simple field extraction cases (see the
description of FORMAT, below).
* If the REGEX extracts both the field name and its corresponding field value, you can use
the following special capturing groups if you want to skip specifying the mapping in
FORMAT:
_KEY_<string>, _VAL_<string>.
* For example, the following are equivalent:
* Using FORMAT:
* REGEX = ([a-z]+)=([a-z]+)
* FORMAT = $1::$2
* Without using FORMAT
* REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
* When using either of the above formats, in a search-time extraction, the
regex will continue to match against the source text, extracting as many
fields as can be identified in the source text.
I tried to put in the EXTRACT in props.conf, but it gives the same result as putting it in transforms.conf.
I want to achieve to be able to extract each query as a field and also I want to use this regex to split after this line with the query in.
So in the sample I make I want the following result:
Extract the field query
from each line
1: SELECT * FROM DUAL;
2: SELECT * FROM DUAL;
3: SELECT * FROM DUAL;
and I want to use
to make sure the sample is brokken up in 3 events: MUST_NOT_BREAK_AFTER = regex
Event 1:
\# Time: 120316 9:27:20
\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 6.848644 Lock_time: 0.000022 Rows_sent: 0 Rows_examined: 1
SET timestamp=xxx;
SELECT * FROM DUAL;
Event 2:
\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;
Event 3:
\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;
see updated answer /k