Solved: Re: rex and transforms.conf not giving the same re...

jvl_netic · ‎03-16-2012

I am trying to parse MySQL slowlogs and get the query extract from the log.

I have the following format in the log:

# Time: 120316 9:27:20 # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 6.848644 Lock_time: 0.000022 Rows_sent: 0 Rows_examined: 1 SET timestamp=xxx; SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL;

And when I try to do a search time extract of a field I use

When I would get this output:

1: SELECT * FROM DUAL; 2: SELECT * FROM DUAL; 3: SELECT * FROM DUAL;

This is what I want, but then I put this in my transforms.conf

Now when I try the search

sourcetype=mysql-slow | top 50 query

I get

SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL;

I do not understand why it does not work when put into transforms.conf

Any ideas?

kristian_kolb · ‎03-16-2012

Well there are few possible explanations:

If you just copied the transforms stuff from the actual file, there is a typo. It's REGEX, not REXEX. If you just want to make a search time extraction of the field, you should probably try something like;

in props.conf

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*;)$

There should be no configuration in transforms.conf.

There are other things that are a bit unclear regarding what you want to achieve, and how you try to do that, but try this first.

UPDATE:

Fixed the factual error regarding regex in transforms - sorry about that.

Does Splunk parse the sample log as one or three events? Do you really want three events? I'm not 100% sure how splunk would handle events #2 and #3 correctly, since there is no timestamp. Or would it do to have a single event with a multivalued field for the query?

On a side note, I believe the reason for the bad results is that the regex might be too greedy. Try adding a '?' after .*

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$

This however will only produce one result, even if there are more than one occurence.

So either break the log into separate events and hope that timestamping will work, and use the regex above. Or keep the log as one event and use transforms.conf like this;

in props.conf

REPORT-slowq = slow_queries

in transforms.conf

[slow_queries]
REGEX = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
MV_ADD = true

Hope this helps,

Kristian

EDIT: factual errors, my bad

View solution in original post

kristian_kolb · ‎03-16-2012

Well there are few possible explanations:

If you just copied the transforms stuff from the actual file, there is a typo. It's REGEX, not REXEX. If you just want to make a search time extraction of the field, you should probably try something like;

in props.conf

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*;)$

There should be no configuration in transforms.conf.

There are other things that are a bit unclear regarding what you want to achieve, and how you try to do that, but try this first.

UPDATE:

Fixed the factual error regarding regex in transforms - sorry about that.

Does Splunk parse the sample log as one or three events? Do you really want three events? I'm not 100% sure how splunk would handle events #2 and #3 correctly, since there is no timestamp. Or would it do to have a single event with a multivalued field for the query?

On a side note, I believe the reason for the bad results is that the regex might be too greedy. Try adding a '?' after .*

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$

This however will only produce one result, even if there are more than one occurence.

So either break the log into separate events and hope that timestamping will work, and use the regex above. Or keep the log as one event and use transforms.conf like this;

in props.conf

REPORT-slowq = slow_queries

in transforms.conf

[slow_queries]
REGEX = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
MV_ADD = true

Hope this helps,

Kristian

EDIT: factual errors, my bad

jvl_netic · ‎03-16-2012

Thanks Kristian,

The REXEX is a typo...

From the documentation http://docs.splunk.com/Documentation/Splunk/latest/admin/transformsconf the REGEX can can be specified like that in trancforms.conf

* REGEX and the FORMAT attribute:
    * Name-capturing groups in the REGEX are extracted directly to fields. This means that you
      do not need to specify the FORMAT attribute for simple field extraction cases (see the 
      description of FORMAT, below).
    * If the REGEX extracts both the field name and its corresponding field value, you can use 
      the following special capturing groups if you want to skip specifying the mapping in 
      FORMAT: 
      _KEY_<string>, _VAL_<string>. 
    * For example, the following are equivalent:
        * Using FORMAT:
            * REGEX  = ([a-z]+)=([a-z]+)
            * FORMAT = $1::$2
        * Without using FORMAT
            * REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
    * When using either of the above formats, in a search-time extraction, the
      regex will continue to match against the source text, extracting as many
      fields as can be identified in the source text.

I tried to put in the EXTRACT in props.conf, but it gives the same result as putting it in transforms.conf.

I want to achieve to be able to extract each query as a field and also I want to use this regex to split after this line with the query in.

So in the sample I make I want the following result:

Extract the field query from each line

1: SELECT * FROM DUAL;
2: SELECT * FROM DUAL;
3: SELECT * FROM DUAL;

and I want to use MUST_NOT_BREAK_AFTER = regex to make sure the sample is brokken up in 3 events:

Event 1:

\# Time: 120316  9:27:20
\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 6.848644  Lock_time: 0.000022 Rows_sent: 0  Rows_examined: 1
SET timestamp=xxx;
SELECT * FROM DUAL;

Event 2:

\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 12.537060  Lock_time: 0.000035 Rows_sent: 0  Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;

Event 3:

\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 16.935841  Lock_time: 0.000046 Rows_sent: 0  Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;

kristian_kolb · ‎03-16-2012

see updated answer /k

rex and transforms.conf not giving the same result

Enter the Dashboard Challenge and Watch the .conf24 Global Broadcast!

Join Us at the Builder Bar at .conf24 – Empowering Innovation and Collaboration

Combine Multiline Logs into a Single Event with SOCK - a Guide for Advanced Users