I have a slow query log from Mysql that among other things has the query statement in an extracted field.
For example the query might be:
SELECT * FROM SAMPLE WHERE USER_ID = 'John Doe' and COMPANY = 'Widget Inc.' and UID = 9999
I'd like to be able to group the sql statements to see how often they occur.
I'd like to get a field that excludes all digits and text between the quotes.
For the same example, I'd like the field to look like this:
SELECT * FROM SAMPLE WHERE USER_ID = '' and COMPANY = '' and UID = 
Thanks
I should have updated this a while ago.
I was able to do this simply with a field extraction and by adding a field with an eval statement in the query. 
A different thread suggested this stanza in /opt/splunk/etc/system/local/props.conf
[slow_query]
LINE_BREAKER = (?:;|Argument)(\n)(?:\# Time: |\# User@Host: )
SHOULD_LINEMERGE = false
TRUNCATE = 0
I added one field extraction for my slow_query source type to get all of the relevant fields:
[^\[\n]*\[(?P<DB_User>[^\]]+)[^\[\n]*\[(?P<DB_Host>[^\]]+).*\n# Query_time: (?P<Query_time>\d*\.\d*)  Lock_time: (?P<Lock_time>\d*.\d*) Rows_sent: (?P<Rows_sent>\d*)  Rows_examined: (?P<Rows_examined>\d*).*SET timestamp\=\d+\;(?P<SQL_STATEMENT>.*)\;
In the search bar I used this:
sourcetype="slow_query"| eval Norm_sql=replace(SQL_STATEMENT, "(\d+|\'.*?\')", "XXXX")
This chart adds up all of the time the database spent on each query in the slow log and ranks them. If your database server can handle the load, you could set the slow query time down to 0 to see everything, but it will definitely hurt performance.
sourcetype="slow_query"| eval Norm_sql=replace(SQL_STATEMENT, "(\d+|\'.*?\')", "XXXX")  | stats sum(Query_time) by Norm_sql | sort sum(Query_time) desc
I should have updated this a while ago.
I was able to do this simply with a field extraction and by adding a field with an eval statement in the query. 
A different thread suggested this stanza in /opt/splunk/etc/system/local/props.conf
[slow_query]
LINE_BREAKER = (?:;|Argument)(\n)(?:\# Time: |\# User@Host: )
SHOULD_LINEMERGE = false
TRUNCATE = 0
I added one field extraction for my slow_query source type to get all of the relevant fields:
[^\[\n]*\[(?P<DB_User>[^\]]+)[^\[\n]*\[(?P<DB_Host>[^\]]+).*\n# Query_time: (?P<Query_time>\d*\.\d*)  Lock_time: (?P<Lock_time>\d*.\d*) Rows_sent: (?P<Rows_sent>\d*)  Rows_examined: (?P<Rows_examined>\d*).*SET timestamp\=\d+\;(?P<SQL_STATEMENT>.*)\;
In the search bar I used this:
sourcetype="slow_query"| eval Norm_sql=replace(SQL_STATEMENT, "(\d+|\'.*?\')", "XXXX")
This chart adds up all of the time the database spent on each query in the slow log and ranks them. If your database server can handle the load, you could set the slow query time down to 0 to see everything, but it will definitely hurt performance.
sourcetype="slow_query"| eval Norm_sql=replace(SQL_STATEMENT, "(\d+|\'.*?\')", "XXXX")  | stats sum(Query_time) by Norm_sql | sort sum(Query_time) desc
 
					
				
		
 
		
		
		
		
		
	
			
		
		
			
					
		In the past, I have used sed expressions to normalize SQL statements.  It gets kinda iffy depending on the complexity of your SQLs, but it is workable.  Something like this:
| rex mode=sed "s/=\s+'[^']+'/= ?/g" | rex mode=sed "s/=\s+[0-9]+/= ?/g"
It becomes a game of figuring out what patterns you want to replace with "?" from the original SQL.
dwaddle is right on this one. I find Kodos - http://kodos.sourceforge.net/ very helpful for figuring out complex regular expressions.
 
					
				
		
 
		
		
		
		
		
	
			
		
		
			
					
		Well, it's possible I got the regex wrong.  Or, it's equally possible that it's not in the right place in your search string.  You need to know a little about how to apply regular expressions and sed transformation rules.
That looks like it would do exactly what I want, but I don't know what to do with that. I tried that at the end of my search string, but it didn't do anything
Hi,
Well I think the way to do this is to look at your 3 objects (USER_ID,COMPANY,UID) and then break them out into fields something like this (assuming they are not already fields pulled out and Splunk doesn't already recognize them)
source=<mysqlquerylow> "SELECT * FROM SAMPLE WHERE" | regex (?<usrid>" USER_ID = '(\w+\s\w+)') | regex (?<cmpy>" and COMPANY = '(\w+\s\w+\D)') | regex  (?<uid>" and UID = (\d+)" | eval yourfield="SELECT
* FROM SAMPLE WHERE "+ursid+cmpy+uid | stats count(newfield)
The benefit of this is you can report on the fields you extracted and count them individually (i.e. how often is UID 9999 coming up)
It could be any sql statment that takes over 3 seconds to run. it includes updates, inserts and selects. It otens involves several forms of Join statements and subqueries.
OK I think I see where you are going but you will still need to break the data out into fields to be able to report on it effectively.
Are these queries in any way similar? Do they start with or end with something definite? Also how are you determining if they are poorly performing is that info in the log as well?
I should have been more clear.  I don't know what those fields are. 
There are hundreds of different queries with different where clauses in each one.
Many of the queries are generated when the user makes different selections in the software.
My goal is to find poorly performing queries that are frequently used in order to optimize the database to perform better.
