In each log event, I have 3 fields that keep a record count of the number of rows inserted, updated and deleted. I am able to grab statistics on each of these fields (e.g. average, 2 standard deviations above mean, 2 standard deviations below mean, etc). However, when I display them using stats, I get 1 row with 9 columns. As the stats are essentially repeated 3 times, I would like to change the format to a table with 3 rows (one for inserted, updated and deleted) and 3 columns (avg, -2sd, +2sd). What's the best way to do this?
Typically, if you have one field (e.g. date) over multiple logs, you can use the stats avg(field) by date
to split the data into multiple rows. However, my logs don't fit that format. Here's my search query so far
... | stats avg(InsertRecordCount) as Insert, stdev(InsertRecordCount) as Insert_stdev, avg(UpdateRecordCount) as Update, stdev(UpdateRecordCount) as Update_stdev, avg(DeleteRecordCount) as Delete, stdev(DeleteRecordCount) as Delete_stdev
Here's another way that might work. I just don't know how to pull it off. Since I'm taking the same statistical values(avg, 2-stdev, 2+stdev) from for each field, if somehow (in splunk) I could assign split the 1 log from containing 3 fields insert=123, update=234, delete=012 to 3 logs containing 2 fields (type=(insert|update|delete) number=(123|234|012) then I could then I could generate the stats command the same as above and insert 'by type' at the end.
Thanks in advance!
You're right; it's not pretty. In thinking about this for a while, I came up with a way that made sense to me. I'll share it so people have 2 different ways that they can approach the problem.
In my last comment in the question, I mentioned that if one could split the 3 fields into 2 fields with one field describing the type (insert, update, delete) and the other describing the number of records. Think of this as 'unpivoting' the data. I followed this train of thought and I'll (try to) explain it below
Suppose you have 3 fields all in 1 log event:
Insert=343
Update=512
Delete=729
You can use the eval command to combine all 6 pieces of data into one string: 'Insert,343.Update,512.Delete,729' . Then you can use the split command to split the data into 3 parts (using the period as the delimiter). Next, you can use the mvexpand command to split the 3 parts into 3 different logs. Lastly, you can use the rex command to create 2 new fields, one called type with values Insert, Update, or Delete and one called score with values 343, 512, or 729.
In the end, you have turned the 1 log event into 3 log events each with 2 new fields. Now, you can use the average, standard deviation, and other stats command once using the 'by type' command to create the 3 columns.
Here's the below code that I used. Remember to take out the '<' and replace it with a <
| eval newfield="Insert,"+Insert+".Update,"+Update+".Delete,"+Delete| eval mysplit=split(newfield,".") | mvexpand mysplit | rex field=mysplit "(?'<'type>.*)?\,(?'<'number>.*)" | table mysplit, number
You're right; it's not pretty. In thinking about this for a while, I came up with a way that made sense to me. I'll share it so people have 2 different ways that they can approach the problem.
In my last comment in the question, I mentioned that if one could split the 3 fields into 2 fields with one field describing the type (insert, update, delete) and the other describing the number of records. Think of this as 'unpivoting' the data. I followed this train of thought and I'll (try to) explain it below
Suppose you have 3 fields all in 1 log event:
Insert=343
Update=512
Delete=729
You can use the eval command to combine all 6 pieces of data into one string: 'Insert,343.Update,512.Delete,729' . Then you can use the split command to split the data into 3 parts (using the period as the delimiter). Next, you can use the mvexpand command to split the 3 parts into 3 different logs. Lastly, you can use the rex command to create 2 new fields, one called type with values Insert, Update, or Delete and one called score with values 343, 512, or 729.
In the end, you have turned the 1 log event into 3 log events each with 2 new fields. Now, you can use the average, standard deviation, and other stats command once using the 'by type' command to create the 3 columns.
Here's the below code that I used. Remember to take out the '<' and replace it with a <
| eval newfield="Insert,"+Insert+".Update,"+Update+".Delete,"+Delete| eval mysplit=split(newfield,".") | mvexpand mysplit | rex field=mysplit "(?'<'type>.*)?\,(?'<'number>.*)" | table mysplit, number
This isn't going to be pretty.
I'm assuming you have data something like this:
| stats count | fields - count
| setfields insert_avg = 10, insert_stdev = 2, update_avg = 5, update_stdev = 1, delete_avg = 4, delete_stdev = 3
| foreach *_avg
[ eval <<MATCHSTR>>_lower = <<FIELD>> - 2 * <<MATCHSTR>>_stdev
| eval <<MATCHSTR>>_upper = <<FIELD>> + 2 * <<MATCHSTR>>_stdev
| fields - <<MATCHSTR>>_stdev
]
That'll produce a single line with nine fields for insert, update, and delete and their lower, avg, and upper values.
Using that dummy data, you could do this:
... | eval operation = "insert update delete" | makemv operation | mvexpand operation
| foreach insert_*
[ eval <<MATCHSTR>> = case(operation=="delete", delete_<<MATCHSTR>>,
operation=="update", update_<<MATCHSTR>>,
operation=="insert", insert_<<MATCHSTR>>)
] | table operation lower avg upper
That'll first split the row into three rows, one for each operation. Then it'll pick out the values of the appropriate operation_lower
, operation_avg
, and operation_upper
fields for the operation of that row.
There might be a nicer way, but I can't think of one right now 🙂