Hi, I feel like this is a deceptively simple question, but I'm fairly new to Splunk.
I want to find the avg transaction times by country.
The country is stored as "country=CO3" where CO can be any two-letter code and the numbers can be 1 digit, 2 digit, or 0 digit (in which case it's COL for Colombia).
When I run the following query: sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | transaction transID | chart avg(duration) by country
I get exactly the results I want except each version of COx is treated as a unique country (how I expect Splunk would work). I'd like to collapse the results into just the countries themselves - CO1, CO2, CO13, and COL would all show together as Colombia (naming not important).
I've tried various eval statements (including the examples for earthquake magnitudes here: http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/eval) but can't seem to get it working with strings.
Can someone explain how to collapse the string results into countries?
Thanks.
Please try the following
sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval country=replace(country,"\d","") | transaction transID | chart avg(duration) by country
Using this you will get country as just characters (digits removed) and it will group all CO* as one events.
Hope this helps.
----------------------------Update--------------------------
This should be perfect for you now 🙂
sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval extraChar=substr(country,3)| eval country=replace(country,extraChar,"") | transaction transID | chart avg(duration) by country
There are plenty of ways to go about it, rex is one way but then I am partial to regular expressions.
sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*")
| rex field=country "^(?<country_code>[A-Z]*)"
| ...
The rex clause will extract the 2 digit country code into a new field called country_code.
BTW, I am not sure if you actually want to use the transaction command in your query - transaction will merge all log entries with the same transID into one big entry and I am not sure what avg will actually be returning in this case as there are multiple records which might have multiple duration fields.
One thing is for sure - transaction makes queries really slow when there's a fair amount of data involved so if you simply need the average duration per log entry and split by country code you wouldn't need it.
Please try the following
sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval country=replace(country,"\d","") | transaction transID | chart avg(duration) by country
Using this you will get country as just characters (digits removed) and it will group all CO* as one events.
Hope this helps.
----------------------------Update--------------------------
This should be perfect for you now 🙂
sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval extraChar=substr(country,3)| eval country=replace(country,extraChar,"") | transaction transID | chart avg(duration) by country
Please see the updated answer if that suits your requirement.
That's definitely an improvement over the case statement I was attempting. One further question if you can help....The "first" item isn't written as CO1 but rather COL. So we have COL, CO2, CO3, etc. Is there a way to get only the first say two characters? That would programatically solve this problem. I see trim functions but my output can be either 3 or 4 (eg CO15) characters.