Solved: Re: Grouping Wildcarded Ranges Together in Chart R...

RMartinezDTV · ‎11-01-2013

Hi, I feel like this is a deceptively simple question, but I'm fairly new to Splunk.

I want to find the avg transaction times by country.
The country is stored as "country=CO3" where CO can be any two-letter code and the numbers can be 1 digit, 2 digit, or 0 digit (in which case it's COL for Colombia).

When I run the following query: sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | transaction transID | chart avg(duration) by country

I get exactly the results I want except each version of COx is treated as a unique country (how I expect Splunk would work). I'd like to collapse the results into just the countries themselves - CO1, CO2, CO13, and COL would all show together as Colombia (naming not important).

I've tried various eval statements (including the examples for earthquake magnitudes here: http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/eval) but can't seem to get it working with strings.

Can someone explain how to collapse the string results into countries?
Thanks.

somesoni2 · ‎11-01-2013

Please try the following

sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval country=replace(country,"\d","") | transaction transID | chart avg(duration) by country

Using this you will get country as just characters (digits removed) and it will group all CO* as one events.
Hope this helps.

----------------------------Update--------------------------

This should be perfect for you now 🙂

sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval extraChar=substr(country,3)| eval country=replace(country,extraChar,"") | transaction transID | chart avg(duration) by country

View solution in original post

tpflicke · ‎11-01-2013

There are plenty of ways to go about it, rex is one way but then I am partial to regular expressions.

sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") 
| rex field=country "^(?<country_code>[A-Z]*)"
|  ...

The rex clause will extract the 2 digit country code into a new field called country_code.

BTW, I am not sure if you actually want to use the transaction command in your query - transaction will merge all log entries with the same transID into one big entry and I am not sure what avg will actually be returning in this case as there are multiple records which might have multiple duration fields.
One thing is for sure - transaction makes queries really slow when there's a fair amount of data involved so if you simply need the average duration per log entry and split by country code you wouldn't need it.

somesoni2 · ‎11-01-2013

Please try the following

sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval country=replace(country,"\d","") | transaction transID | chart avg(duration) by country

Using this you will get country as just characters (digits removed) and it will group all CO* as one events.
Hope this helps.

----------------------------Update--------------------------

This should be perfect for you now 🙂

sourcetype="xxx" (country="CO*" OR country="VE*" OR country="PE*") | eval extraChar=substr(country,3)| eval country=replace(country,extraChar,"") | transaction transID | chart avg(duration) by country

somesoni2 · ‎11-01-2013

Please see the updated answer if that suits your requirement.

RMartinezDTV · ‎11-01-2013

That's definitely an improvement over the case statement I was attempting. One further question if you can help....The "first" item isn't written as CO1 but rather COL. So we have COL, CO2, CO3, etc. Is there a way to get only the first say two characters? That would programatically solve this problem. I see trim functions but my output can be either 3 or 4 (eg CO15) characters.

Grouping Wildcarded Ranges Together in Chart Results

Stay Connected: Your Guide to November Tech Talks, Office Hours, and Webinars!

Transform your security operations with Splunk Enterprise Security

Splunk Admins and App Developers | Earn a $35 gift card!