Hi all,
I'm working on a dashboard query that preprocesses data for a | geostats command. The end goal is to pipe data between two different but similar applications. The value used to populate the field 'country' has different format between the two applications, and if it's possible to create a conditional statement that uses different values from a lookup table (or even separate lookups) in line.
Caveat: I know it's possible to build a different search and used a token with the | savedsearch command to pass the correct search based on the token. I'm trying to understand if it can be done inline in my base search.
The details:
application A's field associated with country uses the ISO numeric country code. The search then uses a lookup table like so:
| lookup [lookup].csv NumericCountryCode as [field in search] output AlphaCountryCode
This takes the numeric code and produces a 2 character ISO like 'US' or 'BR' and so on. The format of the lookup table is:
AlphaCountryCode CountryCode NumericCountryCode
In application B, country data comes in uppercase ISO alphabetical format, but it can be represented as either the Alpha-2 or alpha-3 (3 digit) country code. In the majority of cases, this could be solved with a substring call that reduces the length to 2. However, there are fringe cases where this approach does not work, and must be accounted for.
Example: Angola - Alpha-2 code: AO alpha-3 code: AGO.
My proposed approach is to modify the lookup table like so:
AlphaCountryCode AlphaTwoCountryCode AlphaThreeCountryCode NumericCountryCode
Such that I can access both of these values and return the correct format (AlphaTwoCountryCode). Is it possible to handle this inline with the same lookup table? The pseudo code would be something like this:
if(len(countryCode)==2), lower(countryCode) and return the countryCode. Else if(len(countryCode)==3),
| lookup [lookup].csv AlphaThreeCountryCode as countryCode output AlphaTwoCountryCode
Is this an appropriate approach, and if so, how can I build the syntax of the eval statement to evaluate length and output the appropriate result?
Thanks
Take out the square brackets. That creates a subsearch which executes before everything else, which wouldn't work right.
Another option is to do the lookup then decide afterward which code to use.
| eval MerchantCountry=lower(MerchantCountry) ```evaluate all country codes to lower case```
```then check the length, if it's 3, lookup the country code and output the 2 digit code, else return the original MerchantCountry```
| lookup CountryCodesThreeDigit.csv Alpha3CountryCode AS MerchantCountry output AlphaCountryCode
| eval MerchantCountry=if(len(MerchantCountry)==3, AlphaCountryCode, MerchantCountry)
@richgalloway Hi Rich, this is what I've come up with. I'm trying to determine whether I have some silly syntactical errors or is this operation is not supported in Splunk. My intuition is that it is not supported:
| eval MerchantCountry=lower(MerchantCountry) ```evaluate all country codes to lower case```
```then check the length, if it's 3, lookup the country code and output the 2 digit code, else return the original MerchantCountry```
| eval MerchantCountry=if(len(MerchantCountry)==3,
[| lookup CountryCodesThreeDigit.csv Alpha3CountryCode AS MerchantCountry output AlphaCountryCode], MerchantCountry)
Take out the square brackets. That creates a subsearch which executes before everything else, which wouldn't work right.
Another option is to do the lookup then decide afterward which code to use.
| eval MerchantCountry=lower(MerchantCountry) ```evaluate all country codes to lower case```
```then check the length, if it's 3, lookup the country code and output the 2 digit code, else return the original MerchantCountry```
| lookup CountryCodesThreeDigit.csv Alpha3CountryCode AS MerchantCountry output AlphaCountryCode
| eval MerchantCountry=if(len(MerchantCountry)==3, AlphaCountryCode, MerchantCountry)
Update for anyone facing a similar issue:
I was unable to get the lookup working within the if statement. Ending up having some erroneous values in the field that had to be preprocessed as well. Once that was accounted for, I ended up using Rich's answer:
| eval MerchantCountry=lower(MerchantCountry)
| rex field=MerchantCountry mode=sed "s/^\s+//g" ```remove leading spaces```
| eval tempVar = "us"
| eval MerchantCountry=if(match(tempVar, ".*"), substr(MerchantCountry, 1, 2), MerchantCountry) ```handle strange characters being added to the end of US```
| lookup CountryCodesThreeDigit.csv Alpha3CountryCode AS MerchantCountry output AlphaCountryCode
| eval MerchantCountry=if(len(MerchantCountry)==3, AlphaCountryCode, MerchantCountry)
Just do both lookups. There's no overlap between the two so only one of them will return a result so no worry about conflicting results. Unless you're doing thousands of lookups the impact of the extra lookups should be small.
@richgalloway , thanks for the response. After some more research and your reply, I think separate lookups is the way to go.
I do think I wasn't clear in the second part of my question. Application B's countryCode field contains both the 2 digit and 3 digit codes in the data. (I cannot change this) As such, I need to account for this inline as well.
Using the new lookup table of the following format:
AlphaCountryCode CountryCode NumericCountryCode Alpha3CountryCode
Would it be possible to use an eval statement to evaluate the length of the incoming data field and return the correct 2 digit countryCode from the lookup inline?
example: | eval countryCode=if(len(countryCode)==3), [use lookup to return the 2 digit code] else if(len(countryCode==2)) [return the countryCode, as it's already in 2 digit format]
I've not tried using the lookup command within an eval, but it might work. Give it a try. If it doesn't work, do the lookup separately then use if to choose which countryCode field to use.
Thanks,
I'll work with it and see if I can come up with a response.