Splunk Search

Extracting country codes from phone numbers

Contributor

Lets see how many of you are up early this Easter Sunday - bonus point on offer 🙂

I have a lookup with a list of phone country codes - copypasta from http://countrycode.org/ - here's an example:

Code, Country
+1, USA
+44, UK
...

My logs are delivered in a nice format, with the phone number parsed as:

Number
+44789xxxxxxx

The problem I'm having is that country codes are of varying length making a "country code" field extraction difficult.

Is there anyway, preferably without regex, I can use the intelligence from the CSV to pull out the country code into a separate field?

Thanks guys!

Tags (3)
0 Karma
1 Solution

SplunkTrust
SplunkTrust

Hi himynamesdave,

This is completly untested, but try something like this:

 YourBaseSearch | eval Code="" | eval Country="" | inputcsv YourCSVfile append=true | eval myCountry=if(match(Number, Code), Country, "no match") ...

This is still untested 😉

UPDATE:

now this is tested and it is working 😉

YourBaseSearchHere | inputcsv YourcsvHere append=t | stats values(Code) AS Code values(Country) AS Country values(Number) AS Number | mvexpand Code | eval Code=substr(Code,2) | eval CountryName=if(match(Number, Code), Country, "No match") | table Number Code CountryName

What is happening here?
First you search your events and append some information from the csv, pass the values of the three fields Number Code and Country to stats, expand the multivalued in Code and remove the + from the Code. At last check if the Code matches any number and return a table of Number, International dialing code and Country name.

Hope this helps to get you started and thx @martin_mueller for pushing me in the right direction...

Cheers, MuS

View solution in original post

Contributor

Hi himynamesdave,

If your end goal is to list the country name fora respective phone number, have you tried lookup with country code fields with "*"

Have lookups like

PhoneNumber Country
+91* India
+1* USA 
.. ..
.. ..

And try , i guess you dont need to extract country code anymore

0 Karma

Contributor

Unfortunately this will not work due to the ambiguous starts to a code (e.g. US +1 vs American Samoa +1684) 😞

0 Karma

Builder

It does work. When I got some time I will post how.

0 Karma

SplunkTrust
SplunkTrust

How about this:

search yielding Number field
| rex field=Number "^(?<Code_4>(?<Code_3>(?<Code_2(?<Code_1>+\d)\d)\d)\d)"
| lookup country_codes Code as code_4 OUTPUT Country as Country_4
| lookup country_codes Code as code_3 OUTPUT Country as Country_3
| lookup country_codes Code as code_2 OUTPUT Country as Country_2
| lookup country_codes Code as code_1 OUTPUT Country as Country_1
| eval Country = coalesce(Country_4, Country_3, Country_2, Country_1)

I've assumed that in case of ambiguous starts to a code (e.g. US +1 vs American Samoa +1684) the longer match should win. I didn't resolve US vs Canada, that's a long list in and of itself.

0 Karma

Contributor

try this...And this is using regex 🙂

.... your search|rex field=Number "+(?<code>(1|44|91|....)"

For this to work you should have a Number field holding all the phone numbers as you have mentioned example in question.

0 Karma

SplunkTrust
SplunkTrust

Hi himynamesdave,

This is completly untested, but try something like this:

 YourBaseSearch | eval Code="" | eval Country="" | inputcsv YourCSVfile append=true | eval myCountry=if(match(Number, Code), Country, "no match") ...

This is still untested 😉

UPDATE:

now this is tested and it is working 😉

YourBaseSearchHere | inputcsv YourcsvHere append=t | stats values(Code) AS Code values(Country) AS Country values(Number) AS Number | mvexpand Code | eval Code=substr(Code,2) | eval CountryName=if(match(Number, Code), Country, "No match") | table Number Code CountryName

What is happening here?
First you search your events and append some information from the csv, pass the values of the three fields Number Code and Country to stats, expand the multivalued in Code and remove the + from the Code. At last check if the Code matches any number and return a table of Number, International dialing code and Country name.

Hope this helps to get you started and thx @martin_mueller for pushing me in the right direction...

Cheers, MuS

View solution in original post

Builder

I hope I can hang on too this thread, since my problem is equal:

phone.csv

Code, Country
45, Denmark
46, Sweden
47, Norway
48, Poland

Example line list

Jun 17 10:27:34 172.30.112.1 AAA: 172.30.34.58 logged in with username 0045696744444

client_ip = 172.30.34.58 
 client_site = House_of_fun 
 eventtype = Portal_User_logged_in Information  
 module = AAA 
 sourcetype = udp:514 
 tag = Information 
 username = 0045696744444 

So then I tried to modify your line and got this:

 host="172.30.112.1" username="00*" | inputcsv phone.csv append=t | stats values(Code) AS Code values(Country) AS Country values(username) AS username | mvexpand Code | eval Code=substr(Code,3) | eval CountryName=if(match(username, Code), Country, "No match") | table username Code CountryName

or this:

host="172.30.112.1" username="00*" | inputcsv C:\Program Files\Splunk\etc\apps\search\lookups\phone.csv append=t | stats values(Code) AS Code values(Country) AS Country values(username) AS username | mvexpand Code | eval Code=substr(Code,3) | eval CountryName=if(match(username, Code), Country, "No match") | table username Code CountryName

Can you see what is wrong? Why it just give list with all numbers, and on first line I get under Country name "No match"?

PS, This: "eval Code=substr(Code,3)" should remove the two "00"? (starting from third character)
Also tried do add "00" to the csv file, and also with and without the "substr" code

0 Karma

New Member

yoursearch|eval Onecode=substr(Number,1,2)| eval Twocode=substr(Number,1,3) |eval Threecode=substr(Number,1,4)| lookup dial_codeCSV Code as Onecode Outputnew Country| lookup dial_codeCSV Code as Twocode Outputnew Country|lookup dial_codeCSV Code as Threecode Outputnew Country| table Number,Twocode,Threecode,Country

0 Karma

SplunkTrust
SplunkTrust

Try this:

host="172.30.112.1" username="00*"
| inputcsv phone.csv append=t
| stats values(Code) AS Code values(Country) AS Country values(username) AS username
| eval username=substr(username,3)
| eval CountryName=if(match(username, Code), Country, "No match")
| table username Code CountryName

in your use case the username is the phone number so you have to use this in the eval and compare it with the Code

0 Karma

SplunkTrust
SplunkTrust

That should leave you with one row per country code...?

0 Karma

SplunkTrust
SplunkTrust

update ping

0 Karma

SplunkTrust
SplunkTrust

your right, as I said it is untested 😉
but see my update .... this could work, maybe ... still untested ...

0 Karma

SplunkTrust
SplunkTrust

I don't think that'll work - how does the Code and Country end up in the event with the Number after the appended inputcsv?

0 Karma