Extracting country codes that have no fixed length

adomila
Explorer

Hi,
I would like to ask how I can extract country codes from a series of numeric values when the codes have no fixed length. The field starts with 001001 (a fixed-length, 6-digit prefix), followed by the country code (not fixed in length), and finally the MIN (mobile identification number), which is also not fixed in length. I only need the country codes, but I'm at my wits' end on how to go about it when neither the country code nor the MIN has a fixed length. BTW, I have a lookup table, but the country codes in it are not fixed in length either. I tried prefixing a couple of zeros to the codes in the lookup table, but that is not feasible because the actual data does not have leading zeros. Here is some sample data:

tel:001001323353

tel:001001974555

tel:00100196659261

tel:001001966505998

tel:001001966015201

tel:001001338141015

tel:001001955009976

tel:001001965601621

tel:0010013203532

tel:00100163170000

tel:0010014647016

tel:00100197551559

tel:001001333532000

tel:0010013033532090

tel:001001323532000

landen99
Motivator

There are only two good approaches to this issue:

  • (Always Preferred) Configure the logging application to separate the fields with a delimiter character. - OR -
  • Extract the country code and MIN together and then search that field for the country codes of interest.

| regex field1="1\d+"
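
The second approach can also be illustrated outside Splunk. The following Python snippet is only a sketch, assuming the fixed `tel:001001` prefix described in the question; the function names and the list of codes of interest are hypothetical.

```python
import re

# Fixed prefix from the question, followed by country code + MIN as one run of digits.
TEL_PATTERN = re.compile(r"^tel:001001(\d+)$")

def extract_code_and_min(value):
    """Return the digits after the fixed prefix, or None if the value does not match."""
    match = TEL_PATTERN.match(value)
    return match.group(1) if match else None

def has_code_of_interest(value, codes):
    """True if the combined country-code-plus-MIN string starts with any given code."""
    digits = extract_code_and_min(value)
    return digits is not None and digits.startswith(tuple(codes))
```

For example, `has_code_of_interest("tel:00100196659261", ["966"])` would keep that event, while a value without the `001001` prefix is ignored.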

sideview
SplunkTrust

Properly parsing arbitrary numbers to correctly determine what country code is in there, if any, is a very tricky matter. Unless you're dealing with a small subset of country codes, and unless you're fine with a lot of false positives from things like incomplete numbers, you can't really do it with regex.

http://en.wikipedia.org/wiki/List_of_country_calling_codes

Our most recent Splunk for Cisco CDR app packages some code licensed and ported from Android that actually parses out the numbers as well as local area codes. From this it infers geographical regions. Within the US it also parses exchanges and gets zipcodes which it then uses to get an approximate city and state. I've been thinking of releasing the package as its own commercial Splunk app for use with anyone who wanted just that one feature.

What particular call system are you using here? It's also quite possible that Sideview could create an app for that system and package these same features for that system.

Btw, the code we use is a Python port of Android's libphonenumber ( http://code.google.com/p/libphonenumber/ )

adomila
Explorer

WOW! I'm a big fan of sideview as well. It would be great if you could create an app for this. With regard to your question, I'm not quite sure what call system this is, but the logs are definitely CDRs. Looking forward to the app for this. Thanks in advance.

Ayn
Legend

If there is no way of determining where the country code ends, you'd have to provide a list of all unique country codes that should be possible to match. Like

001001(1|21|33|35|47|46)

and so on.

adomila
Explorer

Hi Linu1988,
We don't have anything except those tel:\d+ values, which contain the country code (without fixed length) followed by the mobile number. These are CDR logs/events, as sideview mentioned. Regex won't help, as sideview also mentioned.

linu1988
Champion

What do we have except tel:001001323353? Are those the original events from Splunk? A regex does find a match using

 tel:\d+

adomila
Explorer

Hi Ayn,
I have 845 of those values. When I tried to hardcode them, the search was very slow. In fact, it has not finished running as of this writing. Can you provide the actual scripts? Btw, I came up with this: index=xxx tel:001001323 OR tel:00100197 OR tel:0010019665 and so on and so forth...
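
With several hundred codes, a long OR chain or alternation tends to get slow. One common alternative is to strip the fixed prefix and test successively shorter leading substrings against a set of known codes. The following is a sketch only, written in Python rather than SPL; the function name is hypothetical, and the small code set is a stand-in for the 845-entry lookup table, using the three prefixes from the search above.

```python
PREFIX = "tel:001001"

def longest_known_code(value, known_codes):
    """Return the longest entry of known_codes that prefixes the digits after PREFIX."""
    if not value.startswith(PREFIX) or not known_codes:
        return None
    digits = value[len(PREFIX):]
    # No candidate can be longer than the longest known code or the digits themselves.
    max_len = min(len(digits), max(len(c) for c in known_codes))
    for length in range(max_len, 0, -1):
        candidate = digits[:length]
        if candidate in known_codes:  # set membership, independent of table size
            return candidate
    return None

# Hypothetical subset of the lookup table, for illustration only.
KNOWN_CODES = {"323", "97", "9665"}
```

Each value then costs at most a handful of set lookups, regardless of how many codes the table holds.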

Ayn
Legend

No problem. Could you please mark my answer as accepted? Thanks!

adomila
Explorer

WOW! That was fast and accurate. Thanks a bunch Ayn.
