Hi,
I would like to ask how I can extract country codes from a series of numeric values that have no fixed length. The country code sits in a field that starts with 001001 (a fixed-length, 6-digit prefix), followed by the country code (variable length), and finally the MIN (mobile identification number), which is also variable in length. I only need the country codes, but I'm at my wits' end on how to go about it when neither the country code nor the MIN has a fixed length. BTW, I have a lookup table, but the country codes in it are not fixed in length either. I tried padding the lookup table entries with leading zeros, but that is not feasible because the actual data has no leading zeros. Here is some sample data:
tel:001001323353
tel:001001974555
tel:00100196659261
tel:001001966505998
tel:001001966015201
tel:001001338141015
tel:001001955009976
tel:001001965601621
tel:0010013203532
tel:00100163170000
tel:0010014647016
tel:00100197551559
tel:001001333532000
tel:0010013033532090
tel:001001323532000
There are only two good approaches to this issue:
| regex field1="1\d+"
Properly parsing arbitrary numbers to correctly determine what country code is in there, if any, is a very tricky matter. Unless you're dealing with a small subset of country codes, and unless you're fine with a lot of false positives from things like incomplete numbers, you can't really do it with regex.
http://en.wikipedia.org/wiki/List_of_country_calling_codes
Our most recent Splunk for Cisco CDR app packages some code licensed and ported from Android that actually parses out the numbers as well as local area codes. From this it infers geographical regions. Within the US it also parses exchanges and gets zipcodes which it then uses to get an approximate city and state. I've been thinking of releasing the package as its own commercial Splunk app for use with anyone who wanted just that one feature.
What particular call system are you using here? It's also quite possible that Sideview could create an app for that system and package these same features for that system.
BTW, the code we use is a Python port of Android's libphonenumber
( http://code.google.com/p/libphonenumber/ )
WOW! I'm a big fan of Sideview as well. It would be great if you could create an app for this. As for your question, I'm not quite sure what call system this is, but the logs are definitely CDRs. Looking forward to the app. Thanks in advance.
If there is no way of determining where the country code ends, you'd have to provide a list of all unique country codes that should be possible to match. Like
001001(1|21|33|35|47|46)
and so on.
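To illustrate the alternation idea outside of Splunk, here is a minimal Python sketch. The code list is a small made-up subset, and `build_pattern` and `country_code` are hypothetical helper names, not anything from Splunk or the lookup table in the question. Sorting the codes longest first means that if one code happens to be a prefix of another, the longer one is tried first.

```python
import re

def build_pattern(codes):
    """Build one regex that captures a known country code after the fixed prefix."""
    # Longest codes first, so a longer code wins over a shorter code that prefixes it.
    ordered = sorted(set(codes), key=lambda c: (-len(c), c))
    alternation = "|".join(re.escape(c) for c in ordered)
    # Group 1 is the country code, group 2 is whatever digits follow (the MIN).
    return re.compile(r"tel:001001(" + alternation + r")(\d*)")

# Illustrative subset only (assumption); a real list would come from the lookup table.
pattern = build_pattern(["32", "33", "46", "63", "966", "974"])

def country_code(value):
    """Return the matched country code, or None if no listed code matches."""
    m = pattern.match(value)
    return m.group(1) if m else None

for sample in ("tel:001001323353", "tel:001001974555", "tel:0010014647016"):
    print(sample, country_code(sample))
```

Values whose digits after 001001 do not start with any listed code simply return None, so they can be reviewed separately.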
Hi Linu1988,
We don't have anything except those tel:\d+ values: the country code (without a fixed length) followed by the mobile number. These are CDR logs/events, as mentioned by sideview. Regex won't help, as sideview also mentioned.
What do we have besides tel:001001323353? Are those the original events from Splunk? A regex does find a match using
tel:\d+
Hi Ayn,
I have 845 of those values. When I tried to hardcode them, the search was very slow. In fact, it has still not finished running as of this writing. Can you provide specific/actual scripts? Btw, I came up with this: index=xxx tel:001001323 OR tel:00100197 OR tel:0010019665 and so on and so forth...
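With hundreds of codes, one alternative to a long OR chain is to strip the fixed prefix and test the leading digits against the lookup table directly, longest candidate first. Below is a rough Python sketch of that idea, not a Splunk search. The `COUNTRY_CODES` table is a hand-picked illustrative subset and `extract_country_code` is a hypothetical name. Each value costs at most a few dictionary lookups, no matter how many codes are in the table.

```python
import re

# Illustrative, partial lookup table (assumption); the real one would hold all 845 entries.
COUNTRY_CODES = {
    "30": "Greece",
    "32": "Belgium",
    "33": "France",
    "46": "Sweden",
    "63": "Philippines",
    "965": "Kuwait",
    "966": "Saudi Arabia",
    "974": "Qatar",
    "975": "Bhutan",
}

PREFIX = "001001"  # fixed 6-digit prefix described in the question
MAX_CODE_LEN = max(len(code) for code in COUNTRY_CODES)

def extract_country_code(value):
    """Return the longest known country code following the prefix, or None."""
    match = re.fullmatch(r"tel:(\d+)", value.strip())
    if match is None or not match.group(1).startswith(PREFIX):
        return None
    rest = match.group(1)[len(PREFIX):]
    # Try the longest possible code first, then progressively shorter ones.
    for length in range(min(MAX_CODE_LEN, len(rest)), 0, -1):
        candidate = rest[:length]
        if candidate in COUNTRY_CODES:
            return candidate
    return None

for sample in ("tel:00100196659261", "tel:00100163170000", "tel:0010013033532090"):
    code = extract_country_code(sample)
    print(sample, code, COUNTRY_CODES.get(code))
```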
No problem. Could you please mark my answer as accepted? Thanks!
WOW! That was fast and accurate. Thanks a bunch Ayn.