To the human eye these are all 979592435350, optionally preceded by different digits. From a data set with lots of such sequences, I want to extract the 979592435350 value, and others like it, separately from the varying digits in front. Of course, I don't know the 979592435350 value in advance; otherwise it'd be easy!
Generally the numbers I want will be 11-12 digits long, though sometimes they'll be shorter; they should never be longer. The digits in front will be 4 or fewer most of the time, but I'd prefer not to hard-code that length if possible. The value I want should always be longer than the prefix in front of it, though.
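The constraints above (at most 12 digits, and always longer than the prefix) are enough to bound which suffixes of a raw number can be the wanted value without hard-coding the prefix length. Here is a minimal sketch in Python rather than SPL, assuming each number is handled as a digit string; `candidate_suffixes` and `max_len` are hypothetical names for illustration.

```python
def candidate_suffixes(number: str, max_len: int = 12) -> list:
    """List the suffixes of `number` that could be the wanted value, longest first.

    Assumes the wanted value is at most `max_len` digits and strictly longer
    than the prefix in front of it; the prefix length itself is not fixed.
    """
    length = len(number)
    longest = min(length, max_len)
    # Suffix length k must satisfy k > length - k, i.e. k >= length // 2 + 1.
    shortest = length // 2 + 1
    return [number[length - k:] for k in range(longest, shortest - 1, -1)]


if __name__ == "__main__":
    print(candidate_suffixes("12979592435350"))
```

For "12979592435350" this gives the 12- down to 8-digit suffixes; a 7-digit suffix is excluded because it would not be longer than the remaining 7-digit prefix.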
Can anyone think of an elegant way of extracting these values?
OK, I might be getting somewhere with this. mvappend() lets me create multiple substrings of different lengths under the same field name. So each record now has a 12-digit value, an 11-digit value and so on, as well as the actual recorded value, all under the same field name.
I can do a top or a dc() on these and group them.
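Outside Splunk, the idea of emitting every 12-, 11-, ... digit suffix per record and then counting them can be sketched with a `Counter`. This is an illustration of the counting logic only, not SPL; `count_suffixes`, `min_len` and `max_len` are hypothetical, and the default lengths are assumptions.

```python
from collections import Counter


def count_suffixes(numbers, min_len=9, max_len=12):
    """Count every trailing substring of length min_len..max_len across records.

    Each record contributes one suffix per length, similar to building a
    multivalue field of 12-digit, 11-digit, ... substrings and counting them.
    Assumes min_len >= 1.
    """
    counts = Counter()
    for number in numbers:
        for k in range(min_len, min(max_len, len(number)) + 1):
            counts[number[len(number) - k:]] += 1
    return counts


if __name__ == "__main__":
    records = ["1979592435350", "23979592435350", "979592435350", "5123456789"]
    for suffix, n in count_suffixes(records).most_common(5):
        print(suffix, n)
```

With these sample records, "979592435350", "79592435350" and "592435350" all come out with the same count of 3, since every shorter suffix of a shared value is shared too.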
The only problem now is that, having done that, I need to keep only the longest matching value. For example, say the longest matching value is the 11-digit one: I will get exactly the same count for every shorter length (10, 9 and so on). I won't get a high count for the 12-digit value, since that has correctly dropped to the bottom because its first digit varies. So how can I exclude the shorter variants of the same value?
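One way to express "exclude the shorter variants" is: drop a suffix whenever a longer suffix ending in it has exactly the same count, because that means every record containing the short one also contains the long one. A minimal Python sketch of that rule, assuming the counts are already available as a mapping; `longest_variants` and the `min_count` threshold (used to discard rare ones such as a varying prefix digit plus the value) are hypothetical.

```python
from collections import Counter


def longest_variants(counts, min_count=2):
    """Keep each suffix unless a longer suffix ending in it has the same count."""
    kept = {}
    for s, c in counts.items():
        if c < min_count:
            continue  # too rare to be a shared value
        shadowed = any(
            len(t) > len(s) and t.endswith(s) and counts[t] == c
            for t in counts
        )
        if not shadowed:
            kept[s] = c
    return kept


if __name__ == "__main__":
    # 11-digit value 79592435350 behind a varying one-digit prefix.
    example = Counter({
        "592435350": 3, "9592435350": 3, "79592435350": 3,
        "179592435350": 1, "279592435350": 1, "379592435350": 1,
    })
    print(longest_variants(example))
```

This relies on the equal-count observation above; if an unrelated value happens to share a shorter suffix, that shorter suffix will have a higher count and will not be removed.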
Can anyone see how I can do this, or suggest a completely different way to do it?