Dear Team,
I have working on a data which has 1 million observations.
This data contains the phone numbers of our customers.
During the booking process the resources enter the phone numbers of our clients however there are many instances where an incorrect phone number has been entered lets says "9999999999". <this is an india format with 10 digits and no country code>.
I have run proc freq on this data and can see there is approx 19.5% of the data which is incorrect i.e. in the format like "9999999999", "1111111111" etc which are written in varuous ways.
I need to delete this data and only keep the valid phone numbers.
running if delete statement would have done the trick however as i mentioned the ph number is written in various ways (all numeric).
Kindly suggest an easy way if possible. I am working on SAS developer.
Regards, Shivi
Ok, the easy way:
Write down what a valid phone number consists of, for example:
- Comprises of 9 numbers
- All numeric
- Prefixed by 001
Write your datastep logic to match this:
data want; set have; select; case (length(phone)>9) delete; case (compress(phone," ","d")) delete; case (substr(phone,1,3) ne "001") delete; otherwise; end; run;
Hi Shivi,
There is a parallel thread about almost the same topic: https://communities.sas.com/t5/Base-SAS-Programming/Processing-fields-based-on-value-s/m-p/237105#M4.... Part of the recommendations given there may be applicable to your case.
Ok, the easy way:
Write down what a valid phone number consists of, for example:
- Comprises of 9 numbers
- All numeric
- Prefixed by 001
Write your datastep logic to match this:
data want; set have; select; case (length(phone)>9) delete; case (compress(phone," ","d")) delete; case (substr(phone,1,3) ne "001") delete; otherwise; end; run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.