Hello,
I am brand new to SAS so I have a basic question.
I am practicing reading in CSV files into SAS using an infile statement, and then cleaning that data to make it into a standardized format.
The data I want to read into SAS is a list of phone numbers:
Raw Data Looks Like This:
Phone
(908)232-4856
210.343.4757
(516) 343 - 9293
9342342345
I tried to import the file using the program below, but the phone numbers are getting cut off at the dash (-), the dot (.), or the blank. How do I read in everything properly? What am I missing? Any help is much appreciated. Thank you!
/*This is how I am reading in the file into SAS*/
data phone;
infile '/sscc/home/m/mkh246/Textbook_Datasets/Listing of Phone.csv' dlm=',' firstobs=2 truncover;
input phone $;
Run;
/*This is my data cleaning step I am practicing*/
data phoneformatted;
length PhoneNumber $10;
set work.phone;
PhoneNumber=compress(Phone,'()-.');
drop Phone;
Run;
You are very close. Try this:
/*This is how I am reading in the file into SAS*/
data phone;
infile '/sscc/home/m/mkh246/Textbook_Datasets/Listing of Phone.csv' dlm=',' firstobs=2 truncover;
length phone $16;
input phone &;
Run;
/*This is my data cleaning step I am practicing*/
data phoneformatted;
length PhoneNumber $10;
set work.phone;
PhoneNumber=compress(Phone, ' ()-.');
drop Phone;
Run;
Don't let SAS guess the length of your input field. The & modifier in the INPUT statement allows single embedded spaces in the input field. Also add a space to the list of characters passed to COMPRESS, so that the embedded blanks are removed as well.
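To see the cleaning step in isolation, here is a minimal sketch that reads the four sample numbers from the question as in-stream data rather than from the CSV file. The dataset name compress_demo and the variables digits_by_list and digits_by_keep are made up for illustration. The second call uses the 'kd' modifiers of COMPRESS (keep digits only), which is an alternative that was not part of the answer above.

```sas
/* Sketch only: in-stream sample data instead of the original CSV file. */
data compress_demo;
   length phone $16 digits_by_list digits_by_keep $10;
   infile datalines truncover;
   input phone $16.;                            /* formatted input: read the first 16 columns */
   digits_by_list = compress(phone, ' ()-.');   /* remove blanks, parentheses, dashes, dots */
   digits_by_keep = compress(phone, , 'kd');    /* alternative: keep only the digits */
   datalines;
(908)232-4856
210.343.4757
(516) 343 - 9293
9342342345
;

proc print data=compress_demo;
run;
```

The 'kd' form may be easier to maintain if the raw data can contain punctuation you have not anticipated, since it does not depend on listing every unwanted character.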
With list input, SAS character variables are assigned a default length of 8 unless you specify otherwise. You need to set a longer length before the variable is read in; an INFORMAT statement is a simple way to do that.
Is your file actually CSV (comma-separated)? As presented, it contains only a single variable, so you may have to change how you read it in.
data phone;
Informat phone $20.;
infile '/sscc/home/m/mkh246/Textbook_Datasets/Listing of Phone.csv' dlm=',' firstobs=2 truncover;
input phone $;
Run;
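One way to check what the file really contains is to echo the first few raw records to the log before deciding on delimiters and lengths. This is a minimal sketch that reuses the path from the question; the choice of five records is arbitrary.

```sas
/* Sketch: print the first 5 raw records of the file to the SAS log. */
data _null_;
   infile '/sscc/home/m/mkh246/Textbook_Datasets/Listing of Phone.csv' obs=5;
   input;            /* load the record into the input buffer without creating variables */
   put _infile_;     /* write the raw record to the log */
run;
```

If commas show up in the log, the file has more than one field per record and the INPUT statement would need a variable for each of them.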
Thank you PG! This worked!
I see how the "&" works, but may I ask where does the $16 come from in the length? I am struggling a little bit wrapping my head around how the length part works exactly. If you could explain how that works that would be great!
Thank you!
The LENGTH statement tells SAS what the length of the character variable should be, instead of letting SAS guess (sometimes wrongly). It should precede the first mention of the variable in the data step. I used 16 but, of course, you could choose another length.
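The effect of the LENGTH statement can be seen by running the same read with and without it. This is a minimal sketch using in-stream data; the dataset names default_len and explicit_len are made up for illustration.

```sas
/* Sketch: same INPUT statement, with and without a LENGTH statement. */
data default_len;
   infile datalines dlm=',' truncover;
   input phone $;        /* no length declared: phone gets the default length of 8 */
   datalines;
(908)232-4856
(516) 343 - 9293
;

data explicit_len;
   length phone $16;     /* declared before INPUT, so phone is stored with length 16 */
   infile datalines dlm=',' truncover;
   input phone $;
   datalines;
(908)232-4856
(516) 343 - 9293
;

/* The Len column in each report shows the stored length of phone. */
proc contents data=default_len;
run;
proc contents data=explicit_len;
run;
```

PROC CONTENTS should report a length of 8 for the first dataset and 16 for the second. A value longer than the stored length is truncated, which is why the numbers in the question appeared to be cut off.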