Hello,
I am practicing reading files into SAS, and I need a little bit of clarification about when I would use the different ways of reading data in from external files.
1. What type of files do I use PROC IMPORT for? Do I only use PROC IMPORT for Excel, CSV and SPSS files?
2. When I use an infile statement to read in a file, do I then use the following steps?
DATA XXX;
INFILE 'XXXX' DLM='X' firstobs=2 truncover;
INFORMAT XXXX anydtdtm30. YYY $14.;
INPUT
FORMAT /*This one is optional though, right? For example, if I want to display my fields in a specific way, I would put the FORMAT statement here. For a date field, for instance, I can set how I want the date to be displayed here?*/
Run;
3. How are the steps above different than the steps below? The below code works to read in some data, but why do I not need to put in an INFORMAT statement for the below? Why does LENGTH work below?
data phone;
infile '/sscc/home/m/mkh246/Textbook_Datasets/Listing of Phone.csv' dlm=',' firstobs=2 truncover;
length phone $16;
input phone &;
Run;
Any insight on the differences would be so helpful.
Thank you!
1. What type of files do I use PROC IMPORT for? Do I only use PROC IMPORT for Excel, CSV and SPSS files?
Any delimited file, of which CSV is only one kind: tab, pipe (the | character), colon, and other delimiter characters may be used. PROC IMPORT also works for a number of database files, depending on which modules your SAS installation has licensed. For files without inherent data types, such as Excel and other spreadsheets, and for delimited files, the procedure guesses the data types and may get things wrong. For delimited files I mostly use PROC IMPORT for the DATA step code it generates, which I then modify as needed: setting my own variable names, using custom informats, and making the code general enough to read other files of the same structure with consistent results.
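As a rough sketch of the PROC IMPORT approach described above, the step below reads a pipe-delimited text file. The file path and the output data set name are hypothetical placeholders, and GUESSINGROWS=MAX is just one way to reduce wrong type and length guesses (it scans every row, which can be slow on large files).

```sas
/* Hypothetical path and data set name: replace with your own. */
proc import datafile='/path/to/example.txt'
    out=work.example
    dbms=dlm            /* generic delimited file */
    replace;            /* overwrite work.example if it already exists */
    delimiter='|';      /* pipe-delimited; use '09'x for a tab */
    getnames=yes;       /* take variable names from the first row */
    guessingrows=max;   /* scan all rows before guessing types and lengths */
run;
```

For delimited files, the log should contain the DATA step that PROC IMPORT generated, which can be copied into the program and edited (variable names, informats, lengths) as described above.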
2. When I use an infile statement to read in a file, do I then use the following steps?
Generally, though, if all of your data is numeric, the informat defaults to a BEST-style numeric informat, which usually produces acceptable data.
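A minimal sketch of that all-numeric case, using made-up inline data (the data set name, variable names, and values are assumptions for illustration only): no INFORMAT statement is given, so each variable is read with the default numeric informat.

```sas
data scores;
    infile datalines dlm=',' truncover;
    /* No INFORMAT or LENGTH: id, score, and weight are all read as numbers. */
    input id score weight;
    datalines;
1,85,70.5
2,90,64
;
run;

proc print data=scores;
run;
```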
3. How are the steps above different than the steps below? The below code works to read in some data, but why do I not need to put in an INFORMAT statement for the below? Why does LENGTH work below?
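The LENGTH statement declares the variable as character and sets its length to 16, and SAS uses that when reading the value with list input, in effect treating it as a character informat whose width is the length you specified. You can run PROC CONTENTS on the resulting data set to check the variable's type and length. Without some indication that the variable is character (a LENGTH with $, an INFORMAT, or a $ in the INPUT statement), SAS would try to read the value as a number, with likely poor results.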
Note: if a value exceeds 16 characters it will be truncated, and if the data has embedded spaces you may get only part of the desired value. For example, since this looks like phone numbers: (123)  456-1234 has 2 spaces after the ")", and the & format modifier only handles single embedded spaces.
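To compare the two approaches directly, here is a small sketch with made-up phone values (the data set names and the sample values are assumptions, not the poster's data). The first step declares the variable with LENGTH, the second with INFORMAT. Both make phone a 16-character variable, and the second sample value is longer than 16 characters, so it should come out truncated in both.

```sas
/* Approach 1: LENGTH defines phone as character, 16 bytes. */
data phone_length;
    infile datalines dlm=',' truncover;
    length phone $16;
    input phone;
    datalines;
555-0100
(555) 555-0123 ext 9
;
run;

/* Approach 2: INFORMAT is the first mention of phone, so it also */
/* makes the variable character with a length of 16.              */
data phone_informat;
    infile datalines dlm=',' truncover;
    informat phone $16.;
    input phone;
    datalines;
555-0100
(555) 555-0123 ext 9
;
run;

/* Compare type, length, and informat attributes of the two data sets. */
proc contents data=phone_length;
run;
proc contents data=phone_informat;
run;
```

Raising the length (for example to $25) in either step is the simplest way to avoid the truncation.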