I am failing miserably at what should be a simple task. Reading a .csv file into SAS using PROC IMPORT.
I have followed instructions per the UCLA ATS site, the Little SAS book and other sources.
When I download the cars_novname.csv file from the UCLA Web site, this code works fine:
proc import datafile="\\Mac\Home\Downloads\cars_novname.csv"
out=mydata dbms=csv replace;
getnames=no;
run;
When I use the same exact code, with my own data (Below), SAS reads in 1 observation and 6 variables. Given the data, it should be 4 observations and 2 variables. My .csv file was created in Excel and saved as a .csv. It looks identical to the data in the UCLA file when opened w/ a text editor. There is a comma separating each value and a carriage return after each line of data.
proc import datafile="\\Mac\Home\Desktop\Backup Desktop/num_char_test.csv"
out=mydata_2 dbms=csv replace;
getnames=no;
run;
What am I doing wrong here? Thanks for your help.
Here is the data:
Log:
1 rows created in WORK.MYDATA_2 from \\Mac\Home\Desktop\Backup Desktop/num_char_test.csv.
NOTE: WORK.MYDATA_2 data set was successfully created.
NOTE: The data set WORK.MYDATA_2 has 1 observations and 6 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.17 seconds
cpu time 0.07 seconds
Quotes are only needed if the value contains the delimiter or quotes. None of your values do and so the quotes are not needed.
PROC IMPORT from a CSV file has to guess at what is in the data as there is no metadata to tell it. The only metadata a CSV file has is the column (variable) names. You can probably do a much better job at guessing what data types to use than PROC import can. You can definitely do a better job of creating meaningul variable names. Plus the code to read the file with a data step is shorter than the code to call PROC IMPORT. For example let's dump out the values of SASHELP.CARS and read it back in.
data _null_;
file 'cars_noheader.csv' dsd ;
set sashelp.cars;
put (_all_) (+0);
run;
To read it back in with a data step you just need to define the length and any required INFORMAT and/or desired FORMATs. Then the INPUT statement is trivial.
data cars;
infile 'cars_noheader.csv' dsd truncover ;
length make $13 model $40 type $8 origin $6 drivetrain $5 msrp
invoice enginesize cylinders horsepower mpg_city mpg_highway
weight wheelbase length 8
;
informat msrp invoice comma8. ;
input make -- length ;
run;
Excel for MAC is one of the programs that didn't get the message that the Mac OS is now a flavor of Unix and that it should be using LineFeed ('0A'x) to mark the end of the line instead of CarriageReturn ('0D'X). Check if there is some other output tyep you can save to that will mark the end of lines properly. Or perhaps upgrade the version on Excel?
To see what is happening try running this little SAS program.
filename csv temp;
data _null_;
file csv recfm=n;
put '123,TOM' '0D'x '234,BOB' '0D'x 'NULL,999' '0D'x '222,Fred' '0D'x;
run;
data _null_;
infile csv ;
input;
list;
run;
data _null_;
infile csv termstr=cr;
input;
list;
run;
The data you posted was all on one line, which matches what your SAS log says. If you want SAS to treat it as four lines you need to have line breaks between the lines. Where did you create the file? Perhaps it is using a non-standard character to indicate the line breaks? In the old days Mac's used to use CR ('0D'x) as the end of line characters.
First try looking at what is in the file.
data _null_;
infile "\\Mac\Home\Downloads\cars_novname.csv" ;
input;
list;
run;
If you do have '0D'x as the line break then you can set that on the INFILE statement. Note that there is not much value in using PROC IMPORT to read a file without column headers, especially one with only two columns.
data mydata;
infile "\\Mac\Home\Downloads\cars_novname.csv" dsd termstr=cr truncover;
input num name $ ;
run;
<The data you posted was all on one line, which matches what your SAS log says.>
When I copy and paste the data from TextEdit into a Word file, or an email message, it is in colums and rows (See screen shot, lower half). When I paste in into this message it is on one line.
<Where did you create the file?>
In Excel for Mac. There is no visible line break in the file (e.g. '0D'x).
Is there something wrong with the way I am creating the .csv file from Excel? I simply typed the values into cells and then saved the file as a .csv file.
When I look more closely at the UCLA data, the non-standard data values are enclosed in quotation marks (See screen shot, upper half).
SAS is not recognizing the end of a row/observation. Do I need to tell it explicitly? With a DATA step I use the TERMSTR=CR option, which works.
<Note that there is not much value in using PROC IMPORT to read a file without column headers, especially one with only two columns.>
I'm just playing around at this point. Trying to understand how SAS assigns attributes to data during PROC IMPORT. My real data has more variables and observations. I'm not exactly clear on why PROC IMPORT wouldn't be valuable w/o column headers. Could you elaborate?
Thanks very much for your help!
Quotes are only needed if the value contains the delimiter or quotes. None of your values do and so the quotes are not needed.
PROC IMPORT from a CSV file has to guess at what is in the data as there is no metadata to tell it. The only metadata a CSV file has is the column (variable) names. You can probably do a much better job at guessing what data types to use than PROC import can. You can definitely do a better job of creating meaningul variable names. Plus the code to read the file with a data step is shorter than the code to call PROC IMPORT. For example let's dump out the values of SASHELP.CARS and read it back in.
data _null_;
file 'cars_noheader.csv' dsd ;
set sashelp.cars;
put (_all_) (+0);
run;
To read it back in with a data step you just need to define the length and any required INFORMAT and/or desired FORMATs. Then the INPUT statement is trivial.
data cars;
infile 'cars_noheader.csv' dsd truncover ;
length make $13 model $40 type $8 origin $6 drivetrain $5 msrp
invoice enginesize cylinders horsepower mpg_city mpg_highway
weight wheelbase length 8
;
informat msrp invoice comma8. ;
input make -- length ;
run;
Excel for MAC is one of the programs that didn't get the message that the Mac OS is now a flavor of Unix and that it should be using LineFeed ('0A'x) to mark the end of the line instead of CarriageReturn ('0D'X). Check if there is some other output tyep you can save to that will mark the end of lines properly. Or perhaps upgrade the version on Excel?
To see what is happening try running this little SAS program.
filename csv temp;
data _null_;
file csv recfm=n;
put '123,TOM' '0D'x '234,BOB' '0D'x 'NULL,999' '0D'x '222,Fred' '0D'x;
run;
data _null_;
infile csv ;
input;
list;
run;
data _null_;
infile csv termstr=cr;
input;
list;
run;
I just tested this using Excel15 on my Mac and could not replicate your issue. My data imported correctly using Proc Import.
Please consider attaching your CSV file. Change the extension to txt to upload it to the forum.
Perhaps the issue is w/ the different versions of Excel. I'm running Excel for Mac 2011. The same code also ran w/o error when it was converted from .xlsx to .csv on Windows...
@Tom This is very helpful. Thank you!
I'm using SAS University Edition and getting an error w/ your code. ERROR: Insufficient authorization to access /opt/sasinside/SASConfig/Lev1/SASApp/cars_noheader.csv.
Regardless, I understand what you are trying to convey. This whole exercise was designed to understand what SAS does in PROC IMPORT. How it interprets data and assigns attributes. That's why I put a character string in the "numeric" variable and a number (i.e. 999) in the "character" variable. I wanted to see what SAS would do.
I understand that using a DATA step gives you much more control when reading in your data. The one drawback, it would seem, is when you have a large number of variables. I have a data set w/ 1,600 variables!
How do you assign informats and formats to multiple variables simultaneously? In your example, it looks like msrp and invoice will both be assigned the comma8. informat. What if there were other variables and other informats. Would you just follow the same convention (i.e. List variables individually followed by the informat). What about using lists of variables?
Also, you've added in variable names, but specified getnames=no. You need to add the DATAROW option to tell SAS that the first observation of actual data starts in row 2.
No, but FIRSTOBS is used in a datastep and DATAROW is used in PROC IMPORT.
No.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.