BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_maldini_
Barite | Level 11

I am failing miserably at what should be a simple task. Reading a .csv file into SAS using PROC IMPORT.

 

I have followed instructions per the UCLA ATS site, the Little SAS book and other sources.

 

When I download the cars_novname.csv file from the UCLA Web site, this code works fine:

 

proc import datafile="\\Mac\Home\Downloads\cars_novname.csv" 
out=mydata dbms=csv replace;
getnames=no;
run;

 

When I use the same exact code, with my own data (Below), SAS reads in 1 observation and 6 variables. Given the data, it should be 4 observations and 2 variables. My .csv file was created in Excel and saved as a .csv. It looks identical to the data in the UCLA file when opened w/ a text editor. There is a comma separating each value and a carriage return after each line of data.

 

proc import datafile="\\Mac\Home\Desktop\Backup Desktop/num_char_test.csv" 
out=mydata_2 dbms=csv replace;
getnames=no;
run;

 

What am I doing wrong here? Thanks for your help.

 

Here is the data:

  • numeric,character 123,TOM 234,BOB NULL,999 222,Fred
  • 2 variables (numeric, character) and 8 observations.

Log:

1 rows created in WORK.MYDATA_2 from \\Mac\Home\Desktop\Backup Desktop/num_char_test.csv.

 

NOTE: WORK.MYDATA_2 data set was successfully created.

NOTE: The data set WORK.MYDATA_2 has 1 observations and 6 variables.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           0.17 seconds

      cpu time            0.07 seconds

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Quotes are only needed if the value contains the delimiter or quotes.  None of your values do and so the quotes are not needed.

 

PROC IMPORT from a CSV file has to guess at what is in the data as there is no metadata to tell it. The only metadata a CSV file has is the column (variable) names.  You can probably do a much better job at guessing what data types to use than PROC import can. You can definitely do a better job of creating meaningul variable names. Plus the code to read the file with a data step is shorter than the code to call PROC IMPORT.  For example let's dump out the values of SASHELP.CARS and read it back in.

 

data _null_;
  file 'cars_noheader.csv' dsd ;
  set sashelp.cars;
  put (_all_) (+0);
run;

To read it back in with a data step you just need to define the length and any required INFORMAT and/or desired FORMATs. Then the INPUT statement is trivial.

data cars;
  infile 'cars_noheader.csv' dsd truncover ;
  length make $13 model $40 type $8 origin $6 drivetrain $5 msrp
         invoice enginesize cylinders horsepower mpg_city mpg_highway
         weight wheelbase length 8
  ;
  informat msrp invoice comma8. ;
  input make -- length ;
run;

 

Excel for MAC is one of the programs that didn't get the message that the Mac OS is now a flavor of Unix and that it should be using LineFeed ('0A'x) to mark the end of the line instead of CarriageReturn ('0D'X).  Check if there is some other output tyep you can save to that will mark the end of lines properly.  Or perhaps upgrade the version on Excel?

 

To see what is happening try running this little SAS program.

 

filename csv temp;
data _null_;
  file csv recfm=n;
  put '123,TOM' '0D'x '234,BOB' '0D'x 'NULL,999' '0D'x '222,Fred' '0D'x;
run;
data _null_;
  infile csv ;
  input;
  list;
run;
data _null_;
  infile csv termstr=cr;
  input;
  list;
run;

 

 

View solution in original post

14 REPLIES 14
Tom
Super User Tom
Super User

The data you posted was all on one line, which matches what your SAS log says.  If you want SAS to treat it as four lines you need to have line breaks between the lines.  Where did you create the file?  Perhaps it is using a non-standard character to indicate the line breaks?  In the old days Mac's used to use CR  ('0D'x) as the end of line characters.

First try looking at what is in the file.

data _null_;
  infile "\\Mac\Home\Downloads\cars_novname.csv" ;
  input;
  list;
run;

If you do have '0D'x as the line break then you can set that on the INFILE statement.  Note that there is not much value in using PROC IMPORT to read a file without column headers, especially one with only two columns.

data mydata;
  infile "\\Mac\Home\Downloads\cars_novname.csv" dsd termstr=cr truncover;
  input num name $ ;
run;
_maldini_
Barite | Level 11

@Tom

 

<The data you posted was all on one line, which matches what your SAS log says.>

 

When I copy and paste the data from TextEdit into a Word file, or an email message, it is in colums and rows (See screen shot, lower half). When I paste in into this message it is on one line.

 

num_char.jpg

 

<Where did you create the file?>

 

In Excel for Mac. There is no visible line break in the file (e.g.  '0D'x). 

 

Is there something wrong with the way I am creating the .csv file from Excel? I simply typed the values into cells and then saved the file as a .csv file.

 

When I look more closely at the UCLA data, the non-standard data values are enclosed in quotation marks (See screen shot, upper half). 

 

SAS is not recognizing the end of a row/observation. Do I need to tell it explicitly? With a DATA step I use the TERMSTR=CR option, which works. 

 

<Note that there is not much value in using PROC IMPORT to read a file without column headers, especially one with only two columns.>

 

I'm just playing around at this point. Trying to understand how SAS assigns attributes to data during PROC IMPORT. My real data has more variables and observations. I'm not exactly clear on why PROC IMPORT wouldn't be valuable w/o column headers. Could you elaborate?

 

Thanks very much for your help!

Tom
Super User Tom
Super User

Quotes are only needed if the value contains the delimiter or quotes.  None of your values do and so the quotes are not needed.

 

PROC IMPORT from a CSV file has to guess at what is in the data as there is no metadata to tell it. The only metadata a CSV file has is the column (variable) names.  You can probably do a much better job at guessing what data types to use than PROC import can. You can definitely do a better job of creating meaningul variable names. Plus the code to read the file with a data step is shorter than the code to call PROC IMPORT.  For example let's dump out the values of SASHELP.CARS and read it back in.

 

data _null_;
  file 'cars_noheader.csv' dsd ;
  set sashelp.cars;
  put (_all_) (+0);
run;

To read it back in with a data step you just need to define the length and any required INFORMAT and/or desired FORMATs. Then the INPUT statement is trivial.

data cars;
  infile 'cars_noheader.csv' dsd truncover ;
  length make $13 model $40 type $8 origin $6 drivetrain $5 msrp
         invoice enginesize cylinders horsepower mpg_city mpg_highway
         weight wheelbase length 8
  ;
  informat msrp invoice comma8. ;
  input make -- length ;
run;

 

Excel for MAC is one of the programs that didn't get the message that the Mac OS is now a flavor of Unix and that it should be using LineFeed ('0A'x) to mark the end of the line instead of CarriageReturn ('0D'X).  Check if there is some other output tyep you can save to that will mark the end of lines properly.  Or perhaps upgrade the version on Excel?

 

To see what is happening try running this little SAS program.

 

filename csv temp;
data _null_;
  file csv recfm=n;
  put '123,TOM' '0D'x '234,BOB' '0D'x 'NULL,999' '0D'x '222,Fred' '0D'x;
run;
data _null_;
  infile csv ;
  input;
  list;
run;
data _null_;
  infile csv termstr=cr;
  input;
  list;
run;

 

 

Reeza
Super User

I just tested this using Excel15 on my Mac and could not replicate your issue. My data imported correctly using Proc Import.

Please consider attaching your CSV file. Change the extension to txt to upload it to the forum.

 

 

_maldini_
Barite | Level 11

@Reeza The file is attached. Thanks for your assitance. 

 

I'm using Mac for Excel 2011

_maldini_
Barite | Level 11

Perhaps the issue is w/ the different versions of Excel. I'm running Excel for Mac 2011. The same code also ran w/o error when it was converted from .xlsx to .csv on Windows...

_maldini_
Barite | Level 11

@Tom This is very helpful. Thank you!

 

I'm using SAS University Edition and getting an error w/ your code. ERROR: Insufficient authorization to access /opt/sasinside/SASConfig/Lev1/SASApp/cars_noheader.csv.

 

Regardless, I understand what you are trying to convey. This whole exercise was designed to understand what SAS does in PROC IMPORT. How it interprets data and assigns attributes. That's why I put a character string in the "numeric" variable and a number (i.e. 999) in the "character" variable. I wanted to see what SAS would do. 

 

I understand that using a DATA step gives you much more control when reading in your data. The one drawback, it would seem, is when you have a large number of variables. I have a data set w/ 1,600 variables!

 

How do you assign informats and formats to multiple variables simultaneously? In your example, it looks like msrp and invoice will both be assigned the comma8. informat. What if there were other variables and other informats. Would you just follow the same convention (i.e. List variables individually followed by the informat). What about using lists of variables?

_maldini_
Barite | Level 11
Btw, I had a friend use Excel on Windows to convert the same file to a .csv and it worked w/o error in PROC IMPORT. Crazy.
ballardw
Super User
Note that Excel can export to 3 different CSV formats when doing a File-Save As.
Reeza
Super User

Also, you've added in variable names, but specified getnames=no. You need to add the DATAROW option to tell SAS that the first observation of actual data starts in row 2. 

_maldini_
Barite | Level 11
Does the DATAROW option have a different effect than the FIRSTOBS=2 option in this situation?
Reeza
Super User

No, but FIRSTOBS is used in a datastep and DATAROW is used in PROC IMPORT.

_maldini_
Barite | Level 11

@Tom @Reeza Can you use PROC IMPORT and assign attributes for some of the variables, but let SAS guess at the others? Like a hybrid of a DATA step w/ LENGTH, INFORMAT, FORMAT, etc. and PROC IMPORT.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 217057 views
  • 8 likes
  • 4 in conversation