My data set has 30,000 subjects and 2,400 variables.
When I read the .csv file, the import creates the same 2,400 columns, but some variables are not there. I tried both PROC IMPORT and the point-and-click import in SAS Studio; both resulting data sets have the correct number of rows and columns (when I check the dimensions), yet both are missing some of the variables (more variables are missing when I use point-and-click).
proc import datafile='\\X\SASdata\dataset.csv'
    out=dataset1
    dbms=csv
    replace;
run;
Your help is much appreciated. Thank you!
Finally, I could not import the .csv file, but
LIBNAME mylibname XLSX 'C:\direction\filename.xlsx';
worked very well, with the correct rows and columns. The key point is that in the DATA step I need to specify the Excel sheet that contains the data I want to work on (for me, it was sheet1):
data mydata;
set mylibname.sheet1;
run;
Another point: this way I kept the data exactly as it was read from the .xlsx file. When I converted the file to a SAS data set with the import, it changed the types of the variables, and I did not want that.
Thank you, everyone!
You likely should add GUESSINGROWS=MAX; to make PROC IMPORT examine more of the data.
Sparsely populated variables that have no values in the first few rows may not be read properly, because the procedure only scans about 20 rows by default to determine variable types.
If that doesn't help there may be issues with content that would require more information to address.
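For illustration, the option goes inside the PROC IMPORT step; a sketch based on the call from the original post (expect a longer runtime on 30,000 rows):

```sas
/* GUESSINGROWS=MAX tells PROC IMPORT to scan the whole file        */
/* before assigning types, so columns that are empty in the first   */
/* 20 rows are still typed and read correctly.                      */
proc import datafile='\\X\SASdata\dataset.csv'
    out=dataset1
    dbms=csv
    replace;
    guessingrows=max;
run;
```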
@Emma_at_SAS wrote:
Thank you, ballardw! I added your suggestion to my code, but GUESSINGROWS=MAX takes very long to run on my large data set. May I use 50 instead of MAX?
There is a performance penalty.
If you need performance then write a data step and do not rely on a guessing procedure.
With 30,000 records I would use at least 1,000 rows if forced to pick a number, but I would actually use GUESSINGROWS=MAX and wait.
Or write a DATA step. If you have a document describing the file, you might be able to do that easily. If not, the question is why there are 2,400 variables in the first place. If someone used the typical trained-on-spreadsheets approach of a separate column for each time period/occasion/person, instead of a variable holding the time period/occasion/person identification, then you will likely want to fix that later (or go back to the source and get a much nicer file structure to work with).
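As an illustration of the "long" layout suggested above, a hedged sketch (the data set and variable names here are made up, not from the original file): PROC TRANSPOSE can collapse repeated per-period columns into one row per subject and period.

```sas
/* Hypothetical wide file: one column per time period (score1-score3). */
data wide;
  input id score1 score2 score3;
  datalines;
1 10 12 14
2 20 22 24
;
run;

/* Transpose to long: one row per subject and period.                  */
/* NAME=period stores the original column name in variable PERIOD;    */
/* COL1 holds the value and is renamed to SCORE.                      */
proc transpose data=wide out=long(rename=(col1=score)) name=period;
  by id;
  var score1-score3;
run;
```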
With 2400 columns (who comes up with such a bad table design?) it may well be that PROC IMPORT has troubles processing the header line.
Maybe you find a way to create the data step programmatically from the file documentation.
Thank you, KurtBremser!
I do not need all the columns for my analyses. So, do you think that removing those columns from the .csv file may solve my problem?
Also, if a large data set really has that many variables, what is your suggestion for a better data design?
Thank you!
@Emma_at_SAS wrote:
Thank you, KurtBremser!
I do not need all the columns for my analyses. So, do you think that removing those columns from the .csv file may solve my problem?
Also, if a large data set really has that many variables, what is your suggestion for a better data design?
Thank you!
No need to remove the columns from the CSV file (that would just introduce a risk of corrupting the data).
You can simply not keep the variables you don't need. You do need to read every field up to the last variable you want to keep, but you can read the unneeded ones into a dummy variable that you drop before writing the data set.
Much easier to just write your own DATA step than depend on PROC IMPORT guessing what to do with the file.
Here is an example that reads two character fields and three numeric fields, one of which is a date, from a file with at least 20 fields per line. It skips fields 2 and 3 (reading them into the DUMMY variable) and also the 13 fields before the date.
data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  length id $10 age 8 gender $6 weight date 8;
  length dummy $1;        /* throw-away variable for skipped fields */
  informat date date.;    /* reads dates like 01JAN2020 */
  format date date9.;
  input id 2*dummy age gender weight 13*dummy date;
  drop dummy;
run;
PROC IMPORT has trouble when the header row is too long for it. If the average header value is 13 characters long, then with 2,400 variables the header row will be more than 32K bytes long (2,400 × 13 characters plus 2,399 commas is about 33.6K, past the 32,767-byte default record length).
Try running this simple step to see what the beginning of the text file looks like:
data _null_;
  infile '\\X\SASdata\dataset.csv' obs=3;
  input;
  list;    /* writes the raw input lines to the log */
run;
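To check directly whether the header line exceeds the 32,767-byte default record length, a sketch along the same lines (LRECL is raised here so the line is not truncated while being measured):

```sas
/* Report the length of the header (first) line of the CSV.       */
/* The LENGTH= option stores the current input line's length      */
/* in the named variable.                                         */
data _null_;
  infile '\\X\SASdata\dataset.csv' obs=1 lrecl=1000000 length=len;
  input;
  put 'Header line length: ' len;
run;
```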