My data set has 30,000 subjects and 2,400 variables.
When I read the .csv file, the import creates the same 2,400 columns, but some variables are not there. I tried both PROC IMPORT and the point-and-click import in SAS Studio; both resulting data sets have the correct number of rows and columns (when I check the dimensions), yet both are missing some of the variables (more variables are missing when I use point-and-click).
proc import datafile='\\X\SASdata\dataset.csv'
    out=dataset1
    dbms=csv
    replace;
run;
Your help is much appreciated. Thank you!
Finally, I could not import the .csv file, but
LIBNAME mylibname XLSX 'C:\direction\filename.xlsx';
worked very well, with the correct rows and columns. The key point is that in the DATA step I need to specify the Excel sheet that contains the data I want to work on (for me, it was sheet1):
data mydata;
set mylibname.sheet1;
run;
Another point: this way I kept the data exactly as it was read from the .xlsx file. When I converted the file to a SAS data set with the import, it changed the types of the variables, and I did not want that.
Thank you, everyone!
You likely should add GUESSINGROWS=MAX; to make PROC IMPORT examine more of the data.
Sparsely populated variables that have no values in the first few rows may not be read properly, because the procedure only scans about 20 rows by default to determine variable types.
If that doesn't help there may be issues with content that would require more information to address.
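For illustration, the option goes inside the PROC IMPORT step; a sketch based on the call from the original post (expect a longer runtime on 30,000 rows):

```sas
/* GUESSINGROWS=MAX tells PROC IMPORT to scan the whole file        */
/* before assigning types, so columns that are empty in the first   */
/* 20 rows are still typed and read correctly.                      */
proc import datafile='\\X\SASdata\dataset.csv'
    out=dataset1
    dbms=csv
    replace;
    guessingrows=max;
run;
```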
@Emma_at_SAS wrote:
Thank you, ballardw! I added your suggestion to my code, but GUESSINGROWS=MAX takes very long to run on my large data set. May I use 50 instead of MAX?
There is a performance penalty.
If you need performance then write a data step and do not rely on a guessing procedure.
With 30,000 records I would use at least 1,000 rows if forced to pick a number, but I would actually use GUESSINGROWS=MAX and wait.
Or write a DATA step. If you have a document describing the file, you might be able to do that easily. If not, the question is why there are 2,400 variables in the first place. If someone used the typical trained-on-spreadsheets approach of a separate column for each time period/occasion/person, instead of a variable holding the time period/occasion/person identification, then you will likely want to fix that later (or go back to the source and get a much nicer file structure to work with).
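As an illustration of the "long" layout suggested above, a hedged sketch (the data set and variable names here are made up, not from the original file): PROC TRANSPOSE can collapse repeated per-period columns into one row per subject and period.

```sas
/* Hypothetical wide file: one column per time period (score1-score3). */
data wide;
  input id score1 score2 score3;
  datalines;
1 10 12 14
2 20 22 24
;
run;

/* Transpose to long: one row per subject and period.                  */
/* NAME=period stores the original column name in variable PERIOD;    */
/* COL1 holds the value and is renamed to SCORE.                      */
proc transpose data=wide out=long(rename=(col1=score)) name=period;
  by id;
  var score1-score3;
run;
```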
With 2400 columns (who comes up with such a bad table design?) it may well be that PROC IMPORT has troubles processing the header line.
Maybe you find a way to create the data step programmatically from the file documentation.
Thank you, KurtBremser!
I do not need all the columns for my analyses. So, do you think that removing those columns from the .csv file may solve my problem?
Also, if a large data set really has that many variables, what is your suggestion for a better data design?
Thank you!
@Emma_at_SAS wrote:
Thank you, KurtBremser!
I do not need all the columns for my analyses. So, do you think that removing those columns from the .csv file may solve my problem?
Also, if a large data set really has that many variables, what is your suggestion for a better data design?
Thank you!
No need to remove the columns from the CSV file (that would just introduce a risk of corrupting the data).
You can simply not keep the variables you don't need. You do need to read every field up to the last variable you want to keep, but you can read the unneeded ones into a dummy variable that you drop before writing the data set.
Much easier to just write your own DATA step than depend on PROC IMPORT guessing what to do with the file.
Here is an example that reads two character fields and three numeric fields, one of which is a date, from a file with at least 20 fields per line. It skips fields 2 and 3 (reading them into the DUMMY variable) and also the 13 fields before the date.
data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  length id $10 age 8 gender $6 weight date 8;
  length dummy $1;        /* throw-away variable for skipped fields */
  informat date date.;    /* reads dates like 01JAN2020 */
  format date date9.;
  input id 2*dummy age gender weight 13*dummy date;
  drop dummy;
run;
PROC IMPORT has trouble when the header row is too long for it. If the average header value is 13 characters long, then with 2,400 variables the header row will be more than 32K bytes long (2,400 × 13 characters plus 2,399 commas is about 33.6K, past the 32,767-byte default record length).
Try running this simple step to see what the beginning of the text file looks like:
data _null_;
  infile '\\X\SASdata\dataset.csv' obs=3;
  input;
  list;    /* writes the raw input lines to the log */
run;
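To check directly whether the header line exceeds the 32,767-byte default record length, a sketch along the same lines (LRECL is raised here so the line is not truncated while being measured):

```sas
/* Report the length of the header (first) line of the CSV.       */
/* The LENGTH= option stores the current input line's length      */
/* in the named variable.                                         */
data _null_;
  infile '\\X\SASdata\dataset.csv' obs=1 lrecl=1000000 length=len;
  input;
  put 'Header line length: ' len;
run;
```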