Emma_at_SAS
Lapis Lazuli | Level 10

My data set has 30,000 subjects and 2,400 variables.

When I read the .csv file, SAS creates the same 2,400 columns, but some variables are not there. I tried both PROC IMPORT and point-and-click import in SAS Studio. Both resulting data sets have the correct number of rows and columns (when checking the dimensions), but both are missing some of the variables (more variables are missing when I use point-and-click).

proc import datafile='\\X\SASdata\dataset.csv'
  out=dataset1
  dbms=csv
  replace;
run;

Your help is much appreciated. Thank you!


10 REPLIES
ballardw
Super User

You likely should add GUESSINGROWS=MAX; so that PROC IMPORT examines more of the data.

Sparsely occupied variables that aren't populated in the first few rows of data may not get read properly, because the procedure only examines about 20 rows by default to determine variable types.

If that doesn't help, there may be issues with the content that would require more information to address.
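As a hedged sketch, the original PROC IMPORT call with this option added would look like the following (path and data set names are taken from the original post; GUESSINGROWS goes on its own statement inside PROC IMPORT):

```sas
/* Re-run the import, letting PROC IMPORT scan all rows
   before deciding each variable's type and length. */
proc import datafile='\\X\SASdata\dataset.csv'
  out=dataset1
  dbms=csv
  replace;
  guessingrows=max;
run;
```

Note that GUESSINGROWS=MAX can be slow on a large file, since the file is effectively read twice (once to guess types, once to load the data).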

Emma_at_SAS
Lapis Lazuli | Level 10
Thank you, ballardw! I added your suggestion to my code, but GUESSINGROWS=MAX is taking very long to run for my large data set. May I use 50 instead of MAX?
ballardw
Super User

@Emma_at_SAS wrote:
Thank you, ballardw! I added your suggestion to my code, but GUESSINGROWS=MAX is taking very long to run for my large data set. May I use 50 instead of MAX?

There is a performance penalty.

If you need performance, then write a DATA step and do not rely on a guessing procedure.

At a minimum, with 30,000 records I would use at least 1,000 rows if forced, but I would actually use GUESSINGROWS=MAX and wait.

Or write a DATA step. If you have a document describing the file, you might be able to do that easily. If not, the question might be why there are 2,400 variables. If someone used the typical trained-on-spreadsheets approach of having a separate column for each time period/occasion/person, instead of a variable holding the time period/occasion/person identification, then you will likely want to fix that later (or go back to the source and get a much nicer file structure to work with).

Kurt_Bremser
Super User

With 2,400 columns (who comes up with such a bad table design?) it may well be that PROC IMPORT has trouble processing the header line.

Maybe you can find a way to create the DATA step programmatically from the file documentation.
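That idea can be sketched as follows — an untested outline that generates the DATA step from the CSV's own header row rather than from documentation, assuming the header contains valid SAS variable names and that reading every column as character (the $200 length is a placeholder) is acceptable; gencode is a hypothetical temporary fileref:

```sas
filename gencode temp;

/* Read only the header line and write out a complete DATA step
   that re-reads the file with one character variable per column. */
data _null_;
  infile '\\X\SASdata\dataset.csv' obs=1 lrecl=1000000;
  file gencode;
  input;
  length name $32;
  put 'data want;';
  put "  infile '\\X\SASdata\dataset.csv' dsd truncover firstobs=2 lrecl=1000000;";
  put '  input';
  do i = 1 to countw(_infile_, ',', 'q');
    name = scan(_infile_, i, ',', 'q');
    put '    ' name ':$200.';
  end;
  put '  ;';
  put 'run;';
run;

/* Run the generated step. */
%include gencode;
```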

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you, Kurt_Bremser!

I do not need all the columns for my analyses. So, do you think that removing the columns from the .csv file may solve my problem?

Also, if a data set really has so many variables, what is your suggestion for a better design?

Thank you!

Tom
Super User Tom
Super User

@Emma_at_SAS wrote:

Thank you, Kurt_Bremser!

I do not need all the columns for my analyses. So, do you think that removing the columns from the .csv file may solve my problem?

Also, if a data set really has so many variables, what is your suggestion for a better design?

Thank you!


No need to remove the columns from the CSV file (that would just introduce a risk of corrupting the data).

You can simply not keep the variables you don't need. You do need to read every field up to the last variable on the line that you want to keep, but you can read the unwanted ones into a dummy variable that you drop before writing the data set.

It is much easier to write your own DATA step than to depend on PROC IMPORT guessing what to do with the file.

Here is an example that reads two character fields and three numeric fields, one of which is a date, from a file with at least 20 fields per line. It skips fields 2 and 3 (reading them into the DUMMY variable) and also the 13 fields before the date.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2 ;
  length id $10 age 8 gender $6 weight date 8;
  length dummy $1 ;
  informat date date.;
  format date date9.;
  input id 2*dummy age gender weight 13*dummy  date;
  drop dummy;
run;
Emma_at_SAS
Lapis Lazuli | Level 10
Thank you for reviewing my other thoughts and for your example code!
Tom
Super User Tom
Super User

PROC IMPORT has trouble when the header row is too long for it. If the average header value is 13 characters long, then with 2,400 variables the header row will be more than 32K bytes long.

Try running this simple step to see what the beginning of the text file looks like:

data _null_;
  infile '\\X\SASdata\dataset.csv' obs=3;
  input;
  list;
run;
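If the LIST output shows the header record being cut off at the default 32,767-byte logical record length, one hedged workaround is a DATA step with an explicit LRECL= large enough for the whole line (the variable names below are placeholders, not from the original file):

```sas
data want;
  /* LRECL= raises the record-length limit so the >32K header
     and data lines are read in full. */
  infile '\\X\SASdata\dataset.csv' dsd truncover firstobs=2 lrecl=1000000;
  input var1 :$50. var2 :8. ;  /* list (or generate) all 2,400 variables here */
run;
```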
Emma_at_SAS
Lapis Lazuli | Level 10
Thank you, Tom! I think that is the case for my data. The header values are long, on average even more than 13 characters.
Emma_at_SAS
Lapis Lazuli | Level 10

Finally, I could not import the .csv file, but

LIBNAME mylibname XLSX '\\C:\direction\filemname.xlsx';

worked very well, with the correct rows and columns. The hint is that in the DATA step, I need to specify the Excel sheet that contains the data I want to work on (for me, it was sheet1):

data mydata;
  set mylibname.sheet1;
run;

Another point is that this way the data stayed the same as what was read from the .xlsx file. When I converted the file to a SAS data set before, it changed the types of the variables, and I did not want that.

Thank you, everyone!


Discussion stats
  • 10 replies
  • 1103 views
  • 4 likes
  • 4 in conversation