BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
neeld1
Fluorite | Level 6

Hello All,

 

I am a noob sas user. I'm more proficient in R, but SAS is totally new to me. I'm facing multiple issues reading a large file too.

 

proc import
	datafile = ".\my_data.csv"
	out = work.imported_data
	dbms = csv
	replace;
	guessingrows = 1000;
run;

The csv file that I'm trying to import does not read in the correct variable names. My csv has 3055 variables.

 

Currently, proc export is reading the variables names up to 2933 variables correctly and then renaming the remaining variables as VAR2934, VAR2935, VAR2936... VAR3055. 

 

I have tried getnames = no and obs = 1 option as well. The first row does not read in correctly. I've found the issue on google. It seems that the LRECL of the first row is greater than 32767 bytes/characters. 

 

Do you know how I can change the LRECL to read in the data correctly? 

 

I have attached the data and code for your reference. 

 

Currently I have to manually rename the variables. 

 

Please suggest. Your help is much appreciated.

 

Regards,
Neel

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just tell PROC IMPORT to ignore the header row.

filename csv "&path/my_data.csv" lrecl=2000000;
proc import datafile=csv out=test1 replace dbms=csv;
 getnames=no;
 datarow=2;
run;

You can use it to rename the variables yourself.  So get the names that PROC IMPORT generated.

proc transpose data=test1(obs=0) out=names ;
 var _all_;
run;

Then read in the name from the first row and use them to generate RENAME statement.

data new_names;
  infile csv obs=1 dsd ;
  input new_name :$32. @@;
run;

filename code temp;
data _null_;
  file code;
  set names end=eof;
  if not eof2 then set new_names end=eof2;
  if _n_=1 then put 'rename';
  put _name_ '=' new_name  ;
  if eof then put ';';
run;

Then use PROC DATASETS to run the rename.

proc datasets nolist lib=work;
  modify test1;
  %include code;
run;quit;

 

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

I think this has been answered before.  The solution is to create a fileref so you can tell SAS what record length to use.

filename csv  ".\my_data.csv" lrecl=2000000 ;

proc import
	datafile = csv
	dbms = csv
	out = work.imported_data
	replace
;
	guessingrows = 1000;
run;
neeld1
Fluorite | Level 6

Hello @Tom ,

 

Thank you for the response. Unfortunately, I've tried this solution before and it does not work.

 

You could download the data and syntax and try it as well. It does not seem to work.

 

Apologies, but I am a very new user to SAS.

 

image.pngimage.png

Thanks again,

Neel

ballardw
Super User

If the text of two or more column headings is exactly the same for the first 32 characters then SAS will name the first column with the first 32 characters and remaining variable(s) with VAR and the column number because it doesn't have any fancy instructions on trying to parse the column headers multiple times to try to resolve the like issue.

 

An example: replace the path placeholder with a path that works on your system:

data _null_;
   file "<path>\longcolumnname.csv" ;
   put 'columnheadinglongerthan32charactersnumber1,columnheadinglongerthan32charactersnumber2';
   put '123,456';
run;

proc import datafile= "<path>\longcolumnname.csv"
     out=work.import
     dbms=csv
     replace;
run;
ChrisHemedinger
Community Manager

If you have SAS Enterprise Guide, you can use the Import Data task to bootstrap and generate DATA step and INFILE, which gives you more control over how a file is read.  I've attached the generated code for your sample file.

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!
neeld1
Fluorite | Level 6

Hello @ChrisHemedinger ,

 

Appreciate the solution. This solution does work!

 

I would have liked a more succinct solution that would not require me to open sas enterprise, how this does do the trick.

 

Thanks again,

Neel

neeld1
Fluorite | Level 6

Hello @ballardw ;

 

Appreciate the response. However, while uploading the sample data, I've ensured that each variable name is 32 characters or less and that there are no duplicate records. So the truncation will not occur in the datastep or proc import.

 

Even I thought this would have been the issue, but unfortunately it has more to do with the LRECL than with the variable length in my case.

 

Appreciate the code snippet and response.

 

Regards,

Neel

Tom
Super User Tom
Super User

Just tell PROC IMPORT to ignore the header row.

filename csv "&path/my_data.csv" lrecl=2000000;
proc import datafile=csv out=test1 replace dbms=csv;
 getnames=no;
 datarow=2;
run;

You can use it to rename the variables yourself.  So get the names that PROC IMPORT generated.

proc transpose data=test1(obs=0) out=names ;
 var _all_;
run;

Then read in the name from the first row and use them to generate RENAME statement.

data new_names;
  infile csv obs=1 dsd ;
  input new_name :$32. @@;
run;

filename code temp;
data _null_;
  file code;
  set names end=eof;
  if not eof2 then set new_names end=eof2;
  if _n_=1 then put 'rename';
  put _name_ '=' new_name  ;
  if eof then put ';';
run;

Then use PROC DATASETS to run the rename.

proc datasets nolist lib=work;
  modify test1;
  %include code;
run;quit;

 

neeld1
Fluorite | Level 6

@Tom - Bravo! 

 

This is what I need. Thank you so much!

 

Really appreciate the help.

 

Regards,

Neel

neeld1
Fluorite | Level 6

@SASKiwi - Thank you so much for your response.

 

I am a noob sas user. I'm more proficient in R, but SAS is totally new to me. I'm facing multiple issues reading a large file too.

 

I can create a new thread or post down below:

 

The csv file that I'm trying to import does not read in the correct variable names. My csv has 3055 variables.

 

Currently, proc export is reading the variables names up to 2933 variables correctly and then renaming the remaining variables as VAR2934, VAR2935, VAR2936... VAR3055. 

 

I have tried getnames = no and obs = 1 option as well. The first row does not read in correctly. I've found the issue on google. It seems that the LRECL of the first row is greater than 32767 bytes/characters. 

 

Do you know how I can change the LRECL to read in the data correctly? 

 

proc import
	datafile = ".\&wave6dsn..csv"
	out = all&w6suffix.
	dbms = csv
	replace;
	guessingrows = 1000;
run;

Thanks,
Neel

SASKiwi
PROC Star

@neeld1 - Please post the SAS log of your program. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2330 views
  • 5 likes
  • 5 in conversation