BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BruceBrad
Lapis Lazuli | Level 10

I have a CSV file where the first record is very long because of long variable names (about 60k). I want to read the file in, edit the first line (eg replace all occurences of "longword" with "lw"), save the file, then read it in as per a normal CSV file. Because the record length is over 32k, I can't use the _infile_ method. Any suggestions for the best way to do this in SAS? (I'm using 9.4 on windows). 

 

I guess I could use some Windows batch/power script, but would prefer the portability of keeping it in SAS.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Since when you read a CSV file into data you SKIP the header line I do not see why reading that should cause any trouble.  The data lines are less than 12K bytes long.

NOTE: 2 records were read from the infile (system-specific pathname).
      The minimum record length was 10232.
      The maximum record length was 11811.

Are you trying to use PROC IMPORT to make guesses about how to read the file?  Why ?  It looks like all of the variables are NUMERIC, except for the first one.

data want;
  infile 'myfile.csv' dsd firstob=2 truncover;
  input lisacronym :$10. var2-var1216 ;
run;

You can then read in the header row and use that to generate labels for your variables.  

filename code temp;
data names ;
  infile "myfile.csv" dsd lrecl=1000000 obs=1 ;
  varnum+1;
  input label :$256. @@;
  if varnum=1 then put 'label ' ;
  else put 'var' varnum '=' label :$quote. ;
run;
proc datasets nolist lib=work;
modify want;
%include code ;
;
run;
quit;

If you do want to use something to GUESS how to read the file then use this macro instead.  It will both read the long header row properly and also create unique variable names for the variables whose labels are not unique in the first 32 bytes.

https://github.com/sasutils/macros/blob/master/csv2ds.sas

 

Example:

filename csv "C:\downloads\LongRecordTest.csv" lrecl=1000000;
%csv2ds(csv,out=want,replace=1);

 

Note: The %CSV2DS() macro will make empty variables character of length 1 just like PROC IMPORT.  But you can use the OVERRRIDES= option to give it a dataset to force it make those columns numeric if you want.  So you could add this code to make the over ride information from the _TYPES_ dataset it generated and re-run the macro passing in that those overrides.

data overrides;
  set _types_;
  where xtype='empty';
  type='num';length='8';
  keep varnum type length;
run;
%csv2ds(csv,out=want,overrides=overrides,replace=1);

View solution in original post

6 REPLIES 6
Ksharp
Super User
It would be better if you post this CSV file.
You could try option LRECL=1000000000000 of INFILE to break the limit of 32K.
BruceBrad
Lapis Lazuli | Level 10

I've attached the first few records of the file (Excel can open it). The lrecl option doesn't work because I'm editing the _infile_ variable which has a fixed upper length of 32K.

Ksharp
Super User

You could import this csv file into sas firstly, afterwards change variable name by data step ?

 

filename x 'c:\temp\LongRecordTest.csv' lrecl=1000000 encoding='utf8' termstr=crlf;

proc import datafile=x out=have dbms=csv replace;
run;



proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data temp;
 set temp;
if find(_NAME_,'Lonemother') then do; new_name=tranwrd(_NAME_,'Lonemother','LO');output;end;
run;
data _null_;
 set temp end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify have;rename ');
call execute(catx('=',_name_,new_name));
if last then call execute(';quit;');
run;

 

 

 

 

BruceBrad
Lapis Lazuli | Level 10

Thanks. I guess I'll have to try this approach (even if a bit messy). Thanks for investigating.

Ksharp
Super User

OK. It looks like your problem is not so easy. Try this one :

options compress=yes;
data have;
infile 'c:\temp\LongRecordTest.csv' dsd truncover lrecl=1000000 encoding='utf8' termstr=crlf;
input (__v1-__v2000) (:$200.);
run;
proc transpose data=have(obs=1) out=rename;
var _all_;
run;
data rename2;
 set rename(where=(col1 is not missing));
 new_name=tranwrd(col1,'Lonemother','LO');
 new_name=tranwrd(new_name,'Twoparent','TP');
 if length(new_name)>32 then new_name=substr(new_name,1,32);
run;
data _null_;
 set rename2 end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify have;rename ');
call execute(catx('=',_name_,new_name));
if last then call execute(';quit;');
run;
data want;
 set have(firstobs=2);
 drop __v:;
run;
Tom
Super User Tom
Super User

Since when you read a CSV file into data you SKIP the header line I do not see why reading that should cause any trouble.  The data lines are less than 12K bytes long.

NOTE: 2 records were read from the infile (system-specific pathname).
      The minimum record length was 10232.
      The maximum record length was 11811.

Are you trying to use PROC IMPORT to make guesses about how to read the file?  Why ?  It looks like all of the variables are NUMERIC, except for the first one.

data want;
  infile 'myfile.csv' dsd firstob=2 truncover;
  input lisacronym :$10. var2-var1216 ;
run;

You can then read in the header row and use that to generate labels for your variables.  

filename code temp;
data names ;
  infile "myfile.csv" dsd lrecl=1000000 obs=1 ;
  varnum+1;
  input label :$256. @@;
  if varnum=1 then put 'label ' ;
  else put 'var' varnum '=' label :$quote. ;
run;
proc datasets nolist lib=work;
modify want;
%include code ;
;
run;
quit;

If you do want to use something to GUESS how to read the file then use this macro instead.  It will both read the long header row properly and also create unique variable names for the variables whose labels are not unique in the first 32 bytes.

https://github.com/sasutils/macros/blob/master/csv2ds.sas

 

Example:

filename csv "C:\downloads\LongRecordTest.csv" lrecl=1000000;
%csv2ds(csv,out=want,replace=1);

 

Note: The %CSV2DS() macro will make empty variables character of length 1 just like PROC IMPORT.  But you can use the OVERRRIDES= option to give it a dataset to force it make those columns numeric if you want.  So you could add this code to make the over ride information from the _TYPES_ dataset it generated and re-run the macro passing in that those overrides.

data overrides;
  set _types_;
  where xtype='empty';
  type='num';length='8';
  keep varnum type length;
run;
%csv2ds(csv,out=want,overrides=overrides,replace=1);

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 888 views
  • 0 likes
  • 3 in conversation