I have a CSV file where the first record is very long because of long variable names (about 60k). I want to read the file in, edit the first line (eg replace all occurences of "longword" with "lw"), save the file, then read it in as per a normal CSV file. Because the record length is over 32k, I can't use the _infile_ method. Any suggestions for the best way to do this in SAS? (I'm using 9.4 on windows).
I guess I could use some Windows batch/power script, but would prefer the portability of keeping it in SAS.
Since when you read a CSV file into data you SKIP the header line I do not see why reading that should cause any trouble. The data lines are less than 12K bytes long.
NOTE: 2 records were read from the infile (system-specific pathname). The minimum record length was 10232. The maximum record length was 11811.
Are you trying to use PROC IMPORT to make guesses about how to read the file? Why ? It looks like all of the variables are NUMERIC, except for the first one.
data want;
infile 'myfile.csv' dsd firstob=2 truncover;
input lisacronym :$10. var2-var1216 ;
run;
You can then read in the header row and use that to generate labels for your variables.
filename code temp;
data names ;
infile "myfile.csv" dsd lrecl=1000000 obs=1 ;
varnum+1;
input label :$256. @@;
if varnum=1 then put 'label ' ;
else put 'var' varnum '=' label :$quote. ;
run;
proc datasets nolist lib=work;
modify want;
%include code ;
;
run;
quit;
If you do want to use something to GUESS how to read the file then use this macro instead. It will both read the long header row properly and also create unique variable names for the variables whose labels are not unique in the first 32 bytes.
https://github.com/sasutils/macros/blob/master/csv2ds.sas
Example:
filename csv "C:\downloads\LongRecordTest.csv" lrecl=1000000;
%csv2ds(csv,out=want,replace=1);
Note: The %CSV2DS() macro will make empty variables character of length 1 just like PROC IMPORT. But you can use the OVERRRIDES= option to give it a dataset to force it make those columns numeric if you want. So you could add this code to make the over ride information from the _TYPES_ dataset it generated and re-run the macro passing in that those overrides.
data overrides;
set _types_;
where xtype='empty';
type='num';length='8';
keep varnum type length;
run;
%csv2ds(csv,out=want,overrides=overrides,replace=1);
I've attached the first few records of the file (Excel can open it). The lrecl option doesn't work because I'm editing the _infile_ variable which has a fixed upper length of 32K.
You could import this csv file into sas firstly, afterwards change variable name by data step ?
filename x 'c:\temp\LongRecordTest.csv' lrecl=1000000 encoding='utf8' termstr=crlf;
proc import datafile=x out=have dbms=csv replace;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data temp;
set temp;
if find(_NAME_,'Lonemother') then do; new_name=tranwrd(_NAME_,'Lonemother','LO');output;end;
run;
data _null_;
set temp end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify have;rename ');
call execute(catx('=',_name_,new_name));
if last then call execute(';quit;');
run;
Thanks. I guess I'll have to try this approach (even if a bit messy). Thanks for investigating.
OK. It looks like your problem is not so easy. Try this one :
options compress=yes;
data have;
infile 'c:\temp\LongRecordTest.csv' dsd truncover lrecl=1000000 encoding='utf8' termstr=crlf;
input (__v1-__v2000) (:$200.);
run;
proc transpose data=have(obs=1) out=rename;
var _all_;
run;
data rename2;
set rename(where=(col1 is not missing));
new_name=tranwrd(col1,'Lonemother','LO');
new_name=tranwrd(new_name,'Twoparent','TP');
if length(new_name)>32 then new_name=substr(new_name,1,32);
run;
data _null_;
set rename2 end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify have;rename ');
call execute(catx('=',_name_,new_name));
if last then call execute(';quit;');
run;
data want;
set have(firstobs=2);
drop __v:;
run;
Since when you read a CSV file into data you SKIP the header line I do not see why reading that should cause any trouble. The data lines are less than 12K bytes long.
NOTE: 2 records were read from the infile (system-specific pathname). The minimum record length was 10232. The maximum record length was 11811.
Are you trying to use PROC IMPORT to make guesses about how to read the file? Why ? It looks like all of the variables are NUMERIC, except for the first one.
data want;
infile 'myfile.csv' dsd firstob=2 truncover;
input lisacronym :$10. var2-var1216 ;
run;
You can then read in the header row and use that to generate labels for your variables.
filename code temp;
data names ;
infile "myfile.csv" dsd lrecl=1000000 obs=1 ;
varnum+1;
input label :$256. @@;
if varnum=1 then put 'label ' ;
else put 'var' varnum '=' label :$quote. ;
run;
proc datasets nolist lib=work;
modify want;
%include code ;
;
run;
quit;
If you do want to use something to GUESS how to read the file then use this macro instead. It will both read the long header row properly and also create unique variable names for the variables whose labels are not unique in the first 32 bytes.
https://github.com/sasutils/macros/blob/master/csv2ds.sas
Example:
filename csv "C:\downloads\LongRecordTest.csv" lrecl=1000000;
%csv2ds(csv,out=want,replace=1);
Note: The %CSV2DS() macro will make empty variables character of length 1 just like PROC IMPORT. But you can use the OVERRRIDES= option to give it a dataset to force it make those columns numeric if you want. So you could add this code to make the over ride information from the _TYPES_ dataset it generated and re-run the macro passing in that those overrides.
data overrides;
set _types_;
where xtype='empty';
type='num';length='8';
keep varnum type length;
run;
%csv2ds(csv,out=want,overrides=overrides,replace=1);
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.