- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I am using the following code to write large data to CSV:
proc transpose data=mcdtemp.&mc_profile_table(obs=0) out=mcdtemp._names;
var _all_;
run;
data _null_;
file "&csv_export_file" dsd delimiter='|' lrecl=1000000 encoding="utf-8";
set mcdtemp._names end=eof;
put _name_ @ ;
if eof then put;
run;
%macro write_profile_csv;
%let mc_csv_i=100000;
%do %while (&mc_csv_i<(&document_count + 100000));
%let mc_f_obs = %eval(&mc_csv_i. - 99999);
data _null_;
file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
set mcdtemp.&mc_profile_table(firstobs=&mc_f_obs obs=&mc_csv_i);
put (_all_) (+0);
run;
%let mc_csv_i = %eval(&mc_csv_i. + 100000);
%end;
%mend write_profile_csv;
data _null_;
call execute('%write_profile_csv;');
run;
Now it appears that the data includes sometimes line breaks that will break the CSVs.
Any suggestions how I could modify the above code in order to compress line breaks from all columns or the only option is to write some code prior that will "clean" the columns?
Cheers,
Olli
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That code looks overly complicated, so solution below ignores the stuff that seems intended just to obfuscate the code.
If you want to remove CR and/or LF characters from the data you could just loop over the character fields. Here is code to allow you to loop over all the character variables. You can use the older style DO OVER loop with an implicitly referenced array named _CHARACTER_ that uses _N_ as its index variable. That way you do not have to introduce any variable names (for the ARRAY name or the array INDEX) that might conflict with existing variable names.
data _null_;
file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
set mcdtemp.&mc_profile_table;
array _character_ (_n_) _character_;
do over _character_; _character_ = translate(_character_,' ','0D0A'x); end;
put (_all_) (+0);
run;
The only drawback is if you wanted to use the automatic macro variable _n_ for something. Its value is overwritten by being used for indexing into the implicit array.
Note if the CR and/or LF characters are coming from the formatted values of the variables then you need something more powerful (but slower) such as, https://github.com/sasutils/macros/blob/master/safe_ds2csv.sas
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hi @ojaro
something like:
data _null_;
file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
set mcdtemp.&mc_profile_table(firstobs=&mc_f_obs obs=&mc_csv_i);
array char _character_;
do i = 1 to dim(allchar);
char{i} = compress(char{i},'0d0a'x);
end;
put (_all_) (+0);
run;
This will remove all occurences of CR and LF in all character variables. Use a translate function instead of compress if you want to insert a blank.
If you know in which variable the linebreaks occur, then drop the loop and change that variable only.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That code looks overly complicated, so solution below ignores the stuff that seems intended just to obfuscate the code.
If you want to remove CR and/or LF characters from the data you could just loop over the character fields. Here is code to allow you to loop over all the character variables. You can use the older style DO OVER loop with an implicitly referenced array named _CHARACTER_ that uses _N_ as its index variable. That way you do not have to introduce any variable names (for the ARRAY name or the array INDEX) that might conflict with existing variable names.
data _null_;
file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
set mcdtemp.&mc_profile_table;
array _character_ (_n_) _character_;
do over _character_; _character_ = translate(_character_,' ','0D0A'x); end;
put (_all_) (+0);
run;
The only drawback is if you wanted to use the automatic macro variable _n_ for something. Its value is overwritten by being used for indexing into the implicit array.
Note if the CR and/or LF characters are coming from the formatted values of the variables then you need something more powerful (but slower) such as, https://github.com/sasutils/macros/blob/master/safe_ds2csv.sas
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content