BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ojaro
SAS Employee

Hi, I am using the following code to write large data to CSV:

proc transpose data=mcdtemp.&mc_profile_table(obs=0) out=mcdtemp._names;
  var _all_;
run;

data _null_;
  file "&csv_export_file" dsd delimiter='|' lrecl=1000000 encoding="utf-8";
  set mcdtemp._names end=eof;
  put _name_ @ ;
  if eof then put;
run;

%macro write_profile_csv;
%let mc_csv_i=100000;
%do %while (&mc_csv_i<(&document_count + 100000));
	%let mc_f_obs = %eval(&mc_csv_i. - 99999);
	data _null_;
	  file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
	  set mcdtemp.&mc_profile_table(firstobs=&mc_f_obs obs=&mc_csv_i);
	  put (_all_) (+0);
	run;

	%let mc_csv_i = %eval(&mc_csv_i. + 100000);
%end;
%mend write_profile_csv;
data _null_;
	call execute('%write_profile_csv;');
run;

Now it appears that the data includes sometimes line breaks that will break the CSVs.

Any suggestions how I could modify the above code in order to compress line breaks from all columns or the only option is to write some code prior that will "clean" the columns?

Cheers,

 

Olli

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

That code looks overly complicated, so solution below ignores the stuff that seems intended just to obfuscate the code.

 

If you want to remove CR and/or LF characters from the data you could just loop over the character fields.  Here is code to allow you to loop over all the character variables.  You can use the older style DO OVER loop with an implicitly referenced array named _CHARACTER_ that uses _N_ as its index variable.  That way you do not have to introduce any variable names (for the ARRAY name or the array INDEX) that might conflict with existing variable names.

data _null_;
  file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
  set mcdtemp.&mc_profile_table;
  array _character_ (_n_) _character_;
  do over _character_; _character_ = translate(_character_,'  ','0D0A'x); end;
  put (_all_) (+0);
run;

The only drawback is if you wanted to use the automatic macro variable _n_ for something.  Its value is overwritten by being used for indexing into the implicit array.

 

Note if the CR and/or LF characters are coming from the formatted values of the variables then you need something more powerful (but slower) such as,   https://github.com/sasutils/macros/blob/master/safe_ds2csv.sas

View solution in original post

3 REPLIES 3
ErikLund_Jensen
Rhodochrosite | Level 12

hi @ojaro 

something like:

data _null_;
	file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
	set mcdtemp.&mc_profile_table(firstobs=&mc_f_obs obs=&mc_csv_i);
	array char _character_;
	do i = 1 to dim(allchar);
		char{i} = compress(char{i},'0d0a'x);
	end;
	put (_all_) (+0);
run;

This will remove all occurences of CR and LF in all character variables. Use a translate function instead of compress if you want to insert a blank.

If you know  in which variable  the linebreaks occur, then drop the loop and change that variable only.

Tom
Super User Tom
Super User

That code looks overly complicated, so solution below ignores the stuff that seems intended just to obfuscate the code.

 

If you want to remove CR and/or LF characters from the data you could just loop over the character fields.  Here is code to allow you to loop over all the character variables.  You can use the older style DO OVER loop with an implicitly referenced array named _CHARACTER_ that uses _N_ as its index variable.  That way you do not have to introduce any variable names (for the ARRAY name or the array INDEX) that might conflict with existing variable names.

data _null_;
  file "&csv_export_file" dsd delimiter='|' lrecl=1000000 MOD encoding="utf-8";
  set mcdtemp.&mc_profile_table;
  array _character_ (_n_) _character_;
  do over _character_; _character_ = translate(_character_,'  ','0D0A'x); end;
  put (_all_) (+0);
run;

The only drawback is if you wanted to use the automatic macro variable _n_ for something.  Its value is overwritten by being used for indexing into the implicit array.

 

Note if the CR and/or LF characters are coming from the formatted values of the variables then you need something more powerful (but slower) such as,   https://github.com/sasutils/macros/blob/master/safe_ds2csv.sas

ojaro
SAS Employee
Thank you so much! This works perfectly.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1790 views
  • 0 likes
  • 3 in conversation