Hello,
I need to export a roughly 55 GB dataset with 160 million rows as .csv. Due to issues with quoting I ended up using the %ds2csv macro. On our set-up this takes between 4 and 5 hours, which is a lot. Looking at Task Manager, the dataset is being downloaded in chunks at 65 Mbps, and the SAS CPU load hovers at 15%, which corresponds to one fully loaded CPU core.
Is there a simple way of speeding the export up? Reducing the amount of data that needs to be uploaded would require a significant rewrite of the ETL pipeline. Thanks!
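For reference, a minimal sketch of the kind of call involved (the dataset name and output path are placeholders):
%ds2csv(
    data=sas_lib.xxx,            /* ~55 GB, 160-million-row source dataset (placeholder name) */
    runmode=b,                   /* batch mode */
    csvfile=/path/to/export.csv  /* destination CSV (placeholder path) */
);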
Please post the PROC CONTENTS output of the dataset.
The CONTENTS Procedure
Data Set Name xxxxxxxx.yyyyy_yyyyyyy_v3 Observations 142213132
Member Type DATA Variables 39
Engine V9 Indexes 0
Created 11/10/2025 12:19:06 Observation Length 802
Last Modified 11/10/2025 12:19:06 Deleted Observations 0
Protection Compressed CHAR
Data Set Type Reuse Space NO
Label Point to Observations YES
Data Representation WINDOWS_64 Sorted NO
Encoding utf-8 Unicode (UTF-8)
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 801994
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\yyyyy_yyyyyyy_v3.sas7bdat
Release Created 9.0401M7
Host Created X64_DSRV19
Owner Name aaaaaaaaaaaaaaa
File Size 49GB
File Size (bytes) 52559544320
Alphabetic List of Variables and Attributes
# Variable Type Len Format Label
39 a Char 2
28 b Char 5
29 c Char 5
30 d Char 5
31 e Char 5
32 f Char 5
25 g Char 5
33 h Char 5
26 i Char 5
21 j Char 5
24 k Char 5
22 l Char 5
23 m Char 5
27 n Char 5
11 COMMENT Char 100 COMMENT
12 CONTROLs Char 2 CONTROLs
36 COUNT Num 8 Frequency Count
14 Cells per Well Num 8
3 EXPERIMENT NAME Char 50 EXPERIMENT NAME
10 EXPERT COMMENT Char 100 EXPERT COMMENT
2 FILE NAME Char 50 FILE NAME
8 FlowJo VERSION Char 10 FlowJo VERSION
38 Frequency Num 8
34 o Char 5
9 PANEL ID Char 20 PANEL ID
4 PATIENT ID Char 20 PATIENT ID
7 PROTOCOL ID Char 50 PROTOCOL ID
16 RFSTDT Num 4 E8601DA.
15 SAMPLE DATE Num 4 E8601DA.
6 SAMPLE ID Char 20 SAMPLE ID
5 SAMPLE TYPE Char 20 SAMPLE TYPE
17 Day Num 3
35 p Char 5
13 Visit Char 20
18 basename Char 100
20 q Char 10
19 r Char 10
37 count_sum_carstate Num 8
1 source Char 100
You mention upload and download. Can you describe your environment a bit more?
Is this SAS running on a server, or local PC SAS? If you have a SAS dataset sitting on a Windows server and you are exporting and downloading a CSV, it could be that the network speed is causing the long run time. As a test, you could change your code to write the CSV file to the Work library. That should show how long it takes to write the CSV file without the overhead of network transfer. If you determine the issue is network latency, you could work with your admin on that, and consider compressing the CSV file before it is transferred.
I've typically found PROC EXPORT fast at generating a CSV, so I would give that a try to see whether it can handle your data.
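For example, a quick test along these lines (dataset name is a placeholder) would time the CSV write by itself, with no network transfer involved:
/* write the CSV to a physical path inside WORK, so no network is involved */
proc export data=sas_lib.xxx                       /* placeholder dataset name */
    outfile="%sysfunc(pathname(work))/test.csv"    /* physical file inside WORK */
    dbms=csv replace;
run;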
SAS is running on a server. It fetches the data from a remote server and exports the .csv into SAS's work directory (I am using filename temp). I do not think network bandwidth is the issue, as the download speed goes up and down, suggesting the transfer is waiting on something.
The reason I am using %ds2csv instead of PROC EXPORT is that the latter does not apply sufficient quoting to values with line breaks in them.
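For illustration, a contrived value of the kind that breaks the export (hypothetical test data):
/* hypothetical test data: a character value containing an embedded line feed */
data work.linebreak_test;
    length comment $100;
    comment = 'line one' || '0A'x || 'line two';   /* '0A'x is a literal LF */
run;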
@js5 wrote:
SAS is running on a server. It fetches the data from a remote server and exports the .csv into SAS's work directory (I am using filename temp). I do not think network bandwidth is the issue, as the download speed goes up and down, suggesting the transfer is waiting on something.
The reason I am using %ds2csv instead of PROC EXPORT is that the latter does not apply sufficient quoting to values with line breaks in them.
A more likely reason for the download speed to go up and down is competition for the network bandwidth from other processes.
To force SAS to add quotes around values it does not think need them, you can use the ~ modifier in the PUT statement. If the remote system does not mind quotes around numeric values (the SAS INPUT statement, for example, does not mind them when the DSD option is used on the INFILE statement), then you can simply quote everything:
put (_all_) (~);
That means your export step could be as simple as this (with &indsn as the input dataset and &outfile as an output fileref):
/* get one observation per variable so that names and labels are available */
proc transpose data=&indsn(obs=0) out=names;
  var _all_;
run;

/* write the header row: each label (or the name, when no label exists), quoted */
data _null_;
  file &outfile dsd lrecl=1000000;
  length _name_ $32 _label_ $256;
  set names;
  _label_ = coalescec(_label_, _name_);
  put _label_ ~ @;   /* ~ forces quotes; trailing @ keeps all headers on one record */
run;

/* append the data rows, quoting every value */
data _null_;
  file &outfile mod dsd lrecl=1000000;
  set &indsn;
  put (_all_) (~);
run;
Results
"Name","Sex","Age","Height","Weight" "Alfred","M","14","69","112.5" "Alice","F","13","56.5","84" "Barbara","F","13","65.3","98" "Carol","F","14","62.8","102.5" "Henry","M","14","63.5","102.5" "James","M","12","57.3","83"
If you NEED to add the ~ only for the character variables, use PROC CONTENTS to get the list of variable names along with each variable's TYPE, so you can determine when to add the ~ modifier.
/* example setup: export SASHELP.CLASS to a temporary file */
filename csv temp;
%let outfile=csv;
%let indsn=sashelp.class;

/* get variable names, labels, and types */
proc contents data=&indsn out=contents noprint;
run;

/* build the header list and a PUT variable list that adds ~ only to character variables */
proc sql noprint;
  select quote(trim(coalesce(label,name)))
       , catx(' ', nliteral(name)
              , case when type=2 then '~' else ' ' end)   /* type=2 means character */
    into :headers separated by ','
       , :varlist separated by ' '
  from contents
  order by varnum
  ;
quit;

/* write the header row; single-quoting the whole string preserves the inner double quotes */
data _null_;
  file &outfile dsd lrecl=1000000;
  put %sysfunc(quote(%superq(headers),%str(%')));
run;

/* append the data rows */
data _null_;
  file &outfile mod dsd lrecl=1000000;
  set &indsn;
  put &varlist;
run;
Results
"Name","Sex","Age","Height","Weight" "Alfred","M",14,69,112.5 "Alice","F",13,56.5,84 "Barbara","F",13,65.3,98 "Carol","F",14,62.8,102.5 "Henry","M",14,63.5,102.5 "James","M",12,57.3,83
What feature of %DS2CSV is it that you need? I doubt it is what is slowing things down, since I believe it also just generates a DATA step, like PROC EXPORT does. You can see the code it actually generates by turning on the MPRINT option.
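For example, with the placeholder names used in this thread:
options mprint;   /* echo the statements the macro generates to the log */
%ds2csv(data=sas_lib.xxx, runmode=b, csvfile=/path/to/export.csv);
options nomprint;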
The best way to improve performance would be to skip the CSV file and have SAS send the data directly to the ultimate location, if you can.
Other things you can do to improve performance:
A simple process like writing a dataset to an external text file is ALWAYS I/O bound. Given your further description, the main bottleneck is the download from the remote server (a database, a Base SAS library on a network share, or something else? Please clarify), followed by the bandwidth of the local storage where WORK resides.
If you fetch from a database, the observed fluctuations may come from network saturation and/or load on the database system itself.
It turns out %ds2csv was the culprit. I switched back to PROC EXPORT because %ds2csv was not producing any timing information. To my surprise, writing to gzip was miles faster, finishing in under 25 minutes:
NOTE: 142213133 records were written to the file CSVXPORT.
The minimum record length was 155.
The maximum record length was 486.
NOTE: There were 142213132 observations read from the data set SAS_LIB.xxx.
NOTE: There were 142213132 observations read from the data set SPARKDTM.xxx.
NOTE: DATA statement used (Total process time):
real time 22:32.09
cpu time 17:38.70
Since I do not need the superior quoting of %ds2csv for all the datasets, I can probably use it only when absolutely necessary and stick to PROC EXPORT otherwise.
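For anyone finding this later, a minimal sketch of the gzip approach (dataset name and path are placeholders; the GZIP option of the ZIP access method requires SAS 9.4M5 or later):
/* compress the CSV on the fly via the ZIP access method's GZIP option */
filename csvxport zip "/path/to/export.csv.gz" gzip;   /* placeholder path */

proc export data=sas_lib.xxx    /* placeholder dataset name */
    outfile=csvxport
    dbms=csv replace;
run;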