01-13-2014 01:41 PM
I'm creating a CSV output file using ODS CSV and it takes a surprisingly long time to create. It's large, but while it takes less than 17 minutes to read the data from disk and create a subset, it takes 94 minutes to write it out. Before I start investigating whether there are any hardware-related issues that could account for this, is there any SAS-related issues that I should be aware of? Is there an alternative approach that would accomplish the same thing? I chose Proc Print because I need the variables a specific order and I can easily do that. Any other solution would need to be able to the same thing.
My log is below. Thanks for your help.
109 rsubmit ;
NOTE: Remote submit to TASK1 commencing.
63 libname spj1 "\\cdc\csp_private\m103\spj1" ;
NOTE: Libref SPJ1 was successfully assigned as follows:
Physical Name: \\cdc\csp_private\m103\spj1
64 options obs=max ;
65 Data spj1.VaxData ;
66 set &Grantee..Vaccination
67 (Keep= CDCID VaxGroup Vaxdate MVX CVX CPT TradeName LotNum Route Site RecordType
68 VaxComplete VaxRefusal IISValidity ProvID V_Eligibility VaxEntDate ProvChar) ;
69 where VaxGroup = 6 ;
70 retain VaxRecordNum 0 ;
71 VaxRecordNum + 100 ;
72 MatchNum = 1 ;
73 Length _PatientID $15 _VaxDate $8 _VaxEntDate $8 LotNum1-LotNum3 $1 ;
74 _PatientID = scan(CDCID,3,'-') ;
75 _VaxDate = put(VaxDate,yymmddN8.) ;
76 _VaxEntDate = put(VaxEntDate,yymmddn8.) ;
77 call missing(lotNum1, LotNum2, LotNum3) ;
78 run ;
NOTE: There were 4850003 observations read from the data set BAA.VACCINATION.
NOTE: The data set SPJ1.VAXDATA has 4850003 observations and 26 variables.
NOTE: DATA statement used (Total process time):
real time 16:56.65
cpu time 3:02.39
81 ods _all_ close ;
82 ods tagsets.csv file="\\cdc\csp_private\m103\spj1\VaxData.CSV" ;
NOTE: Writing TAGSETS.CSV Body file: \\cdc\csp_private\m103\spj1\VaxData.CSV
84 Proc Print data = spj1.VaxData noobs ;
85 Var _PatientID VaxRecordNum MatchNum _Vaxdate MVX CVX CPT TradeName LotNum1-LotNum3 Route
85 ! Site RecordType VaxComplete
86 VaxRefusal IISValidity ProvID V_Eligibility _VaxEntDate ProvChar ;
87 run ;
NOTE: There were 4850003 observations read from the data set SPJ1.VAXDATA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 1:34:00.66
cpu time 1:32:46.20
89 ods _all_ close ;
NOTE: Remote submit to TASK1 complete.
01-13-2014 02:21 PM
Reeza's suggestion is a good one and probably the fastest. However if you want to stick with ODS then there is a tagset called SIMPLECSV that may be a bit faster than the standard CSV tagset. The difference between the standard CSV tagset and SIMPLECSV is that SIMPLECSV is treats currency and date values as strings instead of numbers and may quote other kinds of strings that are actually numbers. YMMV.
ods tagsets.simplecsv file="whatever";
01-15-2014 04:46 AM
As Reeza and Tim said it's better to use data _null_ step to create csv files . I am also using the same to create. Here is my code:
set result_&CIRC ;
length head1 $2000.;
head1 = "&variable_names";
if _n_=1 then do;
/*head2 head3 head4 head5*/