Hi Folks:
I'm trying to understand PROC APPEND in the context given below. I have a simulation study under way, and the problem of getting it to run in SAS has been solved (see the link below). The SAS program for the simulation includes PROC APPEND, as shown below.
proc append base=results_combined data=PE1 force nowarn;
run;
proc append base=results_summary data=pe_summary force nowarn;
run;
My question is: what is being achieved by these two PROC APPEND steps? I have read the SAS documentation on PROC APPEND and a SUGI paper as well.
To understand it better, I ran PROC CONTENTS as follows on the datasets specified after base= and data=. However, the datasets in each pair are identical. Is PROC APPEND being used here just to change the name of the dataset from PE1 to results_combined and from pe_summary to results_summary?
Thanks for your help! I appreciate your time.
proc contents data=results_combined; run;
proc contents data=PE1; run;
proc contents data=results_summary; run;
proc contents data=pe_summary; run;
Previous post where the simulation problem was solved by Reeza.
The whole simulation program, taken from the post linked above:
%symdel;
%macro surveyLoop(sample=);
*calculate sample rate; %let sampRate = %sysevalf(&sample./100);
*create sample;
ods select none;
proc surveyselect data=method
samprate=&sampRate.
reps=100
out=_sample
seed=&sample
outall;
strata stage;*ensures stage is present for all values - not needed in prod;
run;
*set duration to values if selected;
data _sample;
set _sample;
*assign values as needed;
if selected = 1 and method=0 then dur_loop = DUR_MID; else
if selected = 0 and method=0 then dur_loop = DUR_GOLD; else
if selected=1 and method=1 then dur_loop=DUR_GOLD; else
if selected=0 and method=1 then dur_loop=DUR_GOLD;
run;
*sort for modeling procedures next;
options nonotes;
proc sort data=_sample;
by replicate stage AGEGRP;
run;
ods output ParameterEstimates=PE;
PROC PHREG DATA=_sample;
by replicate stage AGEGRP;
MODEL DUR_LOOP*DEATH(0)=METHOD/RL;
RUN;
DATA PE1; SET PE;
PCT_MISS = &sample/100;
run;
*omit if no replicates;
*if replicates calculates average of HR + STDERR for bootstrap approach;
proc means data=PE1 NWAY N MEAN STD STDERR;
class STAGE AGEGRP pct_miss;
var hazardratio HRLowerCL HRUpperCL;
ods output summary = pe_summary(keep=stage agegrp pct_miss HazardRatio_Mean HRUpperCL_Mean HRLowerCL_Mean);
run;
ods select all;
proc append base=results_combined data=PE1 force nowarn;
run;
proc append base=results_summary data=pe_summary force nowarn;
run;
quit;
proc datasets lib=work nodetails nolist;
delete _SAMPLE Results_combined;
run; quit;
%let sampRate =;
%mend;
%surveyLoop(sample=1);
%surveyLoop(sample=2);
through...
%surveyLoop(sample=100);
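The hundred separate calls above could also be generated by a small wrapper macro. This is only a sketch: `%runAll` and its `from=`/`to=` parameters are hypothetical names that do not appear in the original program, and it assumes `%surveyLoop` has already been compiled as shown.

```sas
/* Hypothetical wrapper: calls %surveyLoop once for each sample percentage. */
%macro runAll(from=1, to=100);
  %local s;                        /* keep the loop counter local to this macro */
  %do s = &from. %to &to.;
    %surveyLoop(sample=&s.);
  %end;
%mend runAll;

%runAll(from=1, to=100);
```

Because each call appends to the BASE= datasets, it is probably wise to delete results_combined and results_summary once before starting the loop, so that rows from an earlier session are not mixed in.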
@Cruise You added "results_combined" to the proc datasets, why did you do that?
@Cruise wrote:
I see that proc delete makes sense here. My question persists: do you still think that proc append is necessary? PE1 and results_combined had the same number of rows at the end of the iterations anyway.
Results_Summary contains the summary statistics output from proc means.
Results_combined contains all the records needed to recreate RESULTS_SUMMARY in case further analysis is needed, WITHOUT requiring you to run the macro all over again, which is time-intensive.
You also removed the deletion of the PE1 file, which is supposed to be deleted because its data is already in results_summary and results_combined. The PE1 file only has data from the last run, so you should delete it.
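As a rough sketch of what "recreate RESULTS_SUMMARY" could look like, the PROC MEANS step from the macro can be pointed at the accumulated results_combined dataset instead of PE1. The output name `results_summary_check` is a made-up name for illustration, and the sketch assumes results_combined was not deleted between macro calls.

```sas
/* Re-derive the summary from the accumulated per-replicate estimates.   */
/* results_summary_check is a hypothetical name, not from the original.  */
proc means data=results_combined nway n mean std stderr;
  class stage agegrp pct_miss;
  var hazardratio HRLowerCL HRUpperCL;
  ods output summary=results_summary_check;
run;
```

If the macro ran as intended, this should give one row per STAGE, AGEGRP and PCT_MISS combination across all runs, without repeating the PROC SURVEYSELECT and PROC PHREG steps.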
As the name implies, PROC APPEND is used to add observations to the end of an existing dataset.
The macro is using it to add the results from each call. So after two calls the BASE= datasets have the results from both macro runs, while the DATA= datasets have the data from just the last run.
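The accumulation described above can be seen in a small stand-alone sketch. Everything here (`%demoRun`, `demo_base`, `one_run`, `run_id`) is a hypothetical name chosen for illustration and is not part of the original simulation.

```sas
/* Start clean so that reruns do not keep adding rows to demo_base. */
proc datasets lib=work nolist nowarn;
  delete demo_base;
quit;

%macro demoRun(run=);
  /* one_run is overwritten on every call: it only holds the latest run */
  data one_run;
    run_id = &run.;
    do i = 1 to 3;
      output;
    end;
  run;

  /* demo_base is created on the first call and grows on later calls */
  proc append base=demo_base data=one_run;
  run;
%mend demoRun;

%demoRun(run=1);
%demoRun(run=2);

proc sql noprint;
  select count(*) into :n_base trimmed from demo_base;
  select count(*) into :n_last trimmed from one_run;
quit;
%put NOTE: demo_base has &n_base rows, one_run has &n_last rows;
```

After the two calls, demo_base holds 6 rows while one_run holds only the 3 rows from the last call. If the BASE= dataset is deleted inside the macro after each append, as with `delete _SAMPLE Results_combined;` in the posted program, the base is rebuilt from scratch on every call and therefore always matches the DATA= dataset.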
Thank you for your explanation. Both dataset pairs had the same number of rows. Maybe that is because PE1 and pe_summary were also updated, with rows added at each iteration of the simulation. My question is: if PE1 and pe_summary were already updated to the cumulative number of rows, why would we need proc append as an additional step?
But I'll check again when I get back to a computer with SAS.
These two procs seem to be cancelling each other out.
proc append base=results_combined data=PE1 force nowarn;
run;
...
proc datasets lib=work nodetails nolist;
delete _SAMPLE Results_combined;
run; quit;
So you seem to be using PROC APPEND to store a copy of PE1 in RESULTS_COMBINED, but then you immediately delete it.
Not sure why it is using PROC DATASETS instead of the simpler, faster PROC DELETE. You probably want to replace that PROC DATASETS with:
proc delete data=_SAMPLE PE PE1 PE_SUMMARY;
run;
Seems like you already know the basics of proc append.
I guess my question is this: by saying the 'dataset pairs are identical', do you mean they have the same structure (i.e. dimensions, column names, and data types), or that they have the same structure AND the same values in every row?
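PROC CONTENTS only describes structure, so one way to check whether the values also match is PROC COMPARE. The following is a minimal sketch using the dataset names from the original post; it assumes both datasets still exist in WORK when it runs.

```sas
/* Compare values as well as structure for one of the dataset pairs. */
proc compare base=results_combined compare=PE1 listall;
run;

/* SYSINFO must be read right after the step; 0 means no differences found. */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

The same check can be repeated with results_summary and pe_summary.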
PROC APPEND stacks the observations from the DATA= dataset onto the bottom of the BASE= dataset. The FORCE option comes into play when the DATA= dataset contains variables that are not in the BASE= dataset, or variables whose type or length differs from the base. Without FORCE the append is not performed in those cases; with FORCE the extra variables are dropped and longer values are truncated.
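A small sketch of the FORCE behaviour, using made-up datasets (`base_ds`, `new_ds`) and variables (`name`, `x`, `extra`) that are not part of the original program:

```sas
/* BASE= has two variables; name is 5 characters long. */
data base_ds;
  length name $5;
  name = 'a';
  x = 1;
run;

/* DATA= has a longer name and one variable that the base does not have. */
data new_ds;
  length name $10;
  name = 'longername';
  x = 2;
  extra = 99;
run;

/* FORCE lets the append go ahead: extra is dropped and name is truncated */
/* to the base length. NOWARN suppresses the warning about the mismatch.  */
proc append base=base_ds data=new_ds force nowarn;
run;

proc print data=base_ds;
run;
```

Running the same PROC APPEND without FORCE should leave base_ds unchanged and write a message to the log instead, which is a quick way to find out whether the two datasets really have the same structure.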