Cruise
Ammonite | Level 13

Hi Folks:

 

I'm trying to understand proc append in the context below. I have a simulation study in progress and got it running in SAS (see link below). The SAS program for the simulation includes 'proc append' as shown below.

 

proc append base=results_combined data=PE1 force nowarn;
run;
proc append base=results_summary data=pe_summary force nowarn;
run;

My question is, what is being achieved by these two proc appends? I have read the SAS documentation on proc append and a SUGI paper as well.

 

To understand better, I ran proc contents as follows on the datasets specified after base= and data=. However, the datasets in each pair are identical. Is proc append being used here just to change the name of the dataset from PE1 to results_combined and pe_summary to results_summary?

 

Thanks for your help! I appreciate your time.

 

proc contents data=results_combined; run;
proc contents data=PE1; run;

proc contents data=results_summary; run;
proc contents data=pe_summary; run;

 

Previous post where simulation problem was solved by Reeza.

https://communities.sas.com/t5/SAS-Programming/Simulate-parameter-estimates-of-the-model-for-missing...

 

The whole simulation program, taken from the post linked above.

 


%symdel; 
%macro surveyLoop(sample=);
*calculate sample rate; %let sampRate = %sysevalf(&sample./100);
*create sample;
ods select none;
proc surveyselect data=method
                   samprate=&sampRate. 
                   reps=100  
                   out=_sample 
                   seed=&sample
                   outall;
strata stage;*ensures stage is present for all values - not needed in prod;
run;

*set duration to values if selected;
data _sample;
set _sample;
*assign values as needed;
if selected = 1 and method=0 then dur_loop = DUR_MID; else 
if selected = 0 and method=0 then dur_loop = DUR_GOLD; else 
if selected=1 and method=1 then dur_loop=DUR_GOLD; else
if selected=0 and method=1 then dur_loop=DUR_GOLD; 
run;
*sort for modeling procedures next;
options nonotes; 
proc sort data=_sample;
by replicate stage AGEGRP;
run;

ods output ParameterEstimates=PE;
PROC PHREG DATA=_sample; 
by replicate stage AGEGRP;
MODEL DUR_LOOP*DEATH(0)=METHOD/RL;      
RUN;

DATA PE1; SET PE;
PCT_MISS = &sample/100;
run;

*omit if no replicates;
*if replicates calculates average of HR + STDERR for bootstrap approach;
proc means data=PE1 NWAY N MEAN STD STDERR;
class STAGE AGEGRP pct_miss;
var hazardratio HRLowerCL HRUpperCL;
ods output summary = pe_summary(keep=stage agegrp pct_miss HazardRatio_Mean HRUpperCL_Mean HRLowerCL_Mean);
run;
ods select all;

proc append base=results_combined data=PE1 force nowarn;
run;
proc append base=results_summary data=pe_summary force nowarn;
run;
quit; 
proc datasets lib=work nodetails nolist;
delete _SAMPLE Results_combined;
run; quit;
%let sampRate =;
%mend;

%surveyLoop(sample=1);
%surveyLoop(sample=2);

through...

%surveyLoop(sample=100);


@Reeza 

Accepted Solution
Reeza
Super User

@Cruise  You added "results_combined" to the proc datasets, why did you do that?

 


@Cruise wrote:
I see that proc delete makes sense here. My question persists, do you still think that proc append is necessary? Because, PE1 and results_combined had the same number of rows at the end of iterations anyway.

Results_Summary contains the summary statistics output from proc means. 

Results_combined contains all the records needed to recreate RESULTS_SUMMARY in case further analysis is needed WITHOUT requiring you to run the macro all over again because that's time intensive. 

 

You also removed the deletion from the PE1 file, which is supposed to be deleted because the data is in results_summary and results_combined. The PE1 file only has data from the last run so you should delete it. 
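As a rough sketch of what "recreate RESULTS_SUMMARY" could look like: the step below recomputes the per-group means from the accumulated estimates. It assumes RESULTS_COMBINED has not been deleted and still carries the variables used in the macro; the output name results_summary_redo is hypothetical.

```sas
/* Sketch only: recompute the per-group means from the accumulated rows. */
/* Variable names come from the macro in the question;                   */
/* results_summary_redo is a hypothetical output dataset name.           */
proc means data=results_combined nway noprint;
  class stage agegrp pct_miss;
  var hazardratio HRLowerCL HRUpperCL;
  /* AUTONAME yields HazardRatio_Mean, HRLowerCL_Mean, HRUpperCL_Mean */
  output out=results_summary_redo(drop=_type_ _freq_)
         mean= / autoname;
run;
```

This runs once over the stored estimates, so the simulation macro does not have to be called again.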

 


8 Replies
Tom
Super User

As the name implies, PROC APPEND is used to add observations to the end of an existing dataset.

The macro is using it to add the results from each call. So after two calls the BASE= datasets have the results from both macro runs, while the DATA= datasets have the data from just the last run.
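A minimal sketch of that accumulation, using hypothetical names rather than the datasets from the simulation: one_run stands in for PE1 and all_runs for the BASE= dataset. It assumes all_runs does not already exist in WORK before the first call.

```sas
/* Hypothetical names: one_run plays the role of PE1, */
/* all_runs plays the role of the BASE= dataset.      */
%macro oneCall(id=);
  /* Each call overwrites one_run with two fresh rows. */
  data one_run;
    call_id = &id.;
    do row = 1 to 2;
      output;
    end;
  run;

  /* BASE= is created on the first call and extended on later calls. */
  proc append base=all_runs data=one_run;
  run;
%mend oneCall;

%oneCall(id=1);
%oneCall(id=2);
```

After the two calls, all_runs holds four rows (two per call) while one_run holds only the two rows from the second call.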

Cruise
Ammonite | Level 13

@Tom @aaronh 

 

Thank you for your explanation. Both pairs of datasets had the same number of rows. Maybe that is because PE1 and pe_summary were also updated, with rows added at each iteration of the simulation. My question is: if PE1 and pe_summary already grew to the cumulative number of rows, why would we need proc append as an additional step?

 

But I'll check again when I get back to a computer with SAS.

Tom
Super User

These two procs seem to be cancelling each other out.

proc append base=results_combined data=PE1 force nowarn;
run;
...
proc datasets lib=work nodetails nolist;
delete _SAMPLE Results_combined;
run; quit;

So you seem to be using PROC APPEND to store a copy of PE1 into RESULTS_COMBINED, but then you immediately delete it.

Not sure why it is using PROC DATASETS instead of the simpler, faster PROC DELETE. You probably want to replace that PROC DATASETS with:

proc delete data=_SAMPLE PE PE1 PE_SUMMARY;
run;
Cruise
Ammonite | Level 13
I see that proc delete makes sense here. My question persists, do you still think that proc append is necessary? Because, PE1 and results_combined had the same number of rows at the end of iterations anyway.
Tom
Super User
If you call the macro only once then the two datasets will have the same values. But the next time you call the macro, the BASE= dataset will have the values from both calls.
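One way to check this after a few macro calls is to compare observation counts directly. The query below is a sketch that assumes the four datasets from the question live in WORK; any that have been deleted simply will not appear in the output.

```sas
/* Compare row counts of the DATA= and BASE= datasets. */
/* DICTIONARY.TABLES stores member names in uppercase. */
proc sql;
  select memname, nobs
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('PE1', 'RESULTS_COMBINED',
                      'PE_SUMMARY', 'RESULTS_SUMMARY');
quit;
```

If the BASE= datasets are never deleted between calls, their counts should grow with each call while PE1 and PE_SUMMARY stay at the size of a single run.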

Cruise
Ammonite | Level 13
Ok. Got it. Thanks, Reeza!
aaronh
Quartz | Level 8

Seems like you already know the basics of proc append.

 

I guess my question is: by saying the 'dataset pairs are identical', do you mean they have the same structure (i.e. dimensions, column names, and data types), or that they have the same structure AND the same values in the rows?

 

PROC APPEND adds the observations from the dataset named in DATA= to the bottom of the dataset named in BASE=. The FORCE option comes into play when the DATA= dataset contains variables that are not in the BASE= dataset, or variables that differ in type or are longer than their BASE= counterparts. With FORCE the append still runs, and variables that exist only in DATA= are dropped.
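A small sketch of the FORCE behaviour, using hypothetical datasets base_ds and new_ds that are not part of the simulation:

```sas
/* Hypothetical datasets for illustration only. */
data base_ds;
  x = 1;
run;

data new_ds;
  x = 2;
  extra = 99;   /* variable not present in base_ds */
run;

/* Without FORCE, this step stops with an error because EXTRA */
/* is not in the BASE= dataset. With FORCE, the row is added  */
/* and EXTRA is dropped. NOWARN suppresses the warning that   */
/* FORCE would otherwise write to the log.                    */
proc append base=base_ds data=new_ds force nowarn;
run;
```

After this step base_ds has two rows and still only the variable x. In the simulation macro, PE1 has the same structure on every call, so FORCE NOWARN there acts mainly as a safeguard.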
