BookmarkSubscribeRSS Feed
UniversitySas
Quartz | Level 8

So let's say I have a macro that loops through 2 iterations, and at the end of each iteration my output is of the form:

 

Iteration 1 output:

data Example;
   input Variable $ Var1 Var2 Var3;
   datalines;
Mean 195  163   100
Max 220  198   240
;
run;

 

Iteration 2 output:

data Example2;
   input Variable $ Var1 Var2 Var3;
   datalines;
Mean 25  125   500
Max 210  141   630
;
run;

 

Inside, my macro looks like this:

 

PROC datasets lib = work nolist; DELETE Example;
%MACRO output_table(y);
/* 
bla bla bla insert code
*/
PROC MEANS DATA = Example;
        var Var1 Var2 Var3;
        output out = Example&y;
QUIT;

PROC append base = Example data = Example&y;
%MEND output_table;

%MACRO iter_1_2;
    %Do y = 1 %TO 2;
       %output_table(&y);
    %End;
%MEND iter_1_2;

RUN;

 

My end result is just the tables concatenated, for instance:

data Example_append;
   input Variable $ Var1 Var2 Var3;
   datalines;
Mean 195  163   100
Max 220  198   240
Mean 25  125   500
Max 210  141   630
;
run;

But what I really want is the summarized values of all the variables from all the tables. So in this case:

For the Mean of Var1-Var3, I'd have: (195+25)/2 = 110, (163+125)/2=144, (100+500)/2 = 300

and for max, I'd have: max(220,210)=220, max(198,141)=198, max(240,630)=630 and my output would look like:

 

data Example_append;
   input Variable $ Var1 Var2 Var3;
   datalines;
Mean 110 144   300
Max  220 198   630
;
run;

I'm unsure what I'm doing wrong and how to get there. Is it a PROC means step on my appended data, after a transpose? I can't transpose them because of the repeat Vars in the rows.

 

Any help is appreciated. Apologies for any typos.

 

12 REPLIES 12
Reeza
Super User
I'm confused. Do you have working code that works on a single loop that gives you the results or are you trying to modify the PROC APPEND only. AFAIK that isn't the default output from PROC MEANS that you've shown and you also don't have the Example data set since you've deleted it (first line of code) and you have no way of identifying the output from each loop independently to transpose.

I get that you've likely tried to simplify the code for the forum but its not reflective of whatever issue you're having and it's not clear what the issue is.

UniversitySas
Quartz | Level 8

Yes that's correct. My first loop comes out correctly. I understand the PROC means output also includes N, MIN, MAX, MEAN, STDEV, but I omitted it for simplicity.

 

The gist of what I am trying to do is once the loop is completed, the final table is summarized for each variable, within the same row. So if every iteration outputs a "Max" and "Mean" row, with the columns as Var1-Var3, then the #rows = 2 x #loops (1 loops = 2 rows, 2 loops = 4 rows etc), I instead want it so that after all my loops I only have 2 rows of data, regardless of loops. All the data are summarized in those two rows. 

 

 

 

Reeza
Super User

I'm not following, how are you planning to collapse those rows? You haven't shown any desired output.

Are you then trying to take the average of averages? That's generally not a good idea at all and you haven't specified the logic of how you'd want to collapse them. 

 

If you wanted to do average of averages, I would recommend factoring in the size of each group at least since a group with N=5 will have a different distrubiton than N=100 and considering them equivalent would be unfair. 

 

proc means data=appended MEAN; 
class STAT;
var var1-var3;
output out=want mean=; run;

 


@UniversitySas wrote:

 

 

The gist of what I am trying to do is once the loop is completed, the final table is summarized for each variable, within the same row. So if every iteration outputs a "Max" and "Mean" row, with the columns as Var1-Var3, then the #rows = 2 x #loops (1 loops = 2 rows, 2 loops = 4 rows etc), I instead want it so that after all my loops I only have 2 rows of data, regardless of loops. All the data are summarized in those two rows. 

 

 

 


 

UniversitySas
Quartz | Level 8

I was hoping to get an output like this:

 

data Wanted_output;
   input Variable $ Var1 Var2 Var3;
   datalines;
Mean 110 144   300
Max  220 198   630
;
run;

Where the mean corresponds to the means across all the different loops, similarly the max is now the max of all the different groups.

Is PROC Append not ideal for solving this?

PaigeMiller
Diamond | Level 26

 


@UniversitySas wrote:

I was hoping to get an output like this:

 

data Wanted_output;
   input Variable $ Var1 Var2 Var3;
   datalines;
Mean 110 144   300
Max  220 198   630
;
run;

Where the mean corresponds to the means across all the different loops, similarly the max is now the max of all the different groups.

Is PROC Append not ideal for solving this?


 

Why not a loop like this pseudo-code:

 

%do i = 1 %to 2

     PROC APPEND

%end

PROC MEANS

 

This PROC MEANS gets you the final output you said you wanted, mean of all groups, max of all groups

--
Paige Miller
Reeza
Super User
You can take the max of maximums to get an overall max, but you cannot take the mean of means to get your overall mean. So you need to be extra clear about how you want this calculations to happen.
PaigeMiller
Diamond | Level 26

@Reeza makes a good point. At the start of this thread, @UniversitySas, you were very clear that you wanted the mean of the means; but now you seem to be talking about the mean of the data, which is not necessarily the same thing. So we need a lot more clarity here; and could you please discuss the calculations needed without confusing the issue by talking about some looping algorithm, as the algorithm in code should be thought of independently from the desired results.

--
Paige Miller
Reeza
Super User

I have an example of appending with reporting here, including the PROC MEANS portion. 

https://gist.github.com/statgeek/353374a5d8ea4f0c89ce5d80a47f4a4c

 

 

UniversitySas
Quartz | Level 8
Thanks! I'll check it out and see if it answers my question.
UniversitySas
Quartz | Level 8

Just an update - I managed to solve my problem.

 

The issue in my code was here:

PROC MEANS DATA = Example;
        var Var1 Var2 Var3;
        output out = Example&y;
QUIT;

What I did instead was:

PROC datasets ; DELETE temp;
%MACRO output_table(y);
/* 
bla bla bla insert code
*/
PROC MEANS DATA = Example;
        var Var1 Var2 Var3;
        output out = Example;
QUIT;

PROC append base = temp data = Example;
%MEND output_table;

%MACRO iter_1_2;
    %Do y = 1 %TO 2;
       %output_table(&y);
    %End;
%MEND iter_1_2;

%iter_1_2;

PROC MEANS DATA = temp;
      var Var1 Var2 Var3;
RUN;

This seems to work!

Thanks again for all your help, sorry for the confusion in what I was trying to do.

PaigeMiller
Diamond | Level 26

Now let me say I don't think looping is really necessary, and that you shouldn't automatically decide that a loopong solution is necessary. It seems to me that you could have used BY statements to accomplish this, and there would be no need for looping or PROC APPEND, and the code is a lot easier to get right.

 
 
--
Paige Miller
Reeza
Super User
As long as you're aware that the mean you've calculated doesn't have a lot of meaning you're good I guess. It's not the mean of the data over the set of datasets, it's the average of the averages which has no definition that I'm aware of. If by luck, each data set has the exact same number of records you're good.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1161 views
  • 2 likes
  • 3 in conversation