Group means and ungrouped means of multiple variables

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Group means and ungrouped means of multiple variables

I have a a dataset like this:

The "Val" variables are the ones I want to perform computation on.

id

major_group

minor_group

val1

val2

1

I

a

234

100

2

I

a

435

567

3

I

b

126

56

4

I

b

798

236

5

I

c

324

83

1

II

p

234

100

2

II

q

435

567

3

II

p

126

56

4

II

r

798

236

5

II

q

324

83

I want to add columns to compute the mean

1) by combination of two class variables minor_group & major_group and

2) compute mean by one class variable: major_group

The output I want looks like below. I've added the calculation instead of displaying calculated value for easier explanation:

id

major_group

minor_group

val1

val2

val1_mean_major_minor

val2_mean_major_minor

val1_mean_major

val2_mean_major

1

I

a

234

100

234+435 / 2

100+567/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

2

I

a

435

567

234+435 / 2

100+567/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

3

I

b

126

56

126+798/2

56+236/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

4

I

b

798

236

126+798/2

56+236/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

5

I

c

324

83

324/1

83/1

234+435+126+798+324 / 5

100+567+56+236+83 / 5

1

II

p

234

100

234+126/2

100+56/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

2

II

q

435

567

435+324/2

567+83/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

3

II

p

126

56

234+126/2

100+56/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

4

II

r

798

236

798/1

236/1

234+435+126+798+324 / 5

100+567+56+236+83 / 5

5

II

q

324

83

435+324/2

567+83/2

234+435+126+798+324 / 5

100+567+56+236+83 / 5

I tried using proc means but have a couple of doubts:

proc means data = have mean noprint classdata = have exclusive;

    var val:;

    class major_group minor_group;

    output out = want;

run;

1) The proc means gives me all the stats I don't want - min, max, range etc. How do I suppress that?

2) The proc means also calculates all combinations of the class variables - major group / minor group / major + minor group. I don't want all combinations. only major+minor group

3) How do I add the calculated columns as columns in the original data set?

thanks.


Accepted Solutions
Solution
‎02-03-2015 01:40 PM
Contributor
Posts: 40

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

1) To supress min, max & other stats and keep only means:

     Ans.) as mentioned by and , use mean = var-name with the WANT dataset and mean statistic in the proc means statement.


2) To select combination of groups by which mean is required:

     Ans.) As mentioned by , add types statement after class

3) To add the calculated columns in the dataset :

     Ans.) Using proc sql :


proc sql noprint;

      create table combined as

      select a.* , b.*

      from have as a LEFT JOIN

            want as b

      ON a.major_group = b.major_group and a.minor_group = b.minor_group;

QUIT;


thanks !

View solution in original post


All Replies
Super User
Posts: 19,815

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

Try adding a statistic to the WANT dataset as well as a TYPES statement.

Untested:

proc means data = have mean noprint classdata = have exclusive;

    var val:;

    class major_group minor_group;

   types major_group major_group*minor_group;

    output out = want mean=average;

run;

Super User
Posts: 11,343

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

1) output out=want mean= /autoname ;

2) select the ways you want them. You should have a type variable and you likely want _type_ = 1 (major_group) and 3 (major and minor group) if I guess right.

I don't have time at the moment to either transpose that output or do the sql to merge the selected bits together. Hint: Left Join

Contributor
Posts: 40

Re: Group means and ungrouped means of multiple variables

Hi ,

Is there a way to not use /autoname and give the variables name in a customized pattern? The name is almost identical to my other variable hence I am not able to use wildcard colon in future operations.

thanks.

Super User
Posts: 19,815

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

See my code.

output out=want mean(var1)=whatever_you_call_it mean(var2)=name2 mean(var3)=name3;

Contributor
Posts: 40

Re: Group means and ungrouped means of multiple variables

Hi

in my original dataset there are 28 variables (and there could be 100s depending on dataset i input) - I don't want to call them or name them manually.

Super User
Posts: 11,343

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

www2.sas.com/proceedings/sugi30/029-30.pdf

Has a number of macros to help manage lists of variables in a variety of ways.

Super User
Posts: 19,815

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

Look at the the STACKODS option instead, your data will come in a different form, long rather than wide that you can transpose.

46427 - STACKODSOUTPUT new for PROC MEANS in SAS 9.3

Solution
‎02-03-2015 01:40 PM
Contributor
Posts: 40

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

1) To supress min, max & other stats and keep only means:

     Ans.) as mentioned by and , use mean = var-name with the WANT dataset and mean statistic in the proc means statement.


2) To select combination of groups by which mean is required:

     Ans.) As mentioned by , add types statement after class

3) To add the calculated columns in the dataset :

     Ans.) Using proc sql :


proc sql noprint;

      create table combined as

      select a.* , b.*

      from have as a LEFT JOIN

            want as b

      ON a.major_group = b.major_group and a.minor_group = b.minor_group;

QUIT;


thanks !

Contributor
Posts: 40

Re: Group means and ungrouped means of multiple variables

Posted in reply to ngnikhilgoyal

To my question about naming the variables in Proc Means, I continued using the auto-name option. But then I used a proc sql code which dynamically renames the variables. This way I modify the name to start differently and I can use the colon wildcard again to operate on the entire group of variables. Thanks and for pointing in right direction.

proc sql noprint;                                                                                                                      

  select trim(name)||'=GROUPMEAN'||substr(name,4)                                                                                            

  into :varlist separated by ' '                                                                                                       

  from DICTIONARY.COLUMNS                                                                                                              

  WHERE LIBNAME EQ "WORK" and MEMNAME EQ "WANT"                                                                                           

  and upcase(name) like 'VAL%';                                                                                                                

quit;

The only glitch is it is based on a "like" value which is fixed - so if the data has completely different variable names with no related pattern - then the &VARLIST will be empty. But thankfully I don't have this problem (right now).

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 326 views
  • 8 likes
  • 3 in conversation