Solved: calculating proportion by cluster

Deepai · Posted 02-08-2020 11:43 AM

I have the following data, can you please help me to get the proportion of smokers in each cluster

data x;
input ID cluster smoke ;

cards;
1 1 1
2 1 1
3 1 1
4 2 0
5 2 1
6 2 0
7 2 1
8 3 1
9 3 0

;run;

Expected outcome

ID cluster smoke Proportion of Smokers in each cluster

1 1 1 100
2 1 1 100
3 1 1 100
4 2 0 50
5 2 1 50
6 2 0 50
7 2 1 50
8 3 1 66
9 3 0 66

10 3 . 1 66

PaigeMiller · Posted 02-08-2020 11:47 AM

proc summary data=x nway;
    var smoke;
    class cluster;
    output out=_stats_(drop=_:) mean=proportion;
run;
data want;
    merge x _stats_;
    by cluster;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 02-08-2020 11:47 AM

proc summary data=x nway;
    var smoke;
    class cluster;
    output out=_stats_(drop=_:) mean=proportion;
run;
data want;
    merge x _stats_;
    by cluster;
run;

--
Paige Miller

ed_sas_member · Posted 02-08-2020 11:51 AM

Hi @Deepai

You can use a Proc SQL for example:

proc sql;
	select *, sum(smoke)/count(smoke) format=percent8.2 as percent
	from x
	group by cluster;
run;

Best,

novinosrin · Posted 02-08-2020 12:34 PM

Hi @Deepai Since your smoke are binary values i.e. 1s and 0s , the mean function suggested by @PaigeMiller is pretty elegant.

The logic build basically entails i. computation of proportion by count=true smokers/count all, then ii. Assign the computed column value to each record of the BY Group cluster.

The SQL solution considering you have more than one grouping variable does an automatic remerge of the computed summary statistic with each record of a BY Group. So copying Paige's idea in SQL and Percent format will give you the desired result

data x;
input ID cluster smoke ;
cards;
1       1       1
2       1       1
3       1       1
4       2       0
5       2       1
6       2       0
7       2       1
8       3       1
9       3        0
;
run;

proc sql;
   create table want as
	select *, mean(smoke)  as proportion format=percent. 
	from x
	group by cluster
	order by cluster, id ;
quit;

Mind you, SQL output order is never guaranteed as the SQL optimizer uses varying algorithms in its inner query depending on the table size, number of variables, presence of indexes , memory availability etc. Hence, it's always recommended to use an ORDER By clause should you require the result to be in a particular order.

Hope this helps,

Kind Regards

calculating proportion by cluster

Re: calculating proportion by cluster

Re: calculating proportion by cluster

Re: calculating proportion by cluster

Re: calculating proportion by cluster

calculating proportion by cluster

Re: calculating proportion by cluster

Re: calculating proportion by cluster

Re: calculating proportion by cluster

Re: calculating proportion by cluster

Registration is open

SAS Training: Just a Click Away