Hi @Deepai Since your smoke are binary values i.e. 1s and 0s , the mean function suggested by @PaigeMiller is pretty elegant.
The logic build basically entails i. computation of proportion by count=true smokers/count all, then ii. Assign the computed column value to each record of the BY Group cluster.
The SQL solution considering you have more than one grouping variable does an automatic remerge of the computed summary statistic with each record of a BY Group. So copying Paige's idea in SQL and Percent format will give you the desired result
data x;
input ID cluster smoke ;
cards;
1 1 1
2 1 1
3 1 1
4 2 0
5 2 1
6 2 0
7 2 1
8 3 1
9 3 0
;
run;
proc sql;
create table want as
select *, mean(smoke) as proportion format=percent.
from x
group by cluster
order by cluster, id ;
quit;
Mind you, SQL output order is never guaranteed as the SQL optimizer uses varying algorithms in its inner query depending on the table size, number of variables, presence of indexes , memory availability etc. Hence, it's always recommended to use an ORDER By clause should you require the result to be in a particular order.
Hope this helps,
Kind Regards
... View more