I have the following data, can you please help me to get the proportion of smokers in each cluster
data x;
input ID cluster smoke ;
cards;
1 1 1
2 1 1
3 1 1
4 2 0
5 2 1
6 2 0
7 2 1
8 3 1
9 3 0
;run;
Expected outcome
ID cluster smoke Proportion of Smokers in each cluster
1 1 1 100
2 1 1 100
3 1 1 100
4 2 0 50
5 2 1 50
6 2 0 50
7 2 1 50
8 3 1 66
9 3 0 66
10 3 . 1 66
proc summary data=x nway;
var smoke;
class cluster;
output out=_stats_(drop=_:) mean=proportion;
run;
data want;
merge x _stats_;
by cluster;
run;
proc summary data=x nway;
var smoke;
class cluster;
output out=_stats_(drop=_:) mean=proportion;
run;
data want;
merge x _stats_;
by cluster;
run;
Hi @Deepai
You can use a Proc SQL for example:
proc sql;
select *, sum(smoke)/count(smoke) format=percent8.2 as percent
from x
group by cluster;
run;
Best,
Hi @Deepai Since your smoke are binary values i.e. 1s and 0s , the mean function suggested by @PaigeMiller is pretty elegant.
The logic build basically entails i. computation of proportion by count=true smokers/count all, then ii. Assign the computed column value to each record of the BY Group cluster.
The SQL solution considering you have more than one grouping variable does an automatic remerge of the computed summary statistic with each record of a BY Group. So copying Paige's idea in SQL and Percent format will give you the desired result
data x;
input ID cluster smoke ;
cards;
1 1 1
2 1 1
3 1 1
4 2 0
5 2 1
6 2 0
7 2 1
8 3 1
9 3 0
;
run;
proc sql;
create table want as
select *, mean(smoke) as proportion format=percent.
from x
group by cluster
order by cluster, id ;
quit;
Mind you, SQL output order is never guaranteed as the SQL optimizer uses varying algorithms in its inner query depending on the table size, number of variables, presence of indexes , memory availability etc. Hence, it's always recommended to use an ORDER By clause should you require the result to be in a particular order.
Hope this helps,
Kind Regards
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.