BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Deepai
Calcite | Level 5

I have the following data, can you please help me to get the proportion of smokers in each cluster

 

 

data x;
input ID cluster smoke ;

cards;
1       1       1
2       1       1
3       1       1
4       2       0
5       2       1
6       2       0
7       2       1
8       3       1
9       3        0

;run;

 

Expected outcome

 

ID   cluster  smoke  Proportion of Smokers in each cluster


1       1       1                100
2       1       1                100
3       1       1                100
4       2       0                50
5       2       1                50
6       2       0                50
7       2       1                50             
8       3       1               66       
9       3        0              66

10     3 .      1              66

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc summary data=x nway;
    var smoke;
    class cluster;
    output out=_stats_(drop=_:) mean=proportion;
run;
data want;
    merge x _stats_;
    by cluster;
run;
--
Paige Miller

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26
proc summary data=x nway;
    var smoke;
    class cluster;
    output out=_stats_(drop=_:) mean=proportion;
run;
data want;
    merge x _stats_;
    by cluster;
run;
--
Paige Miller
ed_sas_member
Meteorite | Level 14

Hi @Deepai 

 

You can use a Proc SQL for example:

proc sql;
	select *, sum(smoke)/count(smoke) format=percent8.2 as percent
	from x
	group by cluster;
run;

Best,

novinosrin
Tourmaline | Level 20

Hi @Deepai   Since your smoke are binary values i.e. 1s and 0s , the mean function suggested by @PaigeMiller  is pretty elegant. 

 

The logic build basically entails i. computation of proportion by count=true smokers/count all, then  ii. Assign the computed column value to each record of the BY Group cluster.

The SQL solution considering you have more than one grouping variable does an automatic remerge of the computed summary statistic with each record of a BY Group. So copying Paige's idea in SQL and Percent format will give you the desired result

 

data x;
input ID cluster smoke ;
cards;
1       1       1
2       1       1
3       1       1
4       2       0
5       2       1
6       2       0
7       2       1
8       3       1
9       3        0
;
run;

proc sql;
   create table want as
	select *, mean(smoke)  as proportion format=percent. 
	from x
	group by cluster
	order by cluster, id ;
quit;

Mind you, SQL output order is never guaranteed as the SQL optimizer uses varying algorithms in its inner query depending on the table size, number of variables, presence of indexes , memory availability etc. Hence, it's always recommended to use an ORDER By clause should you require the result to be in a particular order. 

 

Hope this helps,

Kind Regards

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 941 views
  • 2 likes
  • 4 in conversation