Hello.
I have a problem in calculation ratio for diffierent values in a variable by group.
For example, I have a dataset like this.
id | date | name | var1 |
1 | 161231 | a | x |
1 | 161231 | a | y |
1 | 161231 | b | y |
1 | 161231 | c | x |
1 | 161231 | c | y |
1 | 161231 | d | z |
2 | 161231 | e | x |
2 | 161231 | e | y |
2 | 161231 | f | x |
2 | 161231 | f | z |
2 | 161231 | g | z |
I want to calculate (number of a value in var1)/(number of distinct names) by id-date groups.
For example, for id 1 - date 161231 group, I want x_ratio=2/4 y_ratio=3/4 z_ratio=1/4.
Also, for id 2 - date 161231 group, I want x_ratio=2/3 y_ratio=1/3 z_ratio=2/3.
If you have any suggestion, please share it with me.
Thank you.
One option, plus the code for transpose in case you want the ratios in separate columns
proc sql;
create table want as
select a.id,a.date,var1,nvalInVar1 ,nDistinctNames,nvalInVar1/nDistinctNames as Ratio
from (select id,date,var1,count (*) as nvalInVar1 from have group by id,date,var1) a
join (select id,date,count(distinct(name)) as nDistinctNames from have group by id,date) b
on a.id=b.id and a.date=b.date;
proc transpose data=want out=want2 (drop=_name_) prefix=ratio_;
by id date;
id var1 ;
var ratio;
run;
I can't understand how when id=1, z_ratio=1/4 from your data. You need to explain in more detail how these ratios are computed.
I don't have time to solve this right now, but I can explain where the numbers come from.
The denominator is 4 because there are 4 distinct NAME values for the combination of ID and DATE.
The numerator is 1 because there is only 1 observation for the combination of ID, DATE, and VAR1.
One option, plus the code for transpose in case you want the ratios in separate columns
proc sql;
create table want as
select a.id,a.date,var1,nvalInVar1 ,nDistinctNames,nvalInVar1/nDistinctNames as Ratio
from (select id,date,var1,count (*) as nvalInVar1 from have group by id,date,var1) a
join (select id,date,count(distinct(name)) as nDistinctNames from have group by id,date) b
on a.id=b.id and a.date=b.date;
proc transpose data=want out=want2 (drop=_name_) prefix=ratio_;
by id date;
id var1 ;
var ratio;
run;
data have;
input id date name $ var1 $;
datalines;
1 161231 a x
1 161231 a y
1 161231 b y
1 161231 c x
1 161231 c y
1 161231 d z
2 161231 e x
2 161231 e y
2 161231 f x
2 161231 f z
2 161231 g z
;
proc freq data=have noprint ;
by id date;
tables var1/out=temp(drop=percent) ;
run;
proc sql;
create table want as
select a.*,b.cnt, a.count/b.cnt as ratio
from
temp a left join (select id, date,count(distinct name) as cnt from have group by id, date) b
on a.id=b.id and a.date=b.date;
quit;
Everyone, thanks for sharing your knowledge.
I can't thank you enough. It helped a lot. 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.