Hello.
I have a problem in calculation ratio for diffierent values in a variable by group.
For example, I have a dataset like this.
id | date | name | var1 |
1 | 161231 | a | x |
1 | 161231 | a | y |
1 | 161231 | b | y |
1 | 161231 | c | x |
1 | 161231 | c | y |
1 | 161231 | d | z |
2 | 161231 | e | x |
2 | 161231 | e | y |
2 | 161231 | f | x |
2 | 161231 | f | z |
2 | 161231 | g | z |
I want to calculate (number of a value in var1)/(number of distinct names) by id-date groups.
For example, for id 1 - date 161231 group, I want x_ratio=2/4 y_ratio=3/4 z_ratio=1/4.
Also, for id 2 - date 161231 group, I want x_ratio=2/3 y_ratio=1/3 z_ratio=2/3.
If you have any suggestion, please share it with me.
Thank you.
One option, plus the code for transpose in case you want the ratios in separate columns
proc sql;
create table want as
select a.id,a.date,var1,nvalInVar1 ,nDistinctNames,nvalInVar1/nDistinctNames as Ratio
from (select id,date,var1,count (*) as nvalInVar1 from have group by id,date,var1) a
join (select id,date,count(distinct(name)) as nDistinctNames from have group by id,date) b
on a.id=b.id and a.date=b.date;
proc transpose data=want out=want2 (drop=_name_) prefix=ratio_;
by id date;
id var1 ;
var ratio;
run;
I can't understand how when id=1, z_ratio=1/4 from your data. You need to explain in more detail how these ratios are computed.
I don't have time to solve this right now, but I can explain where the numbers come from.
The denominator is 4 because there are 4 distinct NAME values for the combination of ID and DATE.
The numerator is 1 because there is only 1 observation for the combination of ID, DATE, and VAR1.
One option, plus the code for transpose in case you want the ratios in separate columns
proc sql;
create table want as
select a.id,a.date,var1,nvalInVar1 ,nDistinctNames,nvalInVar1/nDistinctNames as Ratio
from (select id,date,var1,count (*) as nvalInVar1 from have group by id,date,var1) a
join (select id,date,count(distinct(name)) as nDistinctNames from have group by id,date) b
on a.id=b.id and a.date=b.date;
proc transpose data=want out=want2 (drop=_name_) prefix=ratio_;
by id date;
id var1 ;
var ratio;
run;
data have;
input id date name $ var1 $;
datalines;
1 161231 a x
1 161231 a y
1 161231 b y
1 161231 c x
1 161231 c y
1 161231 d z
2 161231 e x
2 161231 e y
2 161231 f x
2 161231 f z
2 161231 g z
;
proc freq data=have noprint ;
by id date;
tables var1/out=temp(drop=percent) ;
run;
proc sql;
create table want as
select a.*,b.cnt, a.count/b.cnt as ratio
from
temp a left join (select id, date,count(distinct name) as cnt from have group by id, date) b
on a.id=b.id and a.date=b.date;
quit;
Everyone, thanks for sharing your knowledge.
I can't thank you enough. It helped a lot. 🙂
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.