Hello friends!
I was hoping someone could help me think through this programming question. I have a dataset that looks like this:
data my_data;
input id option1 option2 option3 option 4;
cards;
A 1 0 1 1
B 1 1 1 1
C 0 1 1 1
D 0 1 0 1
E 1 1 1 1
F 0 0 1 1;
run;
What I need to do is calculate a specific proportion for EACH pairwise combination. The proportion would essentially be the: (# of individuals in BOTH options) / (# of individuals in the first option). For example, I would need the calculation for Option1_Option2 to be: # option 1 AND option 2 / # option 1 = 1/3. Similarly, Option2_Option1 would be: # option 1 AND option 2 / # option 2 = 1/4. And so on...
I need the resulting table output to look like this:
Option1_Option2 2/3
Option1_Option3 3/3
Option1_Option4 3/3
Option2_Option1 2/4
Option2_Option3 3/4
Option2_Option4 4/4
Option3_Option1 3/5
Option3_Option2 3/5
Option3_Option4 5/5
Option4_Option1 3/6
Option4_Option2 4/6
Option4_Option3 5/6
Where I get tripped up in the coding is the pairwise combinations that I need to look at. Any advice or help would be hugely appreciated! Thank you in advance!
This should do it:
data my_data;
input id $ option1 option2 option3 option4;
cards;
A 1 0 1 1
B 1 1 1 1
C 0 1 1 1
D 0 1 0 1
E 1 1 1 1
F 0 0 1 1
;
proc transpose data=my_data out=have;
var option:;
by id;
run;
proc sql;
create table want as
select
catx("-", a._name_, b._name_) as combination,
sum(a.col1*b.col1) / coalesce(sum(a.col1), 1) as proportion
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;
A well formulated question.
However, it would seem to me that Option1_Option2 should be 2/3 (AE)/(ABE) right?
Yes! Sorry about that! I played around with the mock dataset a few times and didn't catch that change!
This should do it:
data my_data;
input id $ option1 option2 option3 option4;
cards;
A 1 0 1 1
B 1 1 1 1
C 0 1 1 1
D 0 1 0 1
E 1 1 1 1
F 0 0 1 1
;
proc transpose data=my_data out=have;
var option:;
by id;
run;
proc sql;
create table want as
select
catx("-", a._name_, b._name_) as combination,
sum(a.col1*b.col1) / coalesce(sum(a.col1), 1) as proportion
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;
@PGStats this is brilliant!! Thank you so much! You just saved me so much time 🙂
Hi @PGStats , thank you again for your help! I have one more question that would help me automate things. I also have one more identifier that I want to account for (the "type" variable below).
data my_data;
input id $ type option1 option2 option3 option4;
cards;
A 1 1 0 1 1
B 2 1 1 1 1
C 1 0 1 1 1
D 2 0 1 0 1
E 1 1 1 1 1
F 2 0 0 1 1
;
Currently, the code you generously provided includes the proportion calculation for types 1 AND 2. Is there an easy way to additionally add a column for the proportion calculation for type 1 only? And then a column for the proportion calculation for type 2 only?
I'm not very familiar with using proc sql to generate tables, so any general resources to help me learn would also be greatly appreciated!
Thank you, again!
You can try the below code , i updated the @PGStats code
proc sql;
create table want as
select
catx("-", a._name_, b._name_) as combination,
sum(a.col1*b.col1) / coalesce(sum(a.col1), 1) as proportion, sum(a.col1*b.col1) as type1, coalesce(sum(a.col1), 1) as type2
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;
Thanks @Jagadishkatam .Currently, your code is giving me additional columns for the numerator of the "combination" variable (variable=type1) and the denominator of the "combination" variable (variable=type2).
What I'm looking for is a similar "combination" calculation variable for instances for the individuals who are type=1. And then a similar "combination" calculation variable for the individuals who are type=2. Is there a way to add an if/then or when statement to proc sql to do so? For example, the table output for the dataset shown below would look like this:
Dataset:
data my_data;
input id $ type option1 option2 option3 option4;
cards;
A 1 1 0 1 1
B 2 1 1 1 1
C 1 0 1 1 1
D 2 0 1 0 1
E 1 1 1 1 1
F 2 0 0 1 1
;
Output table:
Option1-Option2 2/3 1/2 1/1
Option1-Option3 3/3 2/2 1/1
Option1-Option4 3/3 2/2 1/1
Option2-Option1 2/4 1/2 1/2
etc.
Here are two methods for obtaining proportions by type, assuming that ids are unique keys:
data my_data;
input id $ type option1 option2 option3 option4;
cards;
A 1 1 0 1 1
B 2 1 1 1 1
C 1 0 1 1 1
D 2 0 1 0 1
E 1 1 1 1 1
F 2 0 0 1 1
;
proc transpose data=my_data out=have;
var option:;
by id type;
run;
proc sql;
create table want1 as
select
catx("-", a._name_, b._name_) as combination,
sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportion,
sum(a.type=1 and a.col1 and b.col1) / coalesce(sum(a.type=1 and a.col1), 1) as proportionType1,
sum(a.type=2 and a.col1 and b.col1) / coalesce(sum(a.type=2 and a.col1), 1) as proportionType2
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;
proc sql;
create table want2 as
select aa.combination, proportion, proportionType1, proportionType2 from
(select
catx("-", a._name_, b._name_) as combination,
sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportion
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_) as aa left join
(select
catx("-", a._name_, b._name_) as combination,
sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportionType1
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
where a.type = 1
group by a._name_, b._name_) as bb on aa.combination = bb.combination left join
(select
catx("-", a._name_, b._name_) as combination,
sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportionType2
from
have as a inner join
have as b on a.id=b.id and a._name_ ne b._name_
where a.type = 2
group by a._name_, b._name_) as cc on aa.combination = cc.combination
;
quit;
To scale this up to, say, 20 types, you would need a different approach involving an extra transposition.
data my_data; input id $ option1 option2 option3 option4; cards; A 1 0 1 1 B 1 1 1 1 C 0 1 1 1 D 0 1 0 1 E 1 1 1 1 F 0 0 1 1 ; proc corr data=my_data out=have(where=(_type_='SSCP')) sscp noprint; run; proc transpose data=have(where=(_name_ ne 'Intercept')) out=have2(index=(name)) name=name; by _name_; var option: ; run; proc summary data=my_data ; var option: ; output out=total sum=; run; proc transpose data=total out=total2 name=name ; var option: ; run; data want; merge have2 total2(rename=(col1=col2)); by name; want=col1/col2; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.