Solved: Re: Proportion calculation for each pairwise combination

telc24 · Posted 07-04-2020 10:38 PM

Hello friends!

I was hoping someone could help me think through this programming question. I have a dataset that looks like this:

data my_data;
input id option1 option2 option3 option 4;
cards;
A 1 0 1 1

B 1 1 1 1

C 0 1 1 1

D 0 1 0 1

E 1 1 1 1

F 0 0 1 1;
run;

What I need to do is calculate a specific proportion for EACH pairwise combination. The proportion would essentially be the: (# of individuals in BOTH options) / (# of individuals in the first option). For example, I would need the calculation for Option1_Option2 to be: # option 1 AND option 2 / # option 1 = 1/3. Similarly, Option2_Option1 would be: # option 1 AND option 2 / # option 2 = 1/4. And so on...

I need the resulting table output to look like this:

Option1_Option2 2/3

Option1_Option3 3/3

Option1_Option4 3/3

Option2_Option1 2/4

Option2_Option3 3/4

Option2_Option4 4/4

Option3_Option1 3/5

Option3_Option2 3/5

Option3_Option4 5/5

Option4_Option1 3/6

Option4_Option2 4/6

Option4_Option3 5/6

Where I get tripped up in the coding is the pairwise combinations that I need to look at. Any advice or help would be hugely appreciated! Thank you in advance!

PGStats · Posted 07-04-2020 11:18 PM

This should do it:

data my_data;
input id $ option1 option2 option3 option4;
cards;
A 1 0 1 1
B 1 1 1 1
C 0 1 1 1 
D 0 1 0 1
E 1 1 1 1
F 0 0 1 1
;

proc transpose data=my_data out=have;
var option:;
by id;
run;

proc sql;
create table want as
select 
    catx("-", a._name_, b._name_) as combination,
    sum(a.col1*b.col1) / coalesce(sum(a.col1), 1) as proportion
from 
    have as a inner join
    have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;

PG

View solution in original post

PGStats · Posted 07-04-2020 11:04 PM

A well formulated question.

However, it would seem to me that Option1_Option2 should be 2/3 (AE)/(ABE) right?

PG

telc24 · Posted 07-04-2020 11:06 PM

Yes! Sorry about that! I played around with the mock dataset a few times and didn't catch that change!

PGStats · Posted 07-04-2020 11:18 PM

This should do it:

data my_data;
input id $ option1 option2 option3 option4;
cards;
A 1 0 1 1
B 1 1 1 1
C 0 1 1 1 
D 0 1 0 1
E 1 1 1 1
F 0 0 1 1
;

proc transpose data=my_data out=have;
var option:;
by id;
run;

proc sql;
create table want as
select 
    catx("-", a._name_, b._name_) as combination,
    sum(a.col1*b.col1) / coalesce(sum(a.col1), 1) as proportion
from 
    have as a inner join
    have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;

PG

telc24 · Posted 07-04-2020 11:33 PM

@PGStats this is brilliant!! Thank you so much! You just saved me so much time 🙂

telc24 · Posted 07-05-2020 12:00 AM

Hi @PGStats , thank you again for your help! I have one more question that would help me automate things. I also have one more identifier that I want to account for (the "type" variable below).

data my_data;

input id $ type option1 option2 option3 option4;

cards;

A 1 1 0 1 1

B 2 1 1 1 1

C 1 0 1 1 1

D 2 0 1 0 1

E 1 1 1 1 1

F 2 0 0 1 1

;

Currently, the code you generously provided includes the proportion calculation for types 1 AND 2. Is there an easy way to additionally add a column for the proportion calculation for type 1 only? And then a column for the proportion calculation for type 2 only?

I'm not very familiar with using proc sql to generate tables, so any general resources to help me learn would also be greatly appreciated!

Thank you, again!

Jagadishkatam · Posted 07-05-2020 03:01 AM

You can try the below code , i updated the @PGStats code

proc sql;
create table want as
select 
    catx("-", a._name_, b._name_) as combination,
    sum(a.col1*b.col1) / coalesce(sum(a.col1), 1) as proportion, sum(a.col1*b.col1) as type1, coalesce(sum(a.col1), 1) as type2
from 
    have as a inner join
    have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;

Thanks,
Jag

telc24 · Posted 07-05-2020 08:56 AM

Thanks @Jagadishkatam .Currently, your code is giving me additional columns for the numerator of the "combination" variable (variable=type1) and the denominator of the "combination" variable (variable=type2).

What I'm looking for is a similar "combination" calculation variable for instances for the individuals who are type=1. And then a similar "combination" calculation variable for the individuals who are type=2. Is there a way to add an if/then or when statement to proc sql to do so? For example, the table output for the dataset shown below would look like this:

Dataset:

data my_data;

input id $ type option1 option2 option3 option4;

cards;

A 1 1 0 1 1

B 2 1 1 1 1

C 1 0 1 1 1

D 2 0 1 0 1

E 1 1 1 1 1

F 2 0 0 1 1

;

Output table:

Option1-Option2 2/3 1/2 1/1

Option1-Option3 3/3 2/2 1/1

Option1-Option4 3/3 2/2 1/1

Option2-Option1 2/4 1/2 1/2

etc.

PGStats · Posted 07-05-2020 05:35 PM

Here are two methods for obtaining proportions by type, assuming that ids are unique keys:

data my_data;
input id $ type option1 option2 option3 option4;
cards;
A 1 1 0 1 1
B 2 1 1 1 1
C 1 0 1 1 1 
D 2 0 1 0 1
E 1 1 1 1 1
F 2 0 0 1 1
;

proc transpose data=my_data out=have;
var option:;
by id type;
run;

proc sql;
create table want1 as
select 
    catx("-", a._name_, b._name_) as combination,
    sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportion,
    sum(a.type=1 and a.col1 and b.col1) / coalesce(sum(a.type=1 and a.col1), 1) as proportionType1,
    sum(a.type=2 and a.col1 and b.col1) / coalesce(sum(a.type=2 and a.col1), 1) as proportionType2
from 
    have as a inner join
    have as b on a.id=b.id and a._name_ ne b._name_
group by a._name_, b._name_;
quit;


proc sql;
create table want2 as
select aa.combination, proportion, proportionType1, proportionType2 from

   (select 
        catx("-", a._name_, b._name_) as combination,
        sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportion
    from 
        have as a inner join
        have as b on a.id=b.id and a._name_ ne b._name_
    group by a._name_, b._name_) as aa left join

   (select 
        catx("-", a._name_, b._name_) as combination,
        sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportionType1
    from 
        have as a inner join
        have as b on a.id=b.id and a._name_ ne b._name_
    where a.type = 1
    group by a._name_, b._name_) as bb on aa.combination = bb.combination left join

   (select 
        catx("-", a._name_, b._name_) as combination,
        sum(a.col1 and b.col1) / coalesce(sum(a.col1), 1) as proportionType2
    from 
        have as a inner join
        have as b on a.id=b.id and a._name_ ne b._name_
    where a.type = 2
    group by a._name_, b._name_) as cc on aa.combination = cc.combination
    ;
quit;

To scale this up to, say, 20 types, you would need a different approach involving an extra transposition.

PG

Ksharp · Posted 07-05-2020 08:04 AM

data my_data;
input id $ option1 option2 option3 option4;
cards;
A 1 0 1 1
B 1 1 1 1
C 0 1 1 1 
D 0 1 0 1
E 1 1 1 1
F 0 0 1 1
;

proc corr data=my_data out=have(where=(_type_='SSCP'))  sscp noprint;
run;
proc transpose data=have(where=(_name_ ne 'Intercept')) out=have2(index=(name)) name=name;
by _name_;
var option: ;
run;

proc summary data=my_data ;
var option: ;
output out=total sum=;
run;
proc transpose data=total out=total2 name=name ;
var option: ;
run;

data want;
 merge have2 total2(rename=(col1=col2));
 by name;
 want=col1/col2;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away