BookmarkSubscribeRSS Feed
yanshuai
Quartz | Level 8

Hello,

 

I have asked similar question before but I found I only achieved partial goal.

So I have data like this

DATA HAVE1;
    INPUT (NAME1 NAME2) (:$8.);
    CARDS;
Z J
Z K
X J
X K
;


DATA HAVE2;
    INPUT (NAME X1 X2) (:$8.);
    CARDS;
Z A 1
Z D 2
X C 3
X D 4
J A 1
J B 1
K E 3
K D 4
;

What I want is like this

DATA WANT;
    INPUT (NAME1 NAME2 OVERLAPX1 OVERLAPX1_1 OVERLAPX1_2 OVERLAPX1_3 OVERLAPX2 OVERLAPX2_1 OVERLAPX2_2 OVERLAPX2_3) (:$8.);
    CARDS;
Z J 1 0.5 0.5 1/3 1 0.5 1 0.5
Z K 1 0.5 0.5 1/3 0 0 0 0
X J 0 0 0 0 0 0 0 0
X K 1 0.5 0.5 1/3 2 1 1 1
;

First step, compute the number of overlap between NAME1 and NAME2.

For example (using first row in WANT as example)

So OVERLAPX1 is the number of overlap in X1 between NAME1 and NAME2 e.g. Only A is the overlap between Z and J, so it is 1.

 

Second, repeat first step, and divided it by distinct value of X1 for NAME1

OVERLAPX1_1 is the OVERLAPX1 divided by total distinct value of X1 for NAME1.

For example. There is 1 overlap (A) between Z and J, but Z has 2 unique values in X1 (A, D). So the ratio is 1/2 = 0.5

 

Third, repeat first step, and divided it by distinct value of X1 for NAME2

OVERLAPX1_2 is the OVERLAPX1 divided by total distinct value of X1 for NAME2.

For example. There is 1 overlap (A) between Z and J, but J has 2 unique value in X1 (A, B). So the ratio is 1/2 = 0.5

 

Fourth, repeat first step, and divided it by total distinct value of X1 for both NAME1 and NAME2

OVERLAPX1_3 is the OVERLAPX1 divided by total distinct value of X1 for NAME1 & NAME2.

For example, There is 1 overlap (A) between Z and J, but Z and J totally has 3 unique value in X1 (A, B, D). So the ratio is 1/3

 

Till now, I can achieve it with you guys help last time.

 

But, I would like to repeat the above steps by using X2, instead of X1.

 

I tried left join (select unique NAME1, X2). But it is confusing. The trouble is all the source data are in the same table and NAME1 & NAME2 have to come in pair. It is kinda a headache.

 

I also tried such codes

proc sql;
select sum(a.X1=b.X1) as OVERLAPX1,
		sum(a.X1=b.X1)/COUNT(DISTINCT a.X1) as OVERLAPX1_NAME1,
		sum(a.X1=b.X1)/COUNT(DISTINCT b.X1) as OVERLAPX1_NAME2,
		sum(a.X1=b.X1)/SUM(COUNT(DISTINCT a.X1)+ COUNT(DISTINCT b.X1) - sum(a.X1=b.X1)) as OVERLAPX1_NAME1_NAME2
		,sum(c.X2=d.X2) as OVERLAPX2,
		sum(c.X2=d.X2)/COUNT(DISTINCT c.X2) as OVERLAPX2_NAME1,
		sum(c.X2=d.X2)/COUNT(DISTINCT d.X2) as OVERLAPX2_NAME2,
		sum(c.X2=d.X2)/SUM(COUNT(DISTINCT c.X2)+ COUNT(DISTINCT d.X2) - sum(c.X2=d.X2)) as OVERLAPX2_NAME1_NAME2
from HAVE1 n
left join
(select unique NAME,X1 from HAVE2) a on n.NAME1 = a.NAME
left join
(select unique NAME,X1 from HAVE2) b on n.NAME2 = b.NAME
left join
(select unique NAME,X2 from HAVE2) c on n.NAME1 = c.NAME
left join
(select unique NAME,X2 from HAVE2) d on n.NAME2 = d.NAME
group by n.NAME1, n.NAME2
order by n.NAME1, n.NAME2;
quit;

But this is giving me wrong value.

 

Would you please check where is wrong?

 

How to achieve this using SQL. It is always my favorite.

 

 

Thank you all.

 

Bless you.

7 REPLIES 7
Reeza
Super User

Is there a limit to the number of overlaps you can have? You'll only ever need that list of variables?

 

If this needs to scale the approaches would be different.

yanshuai
Quartz | Level 8

I only care the number of overlap between NAME1 and NAME2 in HAVE1 list

For example. In terms of variable X1,

Z and J have one overlap in X1, which is A.

yanshuai
Quartz | Level 8

Hello.

Is this too complicated?

Reeza
Super User
I don't think the logic is clear of how to go from your have to your want table.
yanshuai
Quartz | Level 8

Sorry to confuse you.

Start from the simple one.

All I want is the overlap between NAME1 and NAME2 in HAVE1.

 

For example.

In HAVE1,  in first row, I have Z and J, and I would like to count their overlap in variable X1.

 

How to achieve that?

 

I go to HAVE2, and see what Z has in X1 : A and D;

Then I see what J has in X1: A and B.

Z and J has one overlap in X1: which is "A".

So the count of overlap is 1.

 

The logic is to count the number of value in common between Z and J in variable X1.

 

Does this make myself clear?

 

Thank you very much.

Reeza
Super User
I dont have time to code something, but not sure why you couldn't just do a join on table1 to table2 (name=name1) and table2 (name=name2) and then do a freq report on the results.
yanshuai
Quartz | Level 8

Yes.

I actually do.

Using SQL

proc sql;
select sum(a.X1=b.X1) as OVERLAPX1,
		sum(a.X1=b.X1)/COUNT(DISTINCT a.X1) as OVERLAPX1_NAME1,
		sum(a.X1=b.X1)/COUNT(DISTINCT b.X1) as OVERLAPX1_NAME2,
		sum(a.X1=b.X1)/SUM(COUNT(DISTINCT a.X1)+ COUNT(DISTINCT b.X1) - sum(a.X1=b.X1)) as OVERLAPX1_NAME1_NAME2
		,sum(c.X2=d.X2) as OVERLAPX2,
		sum(c.X2=d.X2)/COUNT(DISTINCT c.X2) as OVERLAPX2_NAME1,
		sum(c.X2=d.X2)/COUNT(DISTINCT d.X2) as OVERLAPX2_NAME2,
		sum(c.X2=d.X2)/SUM(COUNT(DISTINCT c.X2)+ COUNT(DISTINCT d.X2) - sum(c.X2=d.X2)) as OVERLAPX2_NAME1_NAME2
from HAVE1 n
left join
(select unique NAME,X1 from HAVE2) a on n.NAME1 = a.NAME
left join
(select unique NAME,X1 from HAVE2) b on n.NAME2 = b.NAME
left join
(select unique NAME,X2 from HAVE2) c on n.NAME1 = c.NAME
left join
(select unique NAME,X2 from HAVE2) d on n.NAME2 = d.NAME
group by n.NAME1, n.NAME2
order by n.NAME1, n.NAME2;
quit;

But it gives me wrong value.

 

Stuck here.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1111 views
  • 1 like
  • 2 in conversation