<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to COUNT the overlap and compute its ratio? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504918#M135186</link>
    <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;Is this too complicated?&lt;/P&gt;</description>
    <pubDate>Tue, 16 Oct 2018 22:04:57 GMT</pubDate>
    <dc:creator>yanshuai</dc:creator>
    <dc:date>2018-10-16T22:04:57Z</dc:date>
    <item>
      <title>How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504559#M135042</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have asked similar question before but I found I only achieved partial goal.&lt;/P&gt;&lt;P&gt;So I have data like this&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA HAVE1;
    INPUT (NAME1 NAME2) (:$8.);
    CARDS;
Z J
Z K
X J
X K
;


DATA HAVE2;
    INPUT (NAME X1 X2) (:$8.);
    CARDS;
Z A 1
Z D 2
X C 3
X D 4
J A 1
J B 1
K E 3
K D 4
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What I want is like this&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA WANT;
    INPUT (NAME1 NAME2 OVERLAPX1 OVERLAPX1_1 OVERLAPX1_2 OVERLAPX1_3 OVERLAPX2 OVERLAPX2_1 OVERLAPX2_2 OVERLAPX2_3) (:$8.);
    CARDS;
Z J 1 0.5 0.5 1/3 1 0.5 1 0.5
Z K 1 0.5 0.5 1/3 0 0 0 0
X J 0 0 0 0 0 0 0 0
X K 1 0.5 0.5 1/3 2 1 1 1
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;First step, compute the number of overlap between NAME1 and NAME2.&lt;/P&gt;&lt;P&gt;For example (using first row in WANT as example)&lt;/P&gt;&lt;P&gt;So OVERLAPX1 is the number of overlap in X1 between NAME1 and NAME2 e.g. Only A is the overlap between Z and J, so it is 1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Second, repeat first step, and divided it by distinct value of X1 for NAME1&lt;/P&gt;&lt;P&gt;OVERLAPX1_1 is the OVERLAPX1 divided by total distinct value of X1 for NAME1.&lt;/P&gt;&lt;P&gt;For example. There is 1 overlap (A) between Z and J, but Z has 2 unique values in X1 (A, D). So the ratio is 1/2 = 0.5&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Third, repeat first step, and divided it by distinct value of X1 for NAME2&lt;/P&gt;&lt;P&gt;OVERLAPX1_2 is the OVERLAPX1 divided by total distinct value of X1 for NAME2.&lt;/P&gt;&lt;P&gt;For example. There is 1 overlap (A) between Z and J, but J has 2 unique value in X1 (A, B). So the ratio is 1/2 = 0.5&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Fourth, repeat first step, and divided it by total distinct value of X1 for both NAME1 and NAME2&lt;/P&gt;&lt;P&gt;OVERLAPX1_3 is the OVERLAPX1 divided by total distinct value of X1 for NAME1 &amp;amp; NAME2.&lt;/P&gt;&lt;P&gt;For example, There is 1 overlap (A) between Z and J, but Z and J totally has 3 unique value in X1 (A, B, D). So the ratio is 1/3&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Till now, I can achieve it with you guys help last time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;But, I would like to repeat the above steps by using X2, instead of X1.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried left join (select unique NAME1, X2). But it is confusing. The trouble is all the source data are in the same table and NAME1 &amp;amp; NAME2 have to come in pair. It is kinda a headache.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also tried such codes&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
select sum(a.X1=b.X1) as OVERLAPX1,
		sum(a.X1=b.X1)/COUNT(DISTINCT a.X1) as OVERLAPX1_NAME1,
		sum(a.X1=b.X1)/COUNT(DISTINCT b.X1) as OVERLAPX1_NAME2,
		sum(a.X1=b.X1)/SUM(COUNT(DISTINCT a.X1)+ COUNT(DISTINCT b.X1) - sum(a.X1=b.X1)) as OVERLAPX1_NAME1_NAME2
		,sum(c.X2=d.X2) as OVERLAPX2,
		sum(c.X2=d.X2)/COUNT(DISTINCT c.X2) as OVERLAPX2_NAME1,
		sum(c.X2=d.X2)/COUNT(DISTINCT d.X2) as OVERLAPX2_NAME2,
		sum(c.X2=d.X2)/SUM(COUNT(DISTINCT c.X2)+ COUNT(DISTINCT d.X2) - sum(c.X2=d.X2)) as OVERLAPX2_NAME1_NAME2
from HAVE1 n
left join
(select unique NAME,X1 from HAVE2) a on n.NAME1 = a.NAME
left join
(select unique NAME,X1 from HAVE2) b on n.NAME2 = b.NAME
left join
(select unique NAME,X2 from HAVE2) c on n.NAME1 = c.NAME
left join
(select unique NAME,X2 from HAVE2) d on n.NAME2 = d.NAME
group by n.NAME1, n.NAME2
order by n.NAME1, n.NAME2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But this is giving me wrong value.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would you please check where is wrong?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How to achieve this using SQL. It is always my favorite.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you all.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Bless you.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Oct 2018 20:30:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504559#M135042</guid>
      <dc:creator>yanshuai</dc:creator>
      <dc:date>2018-10-17T20:30:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504560#M135043</link>
      <description>&lt;P&gt;Is there a limit to the number of overlaps you can have? You'll only ever need that list of variables?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this needs to scale the approaches would be different.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 01:40:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504560#M135043</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-10-16T01:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504796#M135142</link>
      <description>&lt;P&gt;I only care the number of overlap between NAME1 and NAME2 in HAVE1 list&lt;/P&gt;&lt;P&gt;For example. In terms of variable X1,&lt;/P&gt;&lt;P&gt;Z and J have one&amp;nbsp;overlap in X1, which is A.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 17:41:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504796#M135142</guid>
      <dc:creator>yanshuai</dc:creator>
      <dc:date>2018-10-16T17:41:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504918#M135186</link>
      <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;Is this too complicated?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 22:04:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504918#M135186</guid>
      <dc:creator>yanshuai</dc:creator>
      <dc:date>2018-10-16T22:04:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504923#M135189</link>
      <description>I don't think the logic is clear of how to go from your have to your want table.</description>
      <pubDate>Tue, 16 Oct 2018 22:10:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504923#M135189</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-10-16T22:10:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504925#M135190</link>
      <description>&lt;P&gt;Sorry to confuse you.&lt;/P&gt;&lt;P&gt;Start from the simple one.&lt;/P&gt;&lt;P&gt;All I want is the overlap between NAME1 and NAME2 in HAVE1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example.&lt;/P&gt;&lt;P&gt;In HAVE1,&amp;nbsp; in first row, I have &lt;STRONG&gt;Z&lt;/STRONG&gt; and &lt;STRONG&gt;J&lt;/STRONG&gt;, and I would like to count their overlap in variable &lt;STRONG&gt;X1&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How to achieve that?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I go to HAVE2, and see what Z has in X1 : A and D;&lt;/P&gt;&lt;P&gt;Then I see what J has in X1: A and B.&lt;/P&gt;&lt;P&gt;Z and J has one overlap in X1: which is "A".&lt;/P&gt;&lt;P&gt;So the count of overlap is 1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The logic is to count the number of value in common between Z and J in variable X1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does this make myself clear?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 22:19:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504925#M135190</guid>
      <dc:creator>yanshuai</dc:creator>
      <dc:date>2018-10-16T22:19:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504926#M135191</link>
      <description>I dont have time to code something, but not sure why you couldn't just do a join on table1 to table2 (name=name1) and table2 (name=name2) and then do a freq report on the results.</description>
      <pubDate>Tue, 16 Oct 2018 22:22:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504926#M135191</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-10-16T22:22:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to COUNT the overlap and compute its ratio?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504931#M135196</link>
      <description>&lt;P&gt;Yes.&lt;/P&gt;&lt;P&gt;I actually do.&lt;/P&gt;&lt;P&gt;Using SQL&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
select sum(a.X1=b.X1) as OVERLAPX1,
		sum(a.X1=b.X1)/COUNT(DISTINCT a.X1) as OVERLAPX1_NAME1,
		sum(a.X1=b.X1)/COUNT(DISTINCT b.X1) as OVERLAPX1_NAME2,
		sum(a.X1=b.X1)/SUM(COUNT(DISTINCT a.X1)+ COUNT(DISTINCT b.X1) - sum(a.X1=b.X1)) as OVERLAPX1_NAME1_NAME2
		,sum(c.X2=d.X2) as OVERLAPX2,
		sum(c.X2=d.X2)/COUNT(DISTINCT c.X2) as OVERLAPX2_NAME1,
		sum(c.X2=d.X2)/COUNT(DISTINCT d.X2) as OVERLAPX2_NAME2,
		sum(c.X2=d.X2)/SUM(COUNT(DISTINCT c.X2)+ COUNT(DISTINCT d.X2) - sum(c.X2=d.X2)) as OVERLAPX2_NAME1_NAME2
from HAVE1 n
left join
(select unique NAME,X1 from HAVE2) a on n.NAME1 = a.NAME
left join
(select unique NAME,X1 from HAVE2) b on n.NAME2 = b.NAME
left join
(select unique NAME,X2 from HAVE2) c on n.NAME1 = c.NAME
left join
(select unique NAME,X2 from HAVE2) d on n.NAME2 = d.NAME
group by n.NAME1, n.NAME2
order by n.NAME1, n.NAME2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But it gives me wrong value.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Stuck here.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 22:37:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-COUNT-the-overlap-and-compute-its-ratio/m-p/504931#M135196</guid>
      <dc:creator>yanshuai</dc:creator>
      <dc:date>2018-10-16T22:37:12Z</dc:date>
    </item>
  </channel>
</rss>

