<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Joining 2 tables and removing duplicates in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426292#M281289</link>
    <description>&lt;P&gt;I suspect you need another criterion on your join.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It looks like at present you have multiple records per combination of your join key variables, GVKEY and YEAR, in each data set. If this is correct, then a given GVKEY and YEAR with N1 records in Table1 and N2 records in Table2 produces N1*N2 records in the join.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 10 Jan 2018 01:42:26 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2018-01-10T01:42:26Z</dc:date>
    <item>
      <title>Joining 2 tables and removing duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426286#M281287</link>
      <description />
      <pubDate>Thu, 04 Jun 2020 00:54:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426286#M281287</guid>
      <dc:creator>trungcva112</dc:creator>
      <dc:date>2020-06-04T00:54:19Z</dc:date>
    </item>
    <item>
      <title>Re: Joining 2 tables and removing duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426288#M281288</link>
      <description>&lt;P&gt;You can do this a couple of ways. This is untested code. I would prefer the second way.&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;select distinct t1.gvkey, t1.lag_year, t2.meansale&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from table1 as t1 left join table2 as t2&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;on t1.gvkey=t2.gvkey&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;and t1.lag_year=t2.year;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;quit;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;or&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;proc sql;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;select a.*, b.meansale from&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;(select distinct * from table1) a&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;left join&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;(select distinct gvkey, year, meansale from table2) b&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;on a.gvkey=b.gvkey&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;and a.lag_year=b.year;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;quit;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2018 01:24:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426288#M281288</guid>
      <dc:creator>kiranv_</dc:creator>
      <dc:date>2018-01-10T01:24:56Z</dc:date>
    </item>
    <item>
      <title>Re: Joining 2 tables and removing duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426292#M281289</link>
      <description>&lt;P&gt;I suspect you need another criterion on your join.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It looks like at present you have multiple records per combination of your join key variables, GVKEY and YEAR, in each data set. If this is correct, then a given GVKEY and YEAR with N1 records in Table1 and N2 records in Table2 produces N1*N2 records in the join.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
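&lt;P&gt;One way to check for repeated keys is to count rows per key combination. This is a minimal, untested sketch. It assumes the data set is named table1 and has columns gvkey and year, as elsewhere in this thread. Run the same query against the second table with its own year variable.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: list key combinations that occur more than once.   */
/* The names table1, gvkey and year are assumed from the thread. */
proc sql;
  select gvkey, year, count(*) as n_rows
  from table1
  group by gvkey, year
  having count(*) ge 2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Any row returned here is a key that will multiply in the join. An empty result means the keys are unique in that table.&lt;/P&gt;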
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2018 01:42:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426292#M281289</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-01-10T01:42:26Z</dc:date>
    </item>
    <item>
      <title>Re: Joining 2 tables and removing duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426312#M281290</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;P&gt;Is there any way to not include the duplicates with PROC SQL&lt;STRONG&gt; (or any other method?)&lt;/STRONG&gt;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Why not a simple merge after sorting? If the Cartesian product that the SQL left join builds before filtering on the equality condition is acceptable overhead, I am sure a sort should be acceptable too. Otherwise, load have1 into a hash object with multidata:'yes' and keys gvkey and year, then read have2 with a SET statement and look up each row with the hash find() method. When the lookup fails, call missing(meansale). Good night!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have1;&lt;BR /&gt;input GVKEY $ Year Meansale ;&lt;BR /&gt;datalines;&lt;BR /&gt;1001 1983 10&lt;BR /&gt;1001 1983 10&lt;BR /&gt;1001 1983 10&lt;BR /&gt;1001 1983 10&lt;BR /&gt;1001 1984 15&lt;BR /&gt;1001 1984 15&lt;BR /&gt;1001 1984 15&lt;BR /&gt;1001 1984 15&lt;BR /&gt;1001 1984 15&lt;BR /&gt;;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;data have2;&lt;BR /&gt;input GVKEY $ Lag_Year;&lt;BR /&gt;datalines;&lt;BR /&gt;1001 1982&lt;BR /&gt;1001 1982&lt;BR /&gt;1001 1982&lt;BR /&gt;1001 1982&lt;BR /&gt;1001 1983&lt;BR /&gt;1001 1983&lt;BR /&gt;1001 1983&lt;BR /&gt;1001 1983&lt;BR /&gt;1001 1983&lt;BR /&gt;;&lt;/P&gt;&lt;P&gt;data want;&lt;BR /&gt;merge have2(in=a) have1(rename=(Year=Lag_Year));&lt;BR /&gt;by gvkey lag_year;&lt;BR /&gt;if a;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2018 05:31:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426312#M281290</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-01-10T05:31:22Z</dc:date>
    </item>
    <item>
      <title>Re: Joining 2 tables and removing duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426315#M281291</link>
      <description>&lt;P&gt;Does table 2 contain only the two variables GVKEY and Lag_Year?&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/181905"&gt;@trungcva112&lt;/a&gt;&amp;nbsp;wrote "&lt;STRONG&gt;2nd dataset (with lag_year = year (in 1st dataset) -1 )&lt;/STRONG&gt;".&lt;/P&gt;
&lt;P&gt;Does that mean the 2nd table was derived from table1?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, you can create the wanted table directly from table1:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=table1 out=temp nodupkey; by gvkey year; run;&lt;BR /&gt;data want;
 set temp;&lt;BR /&gt;  by gvkey year;&lt;BR /&gt;     retain year1; drop year1;&lt;BR /&gt;     if first.gvkey then year1=year;
     lag_year = year - 1;&lt;BR /&gt;     if lag_year = year1 then meansale = .;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 10 Jan 2018 05:48:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-2-tables-and-removing-duplicates/m-p/426315#M281291</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2018-01-10T05:48:41Z</dc:date>
    </item>
  </channel>
</rss>

