<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Eliminating all duplicates in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748035#M234888</link>
    <description>&lt;P&gt;I have been working on a merge of two data sets: the number of obs are&lt;/P&gt;
&lt;P&gt;Orpheus_one_for_matching = 1,454,515&amp;nbsp; originally&amp;nbsp; 1,817,358 obs&lt;/P&gt;
&lt;P&gt;Orpheus_for_matching&amp;nbsp; = 299,088 originally&amp;nbsp; 326,329 obs&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I created a concatenate common variable as shown&lt;/P&gt;
&lt;PRE&gt;Data SASONE.ORPHEUS_ONE_For_Matching;
  Set SASONE.ORPHEUS_ONE_For_Matching;
  Common_ID_4 = CATS(LastNm,FirstNm);
run;

Data SASONE1.ORPHEUS_For_Matching;
  Set SASONE1.ORPHEUS_For_Matching;
  Common_ID_4 = CATS(LastNm,FirstNm);
run;&lt;/PRE&gt;
&lt;P&gt;I merged the two datasets on the common_id_4&lt;/P&gt;
&lt;PRE&gt;Data SASONE.ORPOne_AllVarMerge_5A (drop = FirstMerge);
  Merge SASONE.ORPHEUS_ONE_For_Matching /*(In=In2)*/ 
        SASONE1.ORPHEUS_For_Matching;   *(In=In1);  
  by Common_ID_4;
                                        *If In1 then output;
                                        *In1 = In2;
 If FirstNm = 'ANONYMOUS' then delete;
 Else If FirstNm = 'HIV' then delete;
run;&lt;/PRE&gt;
&lt;P&gt;There were still duplicates so I did a proc sort and used the last.common_id_4&lt;/P&gt;
&lt;PRE&gt;proc sort data = SASONE.ORPOne_AllVarMerge_5A NODUPKEYS;
  by DOB;
run;

Data SASONE.ORPOne_AllVarMerge_5_Keep;
  Set SASONE.ORPOne_AllVarMerge_5;
  by Common_ID_4;
  If Last.Common_ID_4
  Then output;
run;&lt;/PRE&gt;
&lt;P&gt;Which resulted in 1,694,385 in the combined data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suspect there are duplicates so I ran the following&lt;/P&gt;
&lt;PRE&gt;Proc sql noprint;
CREATE TABLE SASONE.ORPOne_AllVarDups_5_Check AS
  SELECT 'SASONE.ORPHEUS_ONE_For_Matching' As Dataset,
         Count(Distinct Common_ID_4) as Ndistinct,
         Count(*) as N
  From SASONE.ORPHEUS_ONE_For_Matching
  Outer Union Corresponding
  SELECT 'SASONE1.ORPHEUS_For_Matching' As Dataset,
         Count(Distinct Common_ID_4) as Ndistinct,
         Count(*) as N
  From SASONE1.ORPHEUS_For_Matching
  Outer Union Corresponding
  SELECT 'SASONE.ORPOne_AllVarMerge_5A' As Dataset,
         Count(Distinct Common_ID_4) as Ndistinct,
         Count(*) as N
  From (SELECT Common_ID_4 
        From SASONE.ORPHEUS_ONE_For_Matching
        Outer Union Corresponding
		SELECT Common_ID_4
        From SASONE1.ORPHEUS_For_Matching);
quit;&lt;/PRE&gt;
&lt;P&gt;The next step was to print the duplicate obs&lt;/P&gt;
&lt;PRE&gt;Proc sql noprint;
CREATE TABLE SASONE.DUPS_EXAMINE AS
   SELECT Common_ID_4
   FROM (SELECT Common_ID_4
         FROM SASONE.ORPHEUS_ONE_For_Matching
		 OUTER UNION CORRESPONDING
		 SELECT Common_ID_4
		 FROM SASONE1.ORPHEUS_For_Matching)
   GROUP BY Common_ID_4
   HAVING Count(Common_ID_4) &amp;gt; 1;
quit;
/* 229,206  obs  6.14.2021 */
&lt;/PRE&gt;
&lt;P&gt;The total dups still left in the dataset total 229,206 with a portion coming from each individual data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My question is how can I delete the dups from the data set with 1,694,385 obs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Nothing seems to work the last.var method; using proc sort and NODUPKEY.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What would you propose as a solution, so I can provide a dataset without those remaining duplicates?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
    <pubDate>Tue, 15 Jun 2021 05:40:20 GMT</pubDate>
    <dc:creator>wlierman</dc:creator>
    <dc:date>2021-06-15T05:40:20Z</dc:date>
    <item>
      <title>Eliminating all duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748035#M234888</link>
      <description>&lt;P&gt;I have been working on a merge of two data sets: the number of obs are&lt;/P&gt;
&lt;P&gt;Orpheus_one_for_matching = 1,454,515&amp;nbsp; originally&amp;nbsp; 1,817,358 obs&lt;/P&gt;
&lt;P&gt;Orpheus_for_matching&amp;nbsp; = 299,088 originally&amp;nbsp; 326,329 obs&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I created a concatenate common variable as shown&lt;/P&gt;
&lt;PRE&gt;Data SASONE.ORPHEUS_ONE_For_Matching;
  Set SASONE.ORPHEUS_ONE_For_Matching;
  Common_ID_4 = CATS(LastNm,FirstNm);
run;

Data SASONE1.ORPHEUS_For_Matching;
  Set SASONE1.ORPHEUS_For_Matching;
  Common_ID_4 = CATS(LastNm,FirstNm);
run;&lt;/PRE&gt;
&lt;P&gt;I merged the two datasets on the common_id_4&lt;/P&gt;
&lt;PRE&gt;Data SASONE.ORPOne_AllVarMerge_5A (drop = FirstMerge);
  Merge SASONE.ORPHEUS_ONE_For_Matching /*(In=In2)*/ 
        SASONE1.ORPHEUS_For_Matching;   *(In=In1);  
  by Common_ID_4;
                                        *If In1 then output;
                                        *In1 = In2;
 If FirstNm = 'ANONYMOUS' then delete;
 Else If FirstNm = 'HIV' then delete;
run;&lt;/PRE&gt;
&lt;P&gt;There were still duplicates so I did a proc sort and used the last.common_id_4&lt;/P&gt;
&lt;PRE&gt;proc sort data = SASONE.ORPOne_AllVarMerge_5A NODUPKEYS;
  by DOB;
run;

Data SASONE.ORPOne_AllVarMerge_5_Keep;
  Set SASONE.ORPOne_AllVarMerge_5;
  by Common_ID_4;
  If Last.Common_ID_4
  Then output;
run;&lt;/PRE&gt;
&lt;P&gt;Which resulted in 1,694,385 in the combined data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suspect there are duplicates so I ran the following&lt;/P&gt;
&lt;PRE&gt;Proc sql noprint;
CREATE TABLE SASONE.ORPOne_AllVarDups_5_Check AS
  SELECT 'SASONE.ORPHEUS_ONE_For_Matching' As Dataset,
         Count(Distinct Common_ID_4) as Ndistinct,
         Count(*) as N
  From SASONE.ORPHEUS_ONE_For_Matching
  Outer Union Corresponding
  SELECT 'SASONE1.ORPHEUS_For_Matching' As Dataset,
         Count(Distinct Common_ID_4) as Ndistinct,
         Count(*) as N
  From SASONE1.ORPHEUS_For_Matching
  Outer Union Corresponding
  SELECT 'SASONE.ORPOne_AllVarMerge_5A' As Dataset,
         Count(Distinct Common_ID_4) as Ndistinct,
         Count(*) as N
  From (SELECT Common_ID_4 
        From SASONE.ORPHEUS_ONE_For_Matching
        Outer Union Corresponding
		SELECT Common_ID_4
        From SASONE1.ORPHEUS_For_Matching);
quit;&lt;/PRE&gt;
&lt;P&gt;The next step was to print the duplicate obs&lt;/P&gt;
&lt;PRE&gt;Proc sql noprint;
CREATE TABLE SASONE.DUPS_EXAMINE AS
   SELECT Common_ID_4
   FROM (SELECT Common_ID_4
         FROM SASONE.ORPHEUS_ONE_For_Matching
		 OUTER UNION CORRESPONDING
		 SELECT Common_ID_4
		 FROM SASONE1.ORPHEUS_For_Matching)
   GROUP BY Common_ID_4
   HAVING Count(Common_ID_4) &amp;gt; 1;
quit;
/* 229,206  obs  6.14.2021 */
&lt;/PRE&gt;
&lt;P&gt;The total dups still left in the dataset total 229,206 with a portion coming from each individual data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My question is how can I delete the dups from the data set with 1,694,385 obs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Nothing seems to work the last.var method; using proc sort and NODUPKEY.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What would you propose as a solution, so I can provide a dataset without those remaining duplicates?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jun 2021 05:40:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748035#M234888</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-15T05:40:20Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating all duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748036#M234889</link>
      <description>Your sort appears to be by DOB. Try changing it to Common_ID_4.&lt;BR /&gt;&lt;BR /&gt;Also, pay close attention to your dataset names.  Make sure that the output from one step is applied with the correct name in any subsequent steps.&lt;BR /&gt;&lt;BR /&gt;Jim</description>
      <pubDate>Tue, 15 Jun 2021 05:55:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748036#M234889</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-06-15T05:55:50Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating all duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748097#M234907</link>
      <description>&lt;P&gt;First, get rid of COMMON_ID.&amp;nbsp; SAS knows how to sort and merge by multiple variables.&amp;nbsp; You might start with:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=sasone.orpheus_one_for_matching nodupkey out=sorted_one;
   by LastNm FirstNm;
   where FirstNm not in ('HIV', 'ANONYMOUS');
run;
   &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then similarly for the second data set:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=sasone.orpheus_for_matching out=sorted_for nodupkey;
   by LastNm FirstNm;
   where FirstNm not in ('HIV', 'ANONYMOUS');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now the individual data sets will have no duplicates remaining.&amp;nbsp; But you have to decide what to do if the same LastNm FirstNm combination appears in both of the data sets.&amp;nbsp; Do you want to take the values from one particular data set or the other?&amp;nbsp; Do you want a blend of the values from the two data sets?&amp;nbsp; The programming won't be that difficult, but you have to make the decision first.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jun 2021 11:48:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748097#M234907</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-06-15T11:48:31Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating all duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748117#M234918</link>
      <description>&lt;P&gt;Thank you for the help.&amp;nbsp; I have a dataset that merged without dups so I'm sending that forward.&amp;nbsp; I am going to work on this smaller but dup containing dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks again.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jun 2021 13:42:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminating-all-duplicates/m-p/748117#M234918</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-15T13:42:28Z</dc:date>
    </item>
  </channel>
</rss>

