<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I sort during de-duplication without using that as a criteria for deduplicating? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698009#M213420</link>
    <description>&lt;P&gt;This create a new phone number variable to be the "last" occurring non-0 value.&lt;/P&gt;
&lt;PRE&gt;proc sort data=have;
   by id date;
run;
/* next step assumes sorted by id date*/

data want;
   set have;
   by id date;
   retain finalphone;
   if first.id then call missing(finalphone);
   if phone ne 0 then finalphone=phone;
   if last.id;
run;&lt;/PRE&gt;
&lt;P&gt;Since I seldom do arithmetic with phone numbers I would typically say to have non-valid phone numbers as character missing, i.e. blank and test differently than the "phone ne 0" used above.&lt;/P&gt;
&lt;P&gt;Note that this does the de-duplication after a simple sort. So see the basic results matches your need.&lt;/P&gt;
&lt;P&gt;You need two variables, so you could at the last do something like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;if last.id then do:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; phone=finalphone;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and drop the finalphone variable from the output set.&lt;/P&gt;</description>
    <pubDate>Tue, 10 Nov 2020 21:21:33 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2020-11-10T21:21:33Z</dc:date>
    <item>
      <title>How do I sort during de-duplication without using that as a criteria for deduplicating?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/697996#M213410</link>
      <description>&lt;P&gt;I have a dataset that I want to de-duplicate based on an ID variable and a date. Most IDs have &amp;gt;1 row, but not all rows are as complete. My problem is the phone-number column. During the dedup I want to keep the rows where the phone number is present, but I know some will be missing a phone number, so if that's the case I still want to keep that unique record.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a way to easily do this? I have tried using proc sort with nodupkey, but when I just use ID and date as my by variables, it typically sorts in a way that leaves only the empty phone number records. If I leave phone in the by statement, it keeps too much.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data have;
input ID date :mmddyy10. phone ;
format date mmddyy10.;
cards;
1 11/01/2020 1112223333
1 11/01/2020 00
2 11/02/2020 2223334444
2 11/02/2020 00
3 11/02/2020 00
4 11/01/2020 00
4 11/01/2020 3334445555
5 11/03/2020 4445556666
5 11/03/2020 5556667777
4 11/03/2020 00
4 11/03/2020 3334445555
;
run;


data want;
input ID date :mmddyy10. phone ;
format date mmddyy10.;
cards;
1 11/01/2020 1112223333
2 11/02/2020 2223334444
3 11/02/2020 00
4 11/01/2020 3334445555
5 11/03/2020 4445556666
4 11/03/2020 3334445555
;
run;

proc sort data=have nodupkey;
	by id date;
run;&lt;/PRE&gt;</description>
      <pubDate>Tue, 10 Nov 2020 20:46:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/697996#M213410</guid>
      <dc:creator>csanfor2</dc:creator>
      <dc:date>2020-11-10T20:46:15Z</dc:date>
    </item>
    <item>
      <title>Re: How do I sort during de-duplication without using that as a criteria for deduplicating?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698009#M213420</link>
      <description>&lt;P&gt;This create a new phone number variable to be the "last" occurring non-0 value.&lt;/P&gt;
&lt;PRE&gt;proc sort data=have;
   by id date;
run;
/* next step assumes sorted by id date*/

data want;
   set have;
   by id date;
   retain finalphone;
   if first.id then call missing(finalphone);
   if phone ne 0 then finalphone=phone;
   if last.id;
run;&lt;/PRE&gt;
&lt;P&gt;Since I seldom do arithmetic with phone numbers I would typically say to have non-valid phone numbers as character missing, i.e. blank and test differently than the "phone ne 0" used above.&lt;/P&gt;
&lt;P&gt;Note that this does the de-duplication after a simple sort. So see the basic results matches your need.&lt;/P&gt;
&lt;P&gt;You need two variables, so you could at the last do something like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;if last.id then do:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; phone=finalphone;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and drop the finalphone variable from the output set.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2020 21:21:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698009#M213420</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-11-10T21:21:33Z</dc:date>
    </item>
    <item>
      <title>Re: How do I sort during de-duplication without using that as a criteria for deduplicating?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698017#M213423</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/78549"&gt;@csanfor2&lt;/a&gt;&amp;nbsp; Your question is a classic example that presents a neat use-case for NOTSORTED and DOW loop utilization as a combo. Thank you for the question-&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;

data have;
input ID date :mmddyy10. phone :$10.;
format date mmddyy10.;
cards;
1 11/01/2020 1112223333
1 11/01/2020 00
2 11/02/2020 2223334444
2 11/02/2020 00
3 11/02/2020 00
4 11/01/2020 00
4 11/01/2020 3334445555
5 11/03/2020 4445556666
5 11/03/2020 5556667777
4 11/03/2020 00
4 11/03/2020 3334445555
;
run;

data want;
 do until(last.id);
  set have;
  by id notsorted;
  if _n_ and phone not in (' ','00') then do;
   _n_=0;
   output;
  end;
 end;
 if _n_ then output;
run;

proc print noobs;run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="branch"&gt;
&lt;DIV&gt;
&lt;DIV align="center"&gt;
&lt;TABLE class="table" summary="Procedure Print: Data Set WORK.WANT" frame="box" rules="all" cellspacing="0" cellpadding="5"&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="r header" scope="col"&gt;ID&lt;/TH&gt;
&lt;TH class="r header" scope="col"&gt;date&lt;/TH&gt;
&lt;TH class="l header" scope="col"&gt;phone&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;11/01/2020&lt;/TD&gt;
&lt;TD class="l data"&gt;1112223333&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;11/02/2020&lt;/TD&gt;
&lt;TD class="l data"&gt;2223334444&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;11/02/2020&lt;/TD&gt;
&lt;TD class="l data"&gt;00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;4&lt;/TD&gt;
&lt;TD class="r data"&gt;11/01/2020&lt;/TD&gt;
&lt;TD class="l data"&gt;3334445555&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;5&lt;/TD&gt;
&lt;TD class="r data"&gt;11/03/2020&lt;/TD&gt;
&lt;TD class="l data"&gt;4445556666&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;4&lt;/TD&gt;
&lt;TD class="r data"&gt;11/03/2020&lt;/TD&gt;
&lt;TD class="l data"&gt;3334445555&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Tue, 10 Nov 2020 21:36:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698017#M213423</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2020-11-10T21:36:59Z</dc:date>
    </item>
    <item>
      <title>Re: How do I sort during de-duplication without using that as a criteria for deduplicating?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698064#M213443</link>
      <description>&lt;P&gt;This is one of those times I probably would not go to the DOW loop.&amp;nbsp; My less-busy suggestion is:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ID date :mmddyy10. phone :$10.;
format date mmddyy10.;
cards;
1 11/01/2020 1112223333
1 11/01/2020 00
2 11/02/2020 2223334444
2 11/02/2020 00
3 11/02/2020 00
4 11/01/2020 00
4 11/01/2020 3334445555
5 11/03/2020 4445556666
5 11/03/2020 5556667777
4 11/03/2020 00
4 11/03/2020 3334445555
run;

data want;
  set have;
  by id notsorted;
  if first.id then ptr=0;
  if phone^='00' and ptr=0 then ptr=_n_;
  if last.id;
  if ptr ^=0 then set have point=ptr;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The set with a POINT= option works to retrieve the first record with a qualifying phone number for each id group.&amp;nbsp; If no qualifying phone number is found then the last record for the id group is output.&amp;nbsp; This produces what the OP requested, but if an ID only has multiple '00' phones, we have no guidance on which record to keep (i.e. which date).&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2020 03:37:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-sort-during-de-duplication-without-using-that-as-a/m-p/698064#M213443</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-11-11T03:37:39Z</dc:date>
    </item>
  </channel>
</rss>

